<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Identification of Offensive Language in Social Media</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lutfiye Seda Mut Altin</string-name>
          <email>yeseda.mut01@estudiant.upf.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LaSTUS-TALN Research Group, Universitat Pompeu Fabra</institution>
          ,
          <addr-line>C/Tànger 122-140, 08018 Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>50</fpage>
      <lpage>55</lpage>
      <abstract>
        <p>Recent work shows that offensive language in social media is a serious problem that especially affects vulnerable groups. Systems that detect offensive language automatically have therefore been the focus of several studies, and various machine learning approaches have been used to classify offensive text. In this research, we aim to develop a neural network system that effectively classifies offensive text while considering its different aspects. In addition, multilingual and multi-task learning experiments are planned.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Social media has become one of the most
important environments for communication
among people. As user-generated content on
social media grows, so does
harmful content such as offensive
language. Aggressiveness in social media is a
problem that especially affects vulnerable groups
        <xref ref-type="bibr" rid="ref8">(Hamm et al., 2015)</xref>
        ,
        <xref ref-type="bibr" rid="ref9">(Kowalski and Limber, 2013)</xref>
        . In this context, the
automatic detection of offensive content has attracted
considerable attention.
      </p>
      <p>Traditional methods for detecting offensive
language include blacklists of keywords
and phrases based on profane words, regular
expressions, guidelines, and human
moderators who manually review and detect
unwanted content. However, these methods are not
sufficient, particularly because users
tend to use obfuscated and
implicit expressions.</p>
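      <p>The limitation described above can be illustrated with a minimal Python sketch. The word list and the example sentences are hypothetical and chosen only for illustration; the sketch shows exact keyword matching in general, not any particular moderation system.</p>
      <preformat>
```python
import re

# Hypothetical blacklist; real moderation lists are far larger.
BLACKLIST = {"idiot", "stupid"}


def tokens(text: str) -> list:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_flagged(text: str) -> bool:
    """Return True if any token exactly matches a blacklisted word."""
    return any(tok in BLACKLIST for tok in tokens(text))


examples = [
    "You are an idiot",        # explicit wording
    "You are an id1ot",        # obfuscated spelling
    "Nobody would miss you",   # implicit wording, no profane term
]
for text in examples:
    print(text, "=", "flagged" if is_flagged(text) else "not flagged")
```
      </preformat>
      <p>In this toy setting only the explicit sentence is flagged; the obfuscated and the implicit sentences pass the filter, which is the gap that learned classifiers are meant to close.</p>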
      <p>
        Automatic identification of offensive
language is essentially treated as a
classification task. Previous research on the topic
includes approaches from different
perspectives, using different datasets and focusing
on various kinds of content such as abusive language
        <xref ref-type="bibr" rid="ref14">(Waseem et al., 2017)</xref>
        <xref ref-type="bibr" rid="ref4">(Chu, Jue, and Wang, 2016)</xref>
        , hate speech
        <xref ref-type="bibr" rid="ref14 ref5">(Davidson et al., 2017)</xref>
        <xref ref-type="bibr" rid="ref12">(Schmidt and Wiegand, 2017)</xref>
        <xref ref-type="bibr" rid="ref15 ref18 ref7">(Fortuna and Nunes, 2018)</xref>
        and cyberbullying
        <xref ref-type="bibr" rid="ref11 ref13">(Van Hee et al., 2018)</xref>
        .
      </p>
      <p>
        Regarding machine learning approaches,
        <xref ref-type="bibr" rid="ref14 ref5">Davidson et al. (2017)</xref>
        indicated that
using certain terms and lexicons is useful.
        <xref ref-type="bibr" rid="ref15 ref18 ref7">Zhang, Robinson, and Tepper (2018)</xref>
        compared different approaches and pointed out
that a deep neural network model
combining a convolutional neural network and a long
short-term memory network performed
better than the state of the art, including classifiers
such as SVM.
      </p>
      <p>
        There have been several shared tasks
related to offensive language detection. The
shared task on Aggression Identification,
TRAC, provided participants with a
dataset of annotated Facebook posts and
comments in English and Hindi
        <xref ref-type="bibr" rid="ref10 ref11">(Kumar et al., 2018)</xref>
        . The goal was to classify each text into one of
three classes: non-aggressive,
covertly aggressive, and overtly aggressive. The
best-performing systems in this task used
deep learning approaches based on
convolutional neural networks (CNN), recurrent
neural networks and LSTM
        <xref ref-type="bibr" rid="ref11 ref15 ref18 ref7">(Majumder, Mandl, and others, 2018)</xref>
        . The Spanish language has
also been considered. For example, in the
recent MEX-A3T 2018 shared task on
aggression detection in Mexican
Spanish, the methodologies proposed by
participants included content-based features (bag of
words, word n-grams, dictionary words, slang
words, etc.), stylistic features
(frequencies, punctuation, POS, etc.) and
approaches based on neural networks (CNN,
LSTM and others); most participants
outperformed the baselines
        <xref ref-type="bibr" rid="ref10 ref11 ref2">(Álvarez-Carmona et al., 2018)</xref>
        . Furthermore, other
shared tasks on aggressive language have addressed other
languages, including Italian
        <xref ref-type="bibr" rid="ref11 ref3">(Bosco et al., 2018)</xref>
        and German
        <xref ref-type="bibr" rid="ref15 ref18 ref7">(Wiegand, Siegel, and Ruppenhofer, 2018)</xref>
        . One of the most recent shared tasks
on the topic is “Categorizing Offensive
Language in Social Media” (SemEval 2019 - Task
6)
        <xref ref-type="bibr" rid="ref16 ref17">(Zampieri et al., 2019b)</xref>
        , which frames the
problem as a hierarchical scheme including
the target type of the offense. To classify
offensive text, about 70% of the participants
used deep learning approaches. Among the
top-10 teams, seven used BERT
        <xref ref-type="bibr" rid="ref11 ref6">(Devlin et al., 2018)</xref>
        .
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methodology and Proposed Experiments</title>
      <p>After an extensive literature review, the
collection of additional existing datasets related
to the topic, and preliminary experiments, we
started experimenting through shared tasks,
as described below.</p>
      <sec id="sec-3-1">
        <title>Participation in ‘Categorizing Offensive Language in Social Media (SemEval 2019-Task 6)’</title>
        <p>
          A bi-LSTM neural network model was
developed
          <xref ref-type="bibr" rid="ref1">(Altin, Serrano, and Saggion, 2019)</xref>
          for our participation
in the shared task ‘Categorizing
Offensive Language in Social Media
(SemEval 2019 - Task 6)’, which focuses on identifying
offensive language while taking the type and
target of the offense into account
          <xref ref-type="bibr" rid="ref16 ref17">(Zampieri et al., 2019b)</xref>
          .
        </p>
        <p>This model consists of a
bidirectional Long Short-Term Memory
(biLSTM) network with an attention layer on
top. The model captures the most important
semantic information in a tweet, including
emojis and hashtags. A simplified schema of
our model can be seen in the following figure.</p>
        <p>Figure 1: Schema of the model</p>
        <p>First, the tweets were tokenized, removing
punctuation marks but keeping emojis and
full hashtags because they can help
define the meaning of a tweet. Second, the
embedding layer transforms each element of the
tokenized tweet (words, emojis and
hashtags) into a low-dimensional vector. The
embedding layer, which covers the vocabulary
of the task, was randomly initialized from a
uniform distribution (values between -0.8 and 0.8,
with 300 dimensions). Recent
studies have reported that pre-trained word
embeddings perform considerably better than
randomly initialized embeddings (Erhan et
al., 2010; Kim, 2014). For that reason, the
initialized embedding layer was then overwritten with
the word vectors from a pre-trained
model built on all the tokens, emojis and
hashtags of 20M English tweets (Barbieri
et al., 2016); these vectors were further updated during
training.</p>
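        <p>The tokenization and embedding initialization described above might look roughly as follows. This is a sketch under stated assumptions, not the code used for the submission: the regular expression, the function names tokenize and build_embeddings, and the toy pre-trained vector are all hypothetical, and the fine-tuning of vectors during training is not shown.</p>
        <preformat>
```python
import random
import re

# A token is either a run of word characters with an optional leading '#'
# (words and whole hashtags), or any single symbol that is neither
# whitespace nor ASCII punctuation (this keeps emojis as tokens).
TOKEN_RE = re.compile(r"#?\w+|[^\w\s!-/:-@\[-`{-~]")


def tokenize(tweet: str) -> list:
    """Lowercase a tweet and split it into tokens, dropping ASCII
    punctuation but keeping emojis and whole hashtags."""
    return TOKEN_RE.findall(tweet.lower())


def build_embeddings(vocab, pretrained, dim=300, scale=0.8, seed=0):
    """Initialise every vector uniformly in [-scale, scale], then overwrite
    the entries for which a pre-trained vector of the right size exists."""
    rng = random.Random(seed)
    table = {}
    for token in vocab:
        table[token] = [rng.uniform(-scale, scale) for _ in range(dim)]
    for token, vector in pretrained.items():
        if token in table and len(vector) == dim:
            table[token] = list(vector)
    return table


tokens = tokenize("Great game!!! #GoTeam 😀")
pretrained = {"game": [0.1] * 300}  # hypothetical pre-trained vector
emb = build_embeddings(sorted(set(tokens)), pretrained)
print(tokens)
print(len(emb), "vectors of size", len(emb["game"]))
```
        </preformat>
        <p>Tokens without a pre-trained vector, such as a rare hashtag, simply keep their random initialization. Emojis composed of several code points would be split by this simple pattern, so a production tokenizer would need a more careful rule.</p>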
        <p>Then, a biLSTM layer extracts high-level
features from the embeddings. LSTM networks
were introduced by Hochreiter and
Schmidhuber (1997) and were explicitly designed
to avoid the long-term dependency problem.
They retain relevant information from their
inputs through a recurrent loop that lets data
flow from one step to the next. An LSTM
reads the word embeddings sequentially, left to
right, producing a hidden state at each time step
and carrying it forward through time.
A biLSTM follows the same process as a
standard LSTM, but processes the text in both
left-to-right and right-to-left order in
parallel. It therefore outputs two hidden states
at each step and is able to capture
backward and long-range dependencies.</p>
        <p>A critical and apparent disadvantage of
seq2seq models (such as LSTM) is that they
compress all information into a fixed-length
vector, which makes it hard to
retain the content of long tweets. The attention mechanism aims
to overcome this fixed-length limitation
by keeping relevant information from long
tweet sequences. In addition, attention
techniques have recently proven
successful in several areas of Natural
Language Processing, such as question answering,
machine translation, speech recognition and
relation extraction (Bahdanau et al., 2014;
Hermann et al., 2015; Chorowski et al., 2015;
Zhou et al., 2016). For that reason, we added
an attention layer, which produces a weight
vector and merges the word-level features from
each time step into a tweet-level feature
vector by multiplying them by the weight vector. Finally,
the tweet-level feature vector produced by
the previous layers is passed to a fully-connected layer
for classification. Furthermore,
we applied dropout regularization in order to
alleviate overfitting. Dropout randomly sets
a proportion of the
hidden units to zero during forward propagation,
creating more generalizable representations of the
data. As in Zhou et al. (2016), we employ
dropout on the embedding layer, the biLSTM layer
and before the output layer. The dropout
rate was set to 0.5 in all cases.</p>
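        <p>As a rough illustration of the attention pooling and dropout steps described above, the following sketch uses plain Python lists instead of a deep learning framework. It follows the general form of the attention in Zhou et al. (2016), where each score is the dot product of a weight vector with tanh of the hidden state; the toy hidden states, the weight vector and the function names are assumptions made for illustration. The rescaling of kept units ("inverted" dropout) is a common convention that is assumed here and not stated in the text.</p>
        <preformat>
```python
import math
import random


def attention_pool(hidden_states, w):
    """Merge per-token vectors into one tweet-level vector:
    score_t = w . tanh(h_t), alpha = softmax(score), r = sum_t alpha_t * h_t."""
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h))
              for h in hidden_states]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(a * h[j] for a, h in zip(alphas, hidden_states))
              for j in range(dim)]
    return pooled, alphas


def dropout(vector, rate=0.5, rng=random):
    """Zero each unit with probability `rate` and rescale the kept units
    (inverted dropout, an assumption). Applied during training only."""
    keep = 1.0 - rate
    return [x / keep if rng.random() >= rate else 0.0 for x in vector]


# Toy biLSTM outputs: 3 tokens, 2 dimensions each (hypothetical values).
H = [[0.1, 0.2], [2.0, -1.0], [0.0, 0.3]]
w = [1.0, 0.5]  # attention weight vector; learned in a real model
tweet_vector, alphas = attention_pool(H, w)
print("attention weights:", alphas)
print("after dropout:", dropout(tweet_vector, rate=0.5, rng=random.Random(0)))
```
        </preformat>
        <p>The attention weights sum to one, so the tweet-level vector is a weighted average of the token vectors; its size does not depend on the length of the tweet.</p>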
      </sec>
      <sec id="sec-3-2">
        <title>Experimenting with Multi-task Learning: Initial Experiments on Aggressiveness detection</title>
        <p>For this experiment, we developed a bi-LSTM model
with two dense layers at the end, in the context of the shared
task MEX-A3T: Authorship and
aggressiveness analysis in Twitter. Specifically, we addressed the
Aggressiveness Identification track, which
focuses on the detection of aggressive comments
in tweets from Mexican users, together with other
related IberLEF 2019 shared tasks.</p>
        <p>We used data from different tasks
in order to train the model on more examples.
As we believe that the tasks of humor
and sentiment analysis could help in
detecting aggressive language, we selected
three additional tasks to train jointly with MEX-A3T.
These were
IroSvA, which investigates the recognition
of irony in Twitter messages in three
different Spanish variants (from Spain, Mexico,
and Cuba); HAHA, from which we used the
classification task of identifying whether a Spanish
tweet is a joke or not; and TASS 2019, which
focuses on the evaluation of polarity
classification systems for tweets written in Spanish.
From TASS we used tweets
written in the Spanish varieties spoken in
Spain, Peru, Costa Rica, Uruguay and
Mexico, annotated with four different
levels of opinion intensity (Positive,
Negative, Neutral and Nothing).</p>
        <p>Figure 2: Simplified schema of the multi-task
model</p>
        <p>In this scenario, we defined an
embedding layer for each Spanish variant in the IroSvA
task. Classification tasks with the same
Spanish variant used the same embedding layer
during training. Furthermore, all
tasks shared the biLSTM layer during
training. So far this approach has not been
very successful; however, this may be due to a
lack of data for training the different models.</p>
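        <p>The layer-sharing scheme described above can be sketched as a small routing structure. This is only an illustration of which layers are shared, with no actual training: the class names, the variant codes ("mx", "es") and the task labels are hypothetical placeholders.</p>
        <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Stand-in for a trainable layer; only its identity matters here."""
    name: str


@dataclass
class MultiTaskModel:
    # One biLSTM shared by every task.
    shared_encoder: Layer = field(default_factory=lambda: Layer("biLSTM"))
    embeddings: dict = field(default_factory=dict)  # one per Spanish variant
    heads: dict = field(default_factory=dict)       # one dense head per task
    variant_of: dict = field(default_factory=dict)  # task name to variant

    def add_task(self, task: str, variant: str) -> None:
        """Register a task, reusing the embedding of its Spanish variant."""
        self.embeddings.setdefault(variant, Layer("embedding-" + variant))
        self.heads[task] = Layer("dense-" + task)
        self.variant_of[task] = variant

    def layers_for(self, task: str) -> list:
        """Layers a training batch of `task` passes through, in order."""
        variant = self.variant_of[task]
        return [self.embeddings[variant], self.shared_encoder, self.heads[task]]


model = MultiTaskModel()
model.add_task("MEX-A3T", "mx")     # aggressiveness, Mexican Spanish
model.add_task("IroSvA-mx", "mx")   # irony, Mexican Spanish
model.add_task("IroSvA-es", "es")   # irony, Spanish from Spain
for task in ("MEX-A3T", "IroSvA-mx", "IroSvA-es"):
    print(task, [layer.name for layer in model.layers_for(task)])
```
        </preformat>
        <p>In this sketch the two Mexican Spanish tasks pass through the same embedding layer, all three tasks pass through the same biLSTM, and each task keeps its own output head.</p>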
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Current work</title>
      <p>Despite the progress made in the shared tasks,
several issues remain for future work.
Future experiments are planned in
two main groups:</p>
      <p>
        First, we will investigate how to improve
the effectiveness of the classification
model developed for the SemEval 2019 - Task 6
shared task, using the same dataset, the
Offensive Language Identification
Dataset (OLID)
        <xref ref-type="bibr" rid="ref1 ref16 ref17">(Zampieri et al., 2019a)</xref>
        .
      </p>
      <p>Initial experiments took
only the words into account. Using
additional features such as WordNet synsets, Part
of Speech (POS) tags, frequencies and offensive
word dictionaries is expected to
improve the precision of the results.</p>
      <p>
        Furthermore, changes in the methodology,
such as applying Bidirectional Encoder
Representations from Transformers (BERT)
        <xref ref-type="bibr" rid="ref11 ref6">(Devlin et al., 2018)</xref>
        , are another option.
      </p>
      <p>Secondly, in a later phase of the study,
we plan to build a new dataset using
Twitter’s streaming API and
crowd annotation, and to use it for
experiments that include metadata such as
user session time and whether a tweet is a reply or a
retweet.</p>
      <p>For this purpose, we will first select a set of
specific hashtags that have a high
potential of being associated with offensive
tweets.</p>
      <p>After the data have been collected and the
annotation scheme has been decided, the data will be submitted
for crowd annotation.</p>
      <p>Once the corpus has been compiled, model
training will be carried out with the system that gave
the most promising results on the OLID
dataset.</p>
      <p>Additional improvements to the system
design and other potential features will be
explored in light of the performance in
the preliminary tests.</p>
    </sec>
    <sec id="sec-5">
      <title>Specific Issues of Investigation</title>
      <p>The main research questions that
this work intends to answer are the
following:</p>
      <p>• Which algorithms provide
the greatest accuracy in identifying offensive
language in a text?</p>
      <p>• Which characteristics should be taken
into account when analysing text in
terms of aggressiveness?</p>
      <p>• What type of metadata would be useful
for increasing accuracy when analysing the
text?</p>
      <p>• Finally, what overall system design
for this classification task would yield the
highest accuracy?</p>
    </sec>
    <sec id="sec-6">
      <title>Thesis Objectives</title>
      <p>The main objectives of the research can be
listed as follows:</p>
      <p>I. Executing preliminary experiments to
classify offensive messages in social media
datasets (particularly tweets and short
messages).</p>
      <p>There are several published datasets
from previous research that are
annotated as offensive or for related
phenomena such as cyberbullying, hate speech
and misogyny<sup>1,2,3</sup>. Depending on the
specific annotation scheme and the content,
handcrafted features might be an important
factor for performance. Experimenting
on these existing datasets will help in
understanding the strengths and weaknesses of
different design specifications and features, and
eventually help to optimize them.</p>
      <p>II. Experiments to improve the
performance of the current system with fine-tuned
system design and feature engineering.</p>
      <p>The neural network system for the initial
experiments took only words into account.
However, there is potential to improve the
results of this system with additional feature
extraction. Furthermore, a detailed analysis of the
integration of linguistic annotations into the
neural network, as well as other models such as convolutional
networks, can be considered to improve
performance.</p>
      <p>III. Creating a new dataset with crowd
annotation. There are several crowd annotation
platforms, such as Mechanical Turk<sup>4</sup>,
CrowdFlower<sup>5</sup> and CrowdTruth<sup>6</sup>. Once the
data are uploaded and the annotation rules are decided, these
platforms help to have the data annotated by human
annotators.</p>
      <p>To crowd-annotate tweet data, the data will
first be pulled from the Twitter API
according to certain hashtags. Hashtags will be
chosen for specific contexts, such as political
debates or
sports rivalries. After that, the annotation schema will
be decided. The annotation schemas of previous
datasets are usually hierarchical and
contain additional information such as the
target or, for instance, whether a message that contains aggression
constitutes cyberbullying or not.</p>
      <p>IV. Experiments on the new dataset with
various approaches to the system and
features.</p>
      <p>A new dataset gives the opportunity to
reproduce previous well-performing
systems. Moreover, the majority of the related
published datasets do not include
metadata. With a new dataset collected through the
Twitter API, it will also be possible to obtain
metadata. Therefore, user-related
features such as the frequency of profanity in
previous messages can be obtained, which would
help in understanding the effect of metadata
on performance.</p>
      <p>1 https://www.kaggle.com/alternacx/hateoffensivespeechdetection</p>
      <p>2 https://www.amnesty.org/en/</p>
      <p>3 https://zenodo.org/record/1184178.XTBv2pMzaRt</p>
      <p>4 https://www.mturk.com/</p>
      <p>5 https://www.figure-eight.com/</p>
      <p>6 http://crowdtruth.org/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Altin</surname>
            ,
            <given-names>L. S. M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>À. B.</given-names>
            <surname>Serrano</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Saggion</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>LaSTUS/TALN at SemEval-2019 Task 6: Identification and categorization of offensive language in social media with attention-based Bi-LSTM model</article-title>
          .
          <source>In Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          , pages
          <fpage>672</fpage>
          -
          <lpage>677</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Álvarez-Carmona</surname>
            ,
            <given-names>M. Á.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Guzmán-Falcón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y-Gómez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Villaseñor-Pineda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Reyes-Meza</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rico-Sulayes</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of MEX-A3T at IberEval 2018: Authorship and aggressiveness analysis in Mexican Spanish tweets</article-title>
          .
          <source>In Notebook Papers of 3rd SEPLN Workshop on Evaluation of Human Language Technologies for Iberian Languages (IBEREVAL)</source>
          , Seville, Spain, volume
          <volume>6</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Felice</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Poletto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Maurizio</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the evalita 2018 hate speech detection task</article-title>
          .
          <source>In EVALITA 2018-Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian</source>
          , volume
          <volume>2263</volume>
          , pages
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . CEUR.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Chu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jue</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Comment abuse classification with deep learning</article-title>
          . Retrieved from https://web.stanford.edu/class/cs224n/reports/2762092.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Davidson</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warmsley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Macy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Automated hate speech detection and the problem of offensive language</article-title>
          .
          <source>In Eleventh international aaai conference on web and social media.</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>arXiv preprint arXiv:1810.04805</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Fortuna</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Nunes</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A survey on automatic detection of hate speech in text</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          ,
          <volume>51</volume>
          (
          <issue>4</issue>
          ):
          <fpage>85</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Hamm</surname>
            ,
            <given-names>M. P.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Newton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chisholm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shulhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Milne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sundar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ennis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Scott</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Hartling</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Prevalence and effect of cyberbullying on children and young people: A scoping review of social media studies</article-title>
          .
          <source>JAMA pediatrics</source>
          ,
          <volume>169</volume>
          (
          <issue>8</issue>
          ):
          <fpage>770</fpage>
          -
          <lpage>777</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Kowalski</surname>
            ,
            <given-names>R. M.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Limber</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Psychological, physical, and academic correlates of cyberbullying and traditional bullying</article-title>
          .
          <source>Journal of Adolescent Health</source>
          ,
          <volume>53</volume>
          (
          <issue>1</issue>
          ):
          <fpage>S13</fpage>
          -
          <lpage>S20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Ojha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Benchmarking aggression identification in social media</article-title>
          .
          <source>In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Majumder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          , et al.
          <year>2018</year>
          .
          <article-title>Filtering aggression from the multilingual social media feed</article-title>
          .
          <source>In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018)</source>
          , pages
          <fpage>199</fpage>
          -
          <lpage>207</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A survey on hate speech detection using natural language processing</article-title>
          .
          <source>In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Van Hee</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Emmery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Desmet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lefever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Verhoeven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Pauw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Daelemans</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Hoste</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Automatic detection of cyberbullying in social media text</article-title>
          .
          <source>PloS one</source>
          ,
          <volume>13</volume>
          (
          <issue>10</issue>
          ):
          <fpage>e0203794</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Waseem</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warmsley</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Understanding abuse: A typology of abusive language detection subtasks</article-title>
          .
          <source>arXiv preprint arXiv:1705.09899</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Wiegand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siegel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ruppenhofer</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the GermEval 2018 shared task on the identification of offensive language</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Farra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <year>2019a</year>
          .
          <article-title>Predicting the type and target of offensive posts in social media</article-title>
          .
          <source>arXiv preprint arXiv:1902.09666</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Farra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <year>2019b</year>
          .
          <article-title>SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval)</article-title>
          .
          <source>arXiv preprint arXiv:1903.08983</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Robinson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Tepper</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Detecting hate speech on Twitter using a convolution-GRU based deep neural network</article-title>
          .
          <source>In European Semantic Web Conference</source>
          , pages
          <fpage>745</fpage>
          -
          <lpage>760</lpage>
          . Springer.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>