<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hate and Offensive language detection using BERT for English Subtask A</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Md Saroar Jahan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Djamila Romaissa Beddiar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mourad Oussalah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nabil Arhab</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>yazid bounab</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Oulu, Faculty of Information Tech., CMVS</institution>
          ,
          <addr-line>PO Box 4500, Oulu 90014</addr-line>
          ,
          <country country="FI">FINLAND</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the results and main findings of our participation in HASOC-2021 Hate/Offensive Language Identification Subtask A. The work consisted of fine-tuning pre-trained transformer networks such as BERT and building an ensemble of different models, including CNN and BERT. We used the HASOC-2021 English dataset of 3.8k annotated tweets. We compare current pre-trained transformer networks, with and without Masked Language Modelling (MLM) fine-tuning, on their performance for offensive language detection. Among the different transformer models, MLM fine-tuned BERT-base, BERT-large, and ALBERT outperformed the others; however, an ensemble classifier of BERT and CNN that applies majority voting performed best overall, achieving an 85.1% F1 score over the hate and non-hate labels. Our final submission achieved a 77.0% F1 score in the HASOC-2021 competition.</p>
      </abstract>
      <kwd-group>
        <kwd>BERT fine-tuning</kwd>
        <kwd>Offensive language identification</kwd>
        <kwd>Hate speech</kwd>
        <kwd>BERT performance comparison</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The emergence of Web 2.0 platforms, which enabled user-generated content and a participatory
culture, has been accompanied by an unprecedented proliferation of online hate speech,
increasing the likelihood that people of any age group are subjected to online harassment and
abuse on internet forums, message boards, or social network platforms. Hate speech
is a complex phenomenon, intrinsically associated with relationships between groups, and
it relies on language nuances. Nobata et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] define hate speech as "Language which attacks
or demeans a group based on race, ethnic origin, religion, gender, age, disability, or sexual
orientation/gender identity".
      </p>
      <p>
        In the past few years, the automatic detection of hate speech, cyber-bullying, and aggressive
or offensive language has become an actively studied task in natural language processing (NLP).
Past research has examined various forms of offensive language, such as cyber-aggression
[2, 3], abusive language [
        <xref ref-type="bibr" rid="ref1">1, 4</xref>
        ], hate speech [5, 6], racism [7], and offensive language
[8, 9].
      </p>
      <p>Several workshops (e.g., SemEval-2019 [10], SemEval-2020 [11], HASOC-2019 [12], and
HASOC-2020 [13]) have been organized to identify state-of-the-art practices and new solutions for
the efficient identification of offensive text.</p>
      <p>For example, in SemEval-2019, Task A (offensive language detection) was the most popular
sub-task, with 104 participating teams. The top-performing team, Liu et al. [14], used
BERT-base-uncased with default parameters, a maximum sentence length of 64, and 2 training
epochs, achieving an 82.9% F1 score. The top non-BERT model, by Mahata et al. [15], ranked
fifth; it used an ensemble of CNN and BLSTM+BGRU.</p>
      <p>In SemEval-2020, 145 teams submitted official runs. The best team, Wiedemann et al. [16],
achieved an F1 score of 0.9204 using an ensemble of ALBERT models of different sizes. The
top-10 teams were close to each other and employed BERT, RoBERTa, or XLM-RoBERTa models;
CNNs and LSTMs were sometimes also used, either for comparison or in hybrid
architectures.</p>
      <p>In HASOC-2020, over 40 research groups participated. The
top-ranked submission for Hindi hate speech detection used a CNN with FastText embeddings
as input [17]. The best performance for the German hate speech detection task was achieved using
fine-tuned versions of BERT, DistilBERT, and RoBERTa [18]. Similarly, the top performance in
English hate speech detection was based on BERT and another deep learning-based
model.</p>
      <p>This year (2021), HASOC [19, 20] offers three different subtasks, with a separate dataset for each.
Subtask 1A offers tasks in English, in Hindi with 2 problems, and in Marathi with 1 problem.
The Subtask 1B dataset contains English and Hindi tweets, and Subtask 2 contains code-mixed Hindi
tweets. We participated in Subtask 1A for the identification of hate/offensive English
Twitter posts, and used the dataset provided by HASOC-2021 for training and validation.
In light of state-of-the-art practice [21] in hate speech text detection, the contribution of
this paper is threefold:
1. We compare the performance of different pre-trained transformer-based neural network models
and explain the differences in their performance.
2. We study how an additional fine-tuning step with masked language modeling (MLM) of
the best individual model on in-domain data affects model performance.
3. We present an ensemble of different models, including CNN+BERT.</p>
      <p>The paper is structured as follows. Section 2 describes our methodology (dataset
annotation schema, preprocessing, and classifier architectures, including the machine learning
models and the associated feature engineering). Section 3 presents the experimental setup, and
Section 4 reports our results. Section 5 carries out an error analysis. Finally, conclusions are
drawn in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>The overall methodology is a three-stage process: (i) data collection
and preprocessing, (ii) experiments with machine learning (ML) model architectures, and (iii) error
analysis. The experimental environment (data preprocessing, ML architecture, test data,
and error analysis) is the same for all experiments. See Figure 1 for a high-level description of our
methodology; its details are presented in the following subsections.</p>
      <sec id="sec-2-1">
        <title>2.1. Dataset</title>
        <p>To train our models and compare our results with state-of-the-art models, we used the English
Twitter dataset from HASOC-2021, which the HASOC task organizers had already annotated
for English Subtask A. Table 1 shows examples from the dataset and their annotation. If
a Twitter post contains any hate or offensive word or expresses an offensive context, it is
labelled HOF (hate and/or offensive); otherwise it is labelled NOT. The dataset contains 3843 tweets
in total, of which 2501 (65%) are HOF and the remaining 1342 (35%) are NOT.</p>
        <p>1. (NOT) Non Hate-Offensive: this post does not contain any hate speech, profane, or offensive
content.</p>
        <p>2. (HOF) Hate and Offensive: this post contains hate, offensive, and/or profane content.</p>
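        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Examples from the dataset and their annotation.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Example of tweet</th>
                <th>Task A label</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>"@hemantmkpandya @news24tvchannel @Aloksharmaaicc @manakgupta You are a donkey that’s why only one is talking."</td>
                <td>HOF</td>
              </tr>
              <tr>
                <td>"@For 18-18 hours the termite went and hollowed out a 70 year old strong tree in 7 years !!. #ResignModi"</td>
                <td>NOT</td>
              </tr>
              <tr>
                <td>"Fattu hai bjp wala #CruelMamata #BengalViolence #BengalBurning"</td>
                <td>HOF</td>
              </tr>
              <tr>
                <td>"Goodbye Sher-e-Bihar. . . May Allah bless Tala Saheb from high to high in Jannatul Firdous Amen. JusticeForShahabuddin"</td>
                <td>NOT</td>
              </tr>
              <tr>
                <td>Dataset size</td>
                <td>3843</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>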
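        <p>As a consistency check of the class balance reported in this section, the sketch below derives the HOF count as the total (3843) minus the NOT count (1342) and computes both class shares, together with the accuracy of a trivial classifier that always predicts the majority class. The majority-class baseline is added here for context only; it is not one of the paper's experiments, and the helper name is ours.</p>

```python
TOTAL = 3843      # total annotated tweets (Section 2.1)
NOT_COUNT = 1342  # tweets labelled NOT (Section 2.1)


def class_shares(total: int, not_count: int) -> dict:
    """Return (count, share) for HOF and NOT; HOF is the complement of NOT."""
    hof_count = total - not_count
    return {
        "HOF": (hof_count, hof_count / total),
        "NOT": (not_count, not_count / total),
    }


shares = class_shares(TOTAL, NOT_COUNT)
for label, (count, share) in shares.items():
    print(f"{label}: {count} tweets ({share:.1%})")

# Accuracy of always predicting the majority class (hypothetical reference point).
majority_baseline = max(share for _, share in shares.values())
print(f"Majority-class baseline accuracy: {majority_baseline:.1%}")
```

        <p>With these inputs the HOF class contains 2501 tweets (about 65.1%), so a classifier has to clearly exceed roughly 65% accuracy before it adds anything over always answering HOF.</p>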
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Dataset Preprocessing</title>
        <p>For preprocessing, we removed special characters (e.g., @), numbers (0-9), newlines, mention tags,
and links. We did not remove hashtags, since we found them important for
subsequent reasoning. Table 2 shows an example of preprocessing.</p>
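        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Example of preprocessing.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Before preprocessing</th>
                <th>After preprocessing</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>@hemantmkpandya @news24tvchannel @Aloksharmaaicc @manakgupta You are a donkey that’s why only one is talking.</td>
                <td>You are a donkey that’s why only one is talking</td>
              </tr>
              <tr>
                <td>Do not look away. #IndiaCovidCrisis https://t.co/oHsnIXlEla</td>
                <td>Do not look away. #IndiaCovidCrisis</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>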
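        <p>The cleaning steps listed above can be sketched with a few regular expressions. This is a minimal illustration, not the project's code: the paper names what was removed but not the exact patterns, so the character whitelist (letters, the hashtag symbol, apostrophes and basic sentence punctuation) is our assumption, as is the keep_hashtags switch, which here only strips the # symbol for the hashtag ablation.</p>

```python
import re

# Hypothetical patterns: the paper lists what was removed, not how.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NEWLINE_TAB_RE = re.compile(r"[\r\n\t]+")
# Assumed whitelist: letters, '#', apostrophes and basic punctuation.
# Everything else (digits, '@', emojis, other symbols) counts as special.
SPECIAL_RE = re.compile(r"[^A-Za-z#'’.,!?\s]")
MULTISPACE_RE = re.compile(r"\s{2,}")


def preprocess(tweet: str, keep_hashtags: bool = True) -> str:
    """Drop URLs, mention tags, newlines/tabs, digits and special characters."""
    text = URL_RE.sub(" ", tweet)          # URLs first: they contain special characters
    text = MENTION_RE.sub(" ", text)       # @user mention tags
    text = NEWLINE_TAB_RE.sub(" ", text)   # newline and tab tokens
    if not keep_hashtags:
        text = text.replace("#", " ")      # ablation variant: strip the '#' symbol only
    text = SPECIAL_RE.sub(" ", text)       # digits, emojis and remaining symbols
    return MULTISPACE_RE.sub(" ", text).strip()


print(preprocess("Do not look away. #IndiaCovidCrisis\nhttps://t.co/oHsnIXlEla"))
```

        <p>On the second Table 2 example this reproduces the published output exactly. On the first it keeps the final full stop, whereas the table shows it removed; the punctuation handling of the original pipeline is not specified, so the whitelist would need adjusting to match it.</p>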
        <p>To quantify the influence of the individual preprocessing steps, we measured the HOF classification
accuracy on the HASOC-2021 data of a Logistic Regression (LR) classifier with TF-IDF
features; the results are summarized in Table 3. For instance, converting
uppercase to lowercase and removing emojis in the preprocessing stage do not affect
the overall result. In contrast, removing newline and tab tokens, mention tags, and URLs and
special characters helped, improving accuracy by almost 0.5%. Since removing hashtags (#)
decreased accuracy by 0.8%, we kept hashtags in our dataset. This
provides the basis for the preprocessing pipeline used in the subsequent experiments.</p>
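        <p>The LR + TF-IDF baseline behind this ablation can be sketched in plain Python. This is a from-scratch illustration rather than the authors' implementation (the paper does not say which library was used): the smoothed IDF formula, per-example SGD training, learning rate, epoch count and L2 penalty are all our assumptions. Each preprocessing variant is scored by passing a different preprocess function to evaluate and comparing the resulting accuracies.</p>

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Dict[str, float]


def fit_idf(docs: Sequence[str]) -> Dict[str, float]:
    """Smoothed IDF: log((1 + n) / (1 + df)) + 1 (assumed variant)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    n = len(docs)
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(doc: str, idf: Dict[str, float]) -> Vector:
    """L2-normalised TF-IDF vector; words unseen in training are ignored."""
    counts = Counter(t for t in doc.lower().split() if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x: Vector, w: Dict[str, float], b: float) -> float:
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())


def train_lr(xs: List[Vector], ys: List[int], epochs: int = 100,
             lr: float = 0.5, l2: float = 1e-4) -> Tuple[Dict[str, float], float]:
    """Binary logistic regression fitted with plain per-example SGD."""
    w: Dict[str, float] = {}
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            grad = sigmoid(score(x, w, b)) - y   # derivative of log-loss w.r.t. the logit
            for t, v in x.items():
                w[t] = w.get(t, 0.0) - lr * (grad * v + l2 * w.get(t, 0.0))
            b -= lr * grad
    return w, b


def evaluate(train: Sequence[Tuple[str, int]], test: Sequence[Tuple[str, int]],
             preprocess: Callable[[str], str] = lambda s: s) -> float:
    """Accuracy of TF-IDF + LR for one preprocessing variant (1 = HOF, 0 = NOT)."""
    train_docs = [preprocess(text) for text, _ in train]
    idf = fit_idf(train_docs)
    w, b = train_lr([tfidf(d, idf) for d in train_docs], [y for _, y in train])
    hits = sum(int(score(tfidf(preprocess(text), idf), w, b) > 0) == y for text, y in test)
    return hits / len(test)
```

        <p>The accuracy difference between evaluate(train, test) and evaluate(train, test, preprocess=some_cleaning_step), where some_cleaning_step is a hypothetical single-step cleaner, is the kind of per-step effect summarized in Table 3.</p>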
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Models setup</title>
        <p>We used a set of models that are well established in hate speech detection, as shown by previous
HOF competitions. Specifically, three types of classifiers were used: BERT models, a CNN,
and a baseline LR model with TF-IDF features, as follows:</p>
        <sec id="sec-2-3-1">
          <title>2.3.1. Convolutional Neural Network (CNN) Model</title>
          <p>We adopted the CNN architecture of [22], where the input layer is a concatenation of
the words forming the post (up to 70 words), except that each word is represented by its
300-dimensional FastText embedding. A 1D convolution with kernel size 3 is followed by a
max-over-time pooling operation over the feature map and a dense layer of 50 units. Dropout on
the penultimate layer, with a constraint on the l2-norms of the weight vectors, was used for
regularization. The details of the implementation, together with the datasets and code, are
available on the GitHub page of this project<sup>1</sup>.</p>
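        <p>The architecture described above can be illustrated with a dependency-free forward pass. This is a sketch under stated assumptions, not the project's code: ReLU activations, a single sigmoid output unit, four filters, and a max-norm bound of s = 3 (the value we understand Kim [22] to use) are our choices, and random vectors stand in for FastText embeddings. Dropout is omitted because it is inactive at inference time.</p>

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def conv1d_relu(seq: Matrix, kernel: Matrix, bias: float) -> List[float]:
    """Slide one filter (k rows of size D) over seq (L rows) with 'valid' padding."""
    k = len(kernel)
    out = []
    for i in range(len(seq) - k + 1):             # L - k + 1 positions
        s = bias + sum(w * x
                       for j in range(k)
                       for w, x in zip(kernel[j], seq[i + j]))
        out.append(max(0.0, s))                    # ReLU (assumed activation)
    return out


def max_norm(vector: List[float], s: float = 3.0) -> List[float]:
    """Rescale a weight vector whose l2 norm exceeds s (applied after each update)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v * s / norm for v in vector] if norm > s else list(vector)


def cnn_forward(seq: Matrix, kernels: List[Matrix], kernel_biases: List[float],
                hidden_w: Matrix, hidden_b: List[float],
                out_w: List[float], out_b: float) -> float:
    """Return P(HOF) for one post."""
    # Max-over-time pooling keeps one scalar per filter.
    pooled = [max(conv1d_relu(seq, k, b)) for k, b in zip(kernels, kernel_biases)]
    hidden = [max(0.0, b + sum(w * p for w, p in zip(row, pooled)))
              for row, b in zip(hidden_w, hidden_b)]      # dense layer (50 units)
    logit = out_b + sum(w * h for w, h in zip(out_w, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy run at the paper's input size; weights and embeddings are random stand-ins.
rng = random.Random(0)
L, D, K, F, H = 70, 300, 3, 4, 50   # F = 4 filters is a hypothetical choice
post = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(L)]
kernels = [[[rng.gauss(0.0, 0.05) for _ in range(D)] for _ in range(K)] for _ in range(F)]
hidden_w = [[rng.gauss(0.0, 0.1) for _ in range(F)] for _ in range(H)]
p_hof = cnn_forward(post, kernels, [0.0] * F, hidden_w, [0.0] * H,
                    [rng.gauss(0.0, 0.1) for _ in range(H)], 0.0)
print(f"P(HOF) = {p_hof:.3f}")
```

        <p>At the paper's input size (70 words of 300 dimensions) each filter produces 70 - 3 + 1 = 68 activations, and max-over-time pooling reduces them to a single feature per filter, so the dense layer's input size equals the number of filters regardless of post length.</p>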
        </sec>
        <sec id="sec-2-3-2">
          <title>2.3.2. Transformer Network Models</title>
          <p>BERT – Bidirectional Encoder Representations from Transformers: this seminal
transformer-based language model applies an attention mechanism that learns contextual relations
between words in a text sequence [23] (Devlin et al., 2019). BERT is pre-trained with two
objectives:
1. MLM: 15% of the tokens in a sequence are selected for masking, and the model
learns to predict the original tokens; and
2. Next sentence prediction (NSP): the model receives two sentences as input and
learns whether the second sentence follows the first one in the original
document.</p>
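          <p>The MLM corruption step can be sketched as follows. The split of selected tokens into 80% [MASK], 10% random token and 10% unchanged follows Devlin et al. [23]; selecting each token independently with probability 0.15, the toy vocabulary and the function name mask_tokens are our simplifications for illustration.</p>

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"


def mask_tokens(tokens: Sequence[str], vocab: Sequence[str], rng: random.Random,
                mask_prob: float = 0.15) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style MLM corruption.

    A selected token becomes [MASK] 80% of the time, a random vocabulary token
    10% of the time, and stays unchanged 10% of the time. labels holds the
    original token at selected positions (prediction targets) and None
    elsewhere (positions ignored by the loss).
    """
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() >= mask_prob:          # not selected (probability 0.85)
            inputs.append(tok)
            labels.append(None)
            continue
        labels.append(tok)
        r = rng.random()
        if r >= 0.2:
            inputs.append(MASK)                # 80% of selected tokens
        elif r >= 0.1:
            inputs.append(rng.choice(vocab))   # 10%: random token
        else:
            inputs.append(tok)                 # 10%: unchanged
    return inputs, labels


rng = random.Random(0)
tokens = "you are a donkey that is why only one is talking".split()
inputs, labels = mask_tokens(tokens, vocab=tokens, rng=rng)
print(inputs)
print(labels)
```

          <p>Keeping some selected tokens unchanged or randomly replaced means the model cannot rely on seeing [MASK] at every position it must predict, which reduces the mismatch with fine-tuning inputs that never contain [MASK].</p>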
          <p>RoBERTa – the Robustly Optimized BERT Pretraining Approach, developed by Facebook [24]: this
replication of BERT introduces the following modifications:
1. training the model longer, with bigger batches and more and cleaner data, and discarding
the NSP objective;
2. training on longer sequences; and
3. dynamically changing the masking pattern, i.e., generating a new mask every time a sequence
is fed to the model.</p>
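          <p>The difference between static and dynamic masking (item 3) can be sketched by comparing which positions are masked across epochs. For simplicity the static variant reuses one mask for every epoch; RoBERTa's own static baseline instead duplicated the training data 10 times with different masks. Function names and seeds are hypothetical.</p>

```python
import random
from typing import List


def mask_positions(n_tokens: int, rng: random.Random, mask_prob: float = 0.15) -> List[int]:
    """Positions selected for masking in one pass over a sequence."""
    return [i for i in range(n_tokens) if mask_prob > rng.random()]


def static_masks(n_tokens: int, epochs: int, seed: int = 0) -> List[List[int]]:
    """Static masking: the mask is drawn once during preprocessing and reused."""
    fixed = mask_positions(n_tokens, random.Random(seed))
    return [fixed for _ in range(epochs)]


def dynamic_masks(n_tokens: int, epochs: int, seed: int = 0) -> List[List[int]]:
    """Dynamic masking: a fresh mask every time the sequence is fed to the model."""
    rng = random.Random(seed)
    return [mask_positions(n_tokens, rng) for _ in range(epochs)]


for epoch, (s, d) in enumerate(zip(static_masks(20, 3), dynamic_masks(20, 3)), start=1):
    print(f"epoch {epoch}: static={s} dynamic={d}")
```

          <p>With dynamic masking the model is asked to predict different tokens of the same sentence in different epochs, which matters more the longer training runs.</p>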
          <p>XLM-RoBERTa (XLM-R) – a cross-lingual (multilingual) version of RoBERTa, which
is trained on 100 languages at once [25] (Conneau et al., 2019).</p>
          <p>ALBERT – A Lite BERT: a modification of BERT designed especially to overcome
training time and memory limitations [14] (Lan et al., 2019). The main changes that
ALBERT makes over BERT are:
1. decomposing the embedding parameters into smaller matrices, which are projected to
the hidden space separately;
2. replacing BERT’s simpler NSP objective with sentence order prediction (SOP); and
3. sharing parameters across layers to improve or stabilize the learned parameters.</p>
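          <p>The saving from item 1 can be quantified with simple arithmetic. The sketch below compares BERT's single V x H token-embedding matrix with ALBERT's factorised V x E plus E x H version, using what we understand to be the public ALBERT-xxlarge configuration (vocabulary V = 30,000, hidden size H = 4,096, embedding size E = 128); the helper name is hypothetical.</p>

```python
from typing import Optional


def embedding_params(vocab_size: int, hidden_size: int,
                     embedding_size: Optional[int] = None) -> int:
    """Token-embedding parameters: BERT's V x H, or ALBERT's V x E + E x H factorisation."""
    if embedding_size is None:
        return vocab_size * hidden_size                                # BERT layout
    return vocab_size * embedding_size + embedding_size * hidden_size  # ALBERT layout


V, H, E = 30_000, 4_096, 128   # assumed ALBERT-xxlarge vocabulary, hidden and embedding sizes
bert_style = embedding_params(V, H)
albert_style = embedding_params(V, H, E)
print(f"V x H         : {bert_style:,}")
print(f"V x E + E x H : {albert_style:,}")
print(f"reduction     : {bert_style / albert_style:.1f}x")
```

          <p>This gives 122,880,000 versus 4,364,288 embedding parameters, roughly a 28-fold reduction; because E stays fixed, the saving grows as the hidden size H increases.</p>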
        </sec>
        <sec id="sec-2-3-3">
          <title>2.3.3. Ensemble model</title>
          <p>We created ensemble models using the majority voting rule. Specifically, we tested ensembles
of i) all BERT models; ii) BERT-large-uncased + BERT-base-uncased + ALBERT-xxlarge-v2; and iii)
CNN + BERT-large-uncased + BERT-base-uncased + ALBERT-xxlarge-v2. When the number of models
is even, the ensemble also takes into account the decision weight generated by
each classifier to yield the final output (HOF versus NOT).</p>
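          <p>The voting rule above can be sketched as follows. The paper does not define the decision weight precisely; here we assume it is each model's predicted HOF probability and break ties, which can only occur with an even number of models, by the mean of those probabilities. The function name and example probabilities are hypothetical.</p>

```python
from typing import Sequence


def ensemble_predict(hof_probs: Sequence[float], threshold: float = 0.5) -> int:
    """Majority vote over per-model HOF probabilities (1 = HOF, 0 = NOT).

    Ties are broken by the mean HOF probability (our reading of the
    classifiers' decision weights).
    """
    votes_hof = sum(1 for p in hof_probs if p >= threshold)
    votes_not = len(hof_probs) - votes_hof
    if votes_hof != votes_not:
        return int(votes_hof > votes_not)
    mean_prob = sum(hof_probs) / len(hof_probs)
    return int(mean_prob >= threshold)


# Hypothetical outputs of CNN, BERT-large, BERT-base and ALBERT for one tweet:
print(ensemble_predict([0.91, 0.72, 0.30, 0.45]))  # 2-2 tie resolved by the mean
```

          <p>With three models the mean is never needed; with the four-model ensemble (iii), a 2-2 split is decided by whichever side is more confident on average.</p>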
          <p>1 https://github.com/saroarjahan/HASOC-2021-TASKA (accessed September 08, 2021)</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiment Setup</title>
      <p>Initially, we randomly split the original dataset into 80% for training and 20% for
testing and validation, using the same split for all models.
Three types of individual classifiers were implemented, in addition to their ensembles: logistic
regression (LR) with word-level TF-IDF features, a convolutional neural network (CNN) with FastText
word embeddings, and pre-trained BERT-family models. For the BERT setup, we fine-tuned different
transformer models on the HASOC-2021 training data, using the corresponding test data for
validation. The following models were tested: BERT-base and BERT-large (uncased), RoBERTa-base
and RoBERTa-large, XLM-RoBERTa, BERT-multilingual, and four different ALBERT models (large-v1,
large-v2, xxlarge-v1, and xxlarge-v2). Each model was fine-tuned for 6 epochs with a learning rate
of 5e-6, a maximum sequence length of 128, and a batch size of 4. After each epoch, the model was
evaluated on the validation set, and the best-performing epoch was saved for ensembling. We tested
ensemble models based on majority voting over models such as BERT-base, BERT-large, and ALBERT.</p>
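      <p>The protocol above (a random 80/20 split, fixed hyperparameters, evaluation after every epoch, and keeping the best epoch) can be sketched independently of any training framework. The seed, the function names, and the train_one_epoch and evaluate callables are hypothetical placeholders for the actual training library; they are not taken from the project code.</p>

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters reported in Section 3."""
    epochs: int = 6
    learning_rate: float = 5e-6
    max_seq_length: int = 128
    batch_size: int = 4


def random_split(examples: Sequence[Any], train_frac: float = 0.8,
                 seed: int = 42) -> Tuple[List[Any], List[Any]]:
    """Shuffle once with a fixed (hypothetical) seed so every model sees the same split."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * len(idx))
    return [examples[i] for i in idx[:cut]], [examples[i] for i in idx[cut:]]


def fine_tune_keep_best(cfg: FineTuneConfig,
                        train_one_epoch: Callable[[int, FineTuneConfig], Any],
                        evaluate: Callable[[Any], float]) -> Tuple[int, float, Any]:
    """Evaluate after every epoch and keep the best checkpoint (earliest one on ties)."""
    best_epoch, best_score, best_state = 0, float("-inf"), None
    for epoch in range(1, cfg.epochs + 1):
        state = train_one_epoch(epoch, cfg)   # placeholder: one pass over the training data
        score = evaluate(state)               # placeholder: e.g. validation F1
        if score > best_score:
            best_epoch, best_score, best_state = epoch, score, state
    return best_epoch, best_score, best_state
```

      <p>Applied to the 3843 HASOC-2021 tweets, random_split yields 3074 training and 769 held-out tweets; the checkpoint returned by fine_tune_keep_best is the one that would then enter the ensemble.</p>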
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>Table 4 shows the results of binary offensive language detection (treating every tweet as either
hate or non-hate) for the LR baseline, the CNN with word embeddings, and the individual
fine-tuned transformer models and their ensembles. The CNN, BERT-base-uncased,
BERT-large-uncased, and ALBERT-xxlarge-v2 models largely outperform the LR
baseline. BERT-base-uncased and BERT-large-uncased only slightly (by 0.1%) outperform the
CNN model, which performs much better than the baseline and several BERT models, achieving an
83.3% F1 score.</p>
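      <p>For reference, the sketch below computes macro-averaged F1 for the binary HOF/NOT task. We assume the F1 scores in this section are macro-averaged, as in the official HASOC evaluation, so that both classes count equally despite the 65/35 class imbalance; the function names are ours.</p>

```python
from typing import Sequence, Tuple


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1 of one class, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             labels: Tuple[int, ...] = (0, 1)) -> float:
    """Unweighted mean of per-class F1 (1 = HOF, 0 = NOT)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


print(round(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]), 4))
```

      <p>In the toy call, HOF has F1 = 0.667 and NOT has F1 = 0.8, so the macro score is their plain average rather than an average weighted by class frequency.</p>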
      <p>Our best individual model is BERT-base-uncased, with an F1 score of 83.4%. The experiments
also showed that BERT-uncased performed 2% better than BERT-cased. Comparing the
different pre-trained transformer models yielded other interesting results as well. For example,
none of the multilingual BERT models outperformed BERT-large or BERT-base.
This was not entirely surprising, since the multilingual pre-trained model was trained with a
masked language modeling (MLM) objective on the 104 languages with the largest Wikipedias, so
English accounts for only a small share of its training data. This may explain why it fell
short on the monolingual English HASOC-2021 dataset.</p>
      <p>We expected dehateBERT to perform better, since it was trained only on English and on several
hate speech corpora; however, it achieved only a 79.4% F1 score. One possible explanation
is the diversity of our dataset: it consists solely of tweets, all collected around recent
events (e.g., Indian politics, Covid). dehateBERT may therefore not generalize well to it,
since it was trained only on a few specific hate speech datasets, which limits the coverage of
its training data.</p>
      <p>Regarding the ensembles of model variants, the gain is marginal
when all transformers are ensembled together. This is probably because some
transformer models performed poorly on their own, so mixing in these weaker
models reduced the overall performance. However, when we ensembled only the best-performing
models (BERT-large, BERT-base, and ALBERT), the F1 score increased by
1%. Adding the CNN to the best-performing BERT models improved
performance further.</p>
      <p>We submitted the results of our best ensemble (CNN + BERT-large-uncased + BERT-base-uncased +
ALBERT-xxlarge-v2) to the HASOC-2021 competition and obtained an official
F1 score of 77% (Table 5).</p>
    </sec>
    <sec id="sec-5">
      <title>5. Error Analysis</title>
      <p>Although we obtained 85.5% accuracy and an 85.1% F1 score, the model still makes a number
of false detections. To understand this better, we performed an error analysis of
the model’s performance: we randomly sampled a subset of the test data and
manually inspected the classifier output. This error-analysis subset contained 200 samples,
of which 68 were non-hate and 132 were hate samples.</p>
      <p>Figure 2 shows that most errors are false positives (FP): our classifier performs less well
on non-hate samples. In contrast, 1414 hate samples were correctly detected, and only 47 were
false negatives (FN). Since the majority (65%) of our training data are hate samples, our model
appears to be biased towards the hate-offensive class. Table 6 presents FP and FN cases,
showing that some data were in fact incorrectly annotated (of 4 false predictions, 3 were
wrongly annotated). Since this test data is a split of the original data, this suggests that
our model performs better than its scores indicate: some false predictions are actually correct,
and part of the error stems from the dataset annotation itself.</p>
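      <p>The bookkeeping behind this analysis can be sketched as two helpers: one tallies TP, FP, FN and TN with HOF as the positive class, and one draws a random sample of test items and returns the misclassified ones for manual inspection of their annotations. The function names, seed and sample handling are our assumptions, not the procedure used to build Figure 2 or Table 6.</p>

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Counter:
    """TP/FP/FN/TN counts with HOF (1) as the positive class."""
    names = {(1, 1): "TP", (0, 1): "FP", (1, 0): "FN", (0, 0): "TN"}
    return Counter(names[(t, p)] for t, p in zip(y_true, y_pred))


def sample_for_review(texts: Sequence[str], y_true: Sequence[int], y_pred: Sequence[int],
                      n: int = 200, seed: int = 0) -> List[Tuple[str, int, int]]:
    """Randomly sample up to n test items and return the misclassified ones."""
    idx = random.Random(seed).sample(range(len(texts)), min(n, len(texts)))
    return [(texts[i], y_true[i], y_pred[i]) for i in idx if y_true[i] != y_pred[i]]


print(confusion_counts([1, 1, 0, 0, 0], [1, 0, 1, 1, 0]))
```

      <p>Each returned triple (text, gold label, prediction) can then be checked by hand to decide whether the model or the annotation is at fault, which is how wrongly annotated items like those in Table 6 surface.</p>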
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>After last year’s HASOC-2020 shared task on offensive language detection, BERT models
emerged as the state of the art in HOF detection, although fine-tuning these models is still
challenging and open to debate. This motivated our approach in this paper, which ensembles
state-of-the-art BERT models and a CNN while seeking an optimal preprocessing strategy. In the
2021 competition, we experimented with tweet preprocessing, fine-tuned BERT models, and
ensembles of models. Our preprocessing experiments showed that removing mention tags, special
characters, and URLs was useful, improving the models’ performance by almost 1%.
However, eliminating hashtags and stemming reduced the performance
of the model. Among the different transformers, BERT-base outperformed the other models,
including BERT-multilingual, RoBERTa, and Dehatebert-mono-english, and the uncased version of
BERT performed better than the cased one. Surprisingly, the CNN performed much better than most
BERT models and close to the best-performing BERT model. In addition, the ensemble of the
best-performing models yielded further improvement. Our best ensemble achieved an 85.1% F1 score
on our test split, and the official submission obtained a 77% F1 score.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgments</title>
      <p>This project was partially funded by EU Project WaterLine (Downscaling Remotely Sensed
Products to Improve Hydrological Modelling Performance), and EU Project YoungRes (#823701),
which are gratefully acknowledged.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Nobata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tetreault</surname>
          </string-name>
          , A. Thomas,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mehdad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>Abusive language detection in online user content</article-title>
          ,
          <source>in: Proceedings of the 25th international conference on world wide web</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>145</fpage>
          -
          <lpage>153</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] R. Kumar, A. K. Ojha, S. Malmasi, M. Zampieri, Benchmarking aggression identification in social media, in: Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018), 2018, pp. 1–11.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] D. R. Beddiar, M. S. Jahan, M. Oussalah, Data expansion using back translation and paraphrasing for hate speech detection, Online Social Networks and Media 24 (2021) 100153.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] H. Mubarak, K. Darwish, W. Magdy, Abusive language detection on arabic social media, in: Proceedings of the first workshop on abusive language online, 2017, pp. 52–56.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Y. J. Foong, M. Oussalah, Cyberbullying system detection and analysis, in: 2017 European Intelligence and Security Informatics Conference (EISIC), IEEE, 2017, pp. 40–46.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] C. Abderrouaf, M. Oussalah, On online hate speech detection. effects of negated data construction, in: 2019 IEEE International Conference on Big Data (Big Data), IEEE, 2019, pp. 5595–5602.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] I. Kwok, Y. Wang, Locate the hate: Detecting tweets against blacks, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 27, 2013.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] M. Wiegand, M. Siegel, J. Ruppenhofer, Overview of the germeval 2018 shared task on the identification of offensive language (2018).</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] M. S. Jahan, Team oulu at semeval-2020 task 12: Multilingual identification of offensive language, type and target of twitter post using translated datasets, in: Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020, pp. 1628–1637.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, Semeval-2019 task 6: Identifying and categorizing offensive language in social media (offenseval), arXiv preprint arXiv:1903.08983 (2019).</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] M. Zampieri, P. Nakov, S. Rosenthal, P. Atanasova, G. Karadzhov, H. Mubarak, L. Derczynski, Z. Pitenis, Ç. Çöltekin, Semeval-2020 task 12: Multilingual offensive language identification in social media (offenseval 2020), arXiv preprint arXiv:2006.07235 (2020).</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] T. Mandl, S. Modha, P. Majumder, D. Patel, M. Dave, C. Mandlia, A. Patel, Overview of the hasoc track at fire 2019: Hate speech and offensive content identification in indo-european languages, in: Proceedings of the 11th forum for information retrieval evaluation, 2019, pp. 14–17.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] T. Mandl, S. Modha, A. Kumar M, B. R. Chakravarthi, Overview of the hasoc track at fire 2020: Hate speech and offensive language identification in tamil, malayalam, hindi, english and german, in: Forum for Information Retrieval Evaluation, 2020, pp. 29–32.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] P. Liu, W. Li, L. Zou, Nuli at semeval-2019 task 6: Transfer learning for offensive language detection using bidirectional transformers, in: Proceedings of the 13th international workshop on semantic evaluation, 2019, pp. 87–91.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] D. Mahata, H. Zhang, K. Uppal, Y. Kumar, R. Shah, S. Shahid, L. Mehnaz, S. Anand, Midas at semeval-2019 task 6: Identifying offensive posts and targeted offense from twitter, in: Proceedings of the 13th International Workshop on Semantic Evaluation, 2019, pp. 683–690.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] G. Wiedemann, S. M. Yimam, C. Biemann, Uhh-lt &amp; lt2 at semeval-2020 task 12: Fine-tuning of pre-trained transformer networks for offensive language detection, arXiv preprint arXiv:2004.11493 (2020).</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] R. Raja, S. Srivastava, S. Saumya, Nsit &amp; iiitdwd@ hasoc 2020: Deep learning model for hate-speech identification in indo-european languages (2021).</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] B. L. R. Kumar, B. Lahiri, A. K. Ojha, A. Bansal, Comma@ fire 2020: Exploring multilingual joint training across different classification tasks, in: Working Notes of FIRE 2020-Forum for Information Retrieval Evaluation, Hyderabad, India, 2020.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] T. Mandl, S. Modha, G. K. Shahi, H. Madhu, S. Satapara, P. Majumder, J. Schäfer, T. Ranasinghe, M. Zampieri, D. Nandini, A. K. Jaiswal, Overview of the HASOC subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages, in: Working Notes of FIRE 2021 - Forum for Information Retrieval Evaluation, CEUR, 2021. URL: http://ceur-ws.org/.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] S. Modha, T. Mandl, G. K. Shahi, H. Madhu, S. Satapara, T. Ranasinghe, M. Zampieri, Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech, in: FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event, 13th-17th December 2021, ACM, 2021.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] M. S. Jahan, M. Oussalah, A systematic review of hate speech automatic detection using natural language processing, arXiv preprint arXiv:2106.00742 (2021).</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] Y. Kim, Convolutional neural networks for sentence classification, arXiv preprint arXiv:1408.5882 (2014).</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, Roberta: A robustly optimized bert pretraining approach, arXiv preprint arXiv:1907.11692 (2019).</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, arXiv preprint arXiv:1911.02116 (2019).</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>