<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Zeus at HASOC 2020: Hate speech detection based on ALBERT-DPCNN</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Siyao Zhou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rui Fu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jie Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Information Science and Engineering, Yunnan University</institution>
          ,
          <addr-line>Yunnan</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The use of social media has grown rapidly in the past few years. User-generated data often contains objectionable content. Identifying hate speech, cyber-attacks, and offensive language is a very challenging sentiment analysis task. In this paper, we describe our participation in HASOC's English hate speech and offensive content identification task and propose the ALBERT-DPCNN model, which combines ALBERT and DPCNN to obtain richer semantic features; it ranked third in the task and achieved good results.</p>
      </abstract>
      <kwd-group>
        <kwd>ALBERT-DPCNN</kwd>
        <kwd>Hate speech</kwd>
        <kwd>Offensive language</kwd>
        <kwd>Sentiment analysis</kwd>
        <kwd>HASOC</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In the Internet era, new and highly open social networking platforms have attracted a
large number of netizens worldwide, and the amount of information on social media is increasing
exponentially. However, individuals and groups with questionable motives seize the opportunity to
spread extreme hate speech by taking advantage of the fast dissemination of information and
large user bases on social networking platforms. Moreover, some people have carried
their xenophobic views into real life in the form of group conflicts and violent attacks.</p>
      <p>
        The current approach to tackling the large volume of hateful posts is filtering.
However, society also needs to ensure that freedom of speech is maintained and social norms
are not violated. Social media platforms must censor a great deal of content, and reviewing and removing
hateful or offensive words is a fairly cumbersome process. Recognizing the
importance of this research, the Hate Speech and Offensive Content Identification in Indo-European
Languages track (HASOC 2020 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) supports multilingual research with over 10,000 annotated
tweets from Twitter. HASOC provides two subtasks for each of three languages: English, German,
and Hindi.
      </p>
      <p>We apply deep learning methods to the two English subtasks and propose the
ALBERT-DPCNN model for the text classification required by this task; compared with other
methods, it depends on very little preprocessing and feature engineering.</p>
      <p>The rest of the paper is organized as follows. Section 2 discusses related work in
detail. Section 3 describes the method used by ALBERT-DPCNN. Section 4 analyzes
our experimental results. Finally, we conclude the paper in Section 5 and discuss
future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Over the years, as social media has become more popular, abusive language has become more
common on these platforms. Waseem used SVM and LR classifiers to detect racist or sexist
content [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In recent years, there has been a growing body of research on offensive language.
Existing techniques for detecting offensive language and hate speech in social
media rely mainly on bag-of-words (BOW) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] representations, Recurrent Neural Networks (RNN) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
and word embeddings [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. RNN models have achieved good results
in sentiment analysis tasks: Serra et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] achieved good results using a character-based RNN to
detect hate speech in tweets. Nobata et al. [7] used a regression model and found through
comparison that character n-gram features were the most predictive for detecting offensive
remarks. Vijay et al. [8] designed a classification system based on sentiment analysis from
a machine learning perspective. Badjatiya et al. [9] introduced a deep learning approach,
using CNN, LSTM, and other deep learning models to detect offensive remarks in English and
Hindi. Self-attention [10] has been widely used in text classification in recent years,
while the BERT [11] model uses pre-training to further increase the generalization
ability of the word vector model and to capture character-level, word-level,
sentence-level, and even inter-sentence relationship features. As mentioned above, there has been
a great deal of research on sentiment analysis across many different code-mixed languages. In
this paper, we propose a language model architecture (ALBERT-DPCNN) based on ALBERT to
detect hate speech, offensive language, and profanity.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Data and methodology</title>
      <p>In this section, we introduce our method. We tried traditional machine learning,
neural networks, and pre-training methods. Because of ALBERT's recent success in sentiment
analysis and other language-processing tasks, after comparison we chose a model based on
ALBERT for the HASOC task.</p>
      <sec id="sec-3-1">
        <title>3.1. Data description</title>
        <p>The data is provided by the HASOC organizers, and we present statistics of the HASOC
dataset in the following table. In the English task, there are 3708 posts in the training set
and 814 posts in the test set. The texts of Sub-task A are labeled as HOF (Hate and Offensive)
or NOT (Non Hate-Offensive), whereas Sub-task B texts are labeled NONE, HATE (Hate speech),
OFFN (Offensive), or PRFN (Profane). For Sub-task A, the number of posts in the different
categories is about the same, but for Sub-task B the class sizes differ considerably.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. ALBERT</title>
        <p>The architectures of ALBERT and BERT are similar: both use the Transformer encoder and the GELU
nonlinear activation function, but ALBERT has far fewer parameters than
the traditional BERT architecture. As with Transformer encoders, each encoder consists
of two layers: a self-attention layer and a feedforward neural network. Self-attention helps the
current node focus on more than just the current word, thus capturing the semantics of the
context. The decoder also includes the two layers found in the encoder,
but between those two layers lies an attention layer, which helps the current node identify the key
content that currently needs attention. ALBERT takes a word sequence as input and applies an
embedding operation to it. Once embedding is finished, the data enters the
encoder layer. After self-attention finishes handling the data, it is delivered to the
feedforward neural network, whose computation can be
performed in parallel; the result is then output to the next encoder.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. DPCNN</title>
        <p>Rie Johnson et al. [12] proposed a deep convolutional neural network called DPCNN (Deep
Pyramid Convolutional Neural Networks). It is a wide and effective word-level deep text
classification convolutional neural network that can extract long-distance text
dependencies by continuously deepening the network.</p>
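        <p>As a minimal illustration of the self-attention mechanism described in Section 3.2, the following sketch implements single-head scaled dot-product attention with randomly initialized projection matrices (a toy stand-in, not ALBERT's actual multi-head implementation):</p>

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.
    X: (seq_len, d_model); each output row mixes information from
    every position, giving each token context-aware semantics."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len) affinities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```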
        <p>DPCNN is mainly composed of a region embedding layer (text region embedding) and
a stack of convolution blocks (each block consists of two convolutional layers with a fixed
kernel size of 3 and a max-pooling layer), as shown in Figure 1 below. DPCNN adopts
pre-activation: during the convolution operation, the input features pass through the
activation function before the convolution rather than after it. In other words,
the output of a convolutional layer is Wg(x) + b, not g(Wx + b), where g is the
activation function, W the convolution weights, and b the bias.</p>
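        <p>The pre-activation ordering can be sketched with a toy single-channel 1-D convolution (illustrative numpy code, not DPCNN's actual implementation):</p>

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1d(x, w, b):
    """'Same'-padded 1-D convolution, kernel size 3, single channel."""
    xp = np.pad(x, 1)
    return np.array([xp[i:i + 3] @ w for i in range(len(x))]) + b

rng = np.random.default_rng(1)
x = rng.normal(size=10)
w, b = rng.normal(size=3), 0.1

pre_act  = conv1d(relu(x), w, b)   # DPCNN style: W*g(x) + b
post_act = relu(conv1d(x, w, b))   # conventional: g(W*x + b)

# Pre-activation leaves the block output linear in the conv weights,
# which keeps the shortcut additions in DPCNN's residual blocks simple.
print(pre_act.shape, post_act.shape)
```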
        <p>After passing through a pooling layer with window size 3 and stride 2, the sequence is
reduced to half of its original length. Each time the sequence goes through this pooling operation, a
convolution kernel of size 3 can perceive pieces of text twice as long as before. Because of the
max-pooling layers described above, the length of the text sequence decreases exponentially as
the number of blocks increases, so the sequence length takes on the shape of a pyramid
as the network deepens.</p>
        <p>The model makes full use of the content of the whole sentence: it vectorizes sentences and extracts
useful linguistic, syntactic, and semantic features. It takes into account the effect of each word
in context on the other words, and the different meanings of the same word in different contexts.</p>
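        <p>The pyramid-shaped shrinkage can be checked with a small sketch of the size-3, stride-2 max-pooling operation (illustrative code, not the paper's implementation):</p>

```python
import numpy as np

def max_pool(x, size=3, stride=2):
    """Max-pooling with window 3 and stride 2, as used in DPCNN blocks."""
    out = []
    i = 0
    while i + size <= len(x):
        out.append(x[i:i + size].max())
        i += stride
    return np.array(out)

# Starting from a sequence of length 256, each pooling pass roughly
# halves the length, producing the pyramid shape described above.
x = np.arange(256, dtype=float)
lengths = [len(x)]
while len(x) > 1:
    x = max_pool(x)
    lengths.append(len(x))

print(lengths)  # [256, 127, 63, 31, 15, 7, 3, 1]
```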
        <p>First, to adapt the model to downstream tasks, the input sequence adds a [CLS] token at the
beginning of the sentence, and a [SEP] token is used as a separator between sentences or as a marker at the
end of a sentence. For classification tasks, ALBERT's output (the output of the pooling layer) is
obtained by passing the [CLS] token of the sequence through the last hidden layer, and it contains
information about the entire sequence. However, the output of the pooling layer is usually not
the best representation for distinguishing the semantic content of the input. So, instead of using only
the output of the pooling layer, the model feeds the first token (the [CLS] token) of the last four hidden layers to
DPCNN. After the text context features are extracted, they are connected to the classifier. In this way,
the feature-extraction advantages of ALBERT and DPCNN can be exploited simultaneously,
and the semantics of the text can be well captured. The model architecture is shown in Figure
2.</p>
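        <p>The feature-combination step can be sketched with dummy arrays standing in for ALBERT's per-layer hidden states (the shapes here, e.g. 12 layers and hidden size 768, are illustrative assumptions, not the exact configuration used):</p>

```python
import numpy as np

# Dummy stand-in for ALBERT outputs: 12 encoder layers, batch of 2
# sentences, sequence length 120, hidden size 768.
rng = np.random.default_rng(2)
num_layers, batch, seq_len, hidden = 12, 2, 120, 768
hidden_states = rng.normal(size=(num_layers, batch, seq_len, hidden))

# Take the [CLS] token (position 0) from each of the last four layers
# and stack them into a short 4-step "sequence" that is fed to DPCNN.
cls_last4 = hidden_states[-4:, :, 0, :]       # (4, batch, hidden)
dpcnn_input = cls_last4.transpose(1, 0, 2)    # (batch, 4, hidden)
print(dpcnn_input.shape)  # (2, 4, 768)
```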
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <sec id="sec-4-1">
        <title>4.1. Experiments Setup</title>
        <p>For this work, we use the ALBERT-DPCNN model, implemented in PyTorch.
We set up stratified 5-fold cross-validation with random seed 42 for training (StratifiedKFold<sup>1</sup>),
using stratified grouping so that the ratio of classes in each fold is as close
as possible to the class ratio in the overall data.</p>
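        <p>The stratified split can be reproduced with scikit-learn's StratifiedKFold; the class counts below are illustrative, not the real HASOC distribution:</p>

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels with the kind of imbalance seen in Sub-task B
# (hypothetical counts, chosen only to divide evenly into 5 folds).
y = np.array([0] * 60 + [1] * 25 + [2] * 10 + [3] * 5)
X = np.zeros((len(y), 1))  # features are irrelevant to the split itself

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # Each fold preserves the overall class ratio as closely as possible.
    counts = np.bincount(y[val_idx], minlength=4)
    print(f"fold {fold}: val class counts = {counts.tolist()}")
```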
        <p>We use the Adam optimizer with a learning rate of 2e-5 and cross-entropy loss. The number of epochs
and the maximum sentence length are 3 and 120, respectively. The batch size is set to 32, and the
gradient accumulation steps are set to 4.</p>
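        <p>One reading of "gradient accumulation steps are set to 4" is that gradients from four micro-batches are averaged before each optimizer step, which for a mean loss matches a single step on the combined batch. A toy check with linear regression (our interpretation, not the authors' training code):</p>

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of mean-squared error for a linear model y_hat = X @ w."""
    return 2 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(3)
X = rng.normal(size=(128, 4))   # effective batch = 4 micro-batches of 32
y = rng.normal(size=128)
w = rng.normal(size=4)

# Accumulate gradients over 4 micro-batches of 32, then average: this
# equals the gradient of a single step on the full batch of 128.
acc = np.zeros_like(w)
for i in range(0, 128, 32):
    acc += grad_mse(w, X[i:i + 32], y[i:i + 32])
acc /= 4

full = grad_mse(w, X, y)
print(np.allclose(acc, full))  # True
```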
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>In this section, we present the evaluation results we submitted. Evaluation was carried out
by the HASOC task organizers, and the results are shown in Table 3. Both Sub-task A and Sub-task B
are evaluated by macro-averaged F1. Our submitted model ranked 25th with an F1
score of 0.4954 in English Sub-task A, and 3rd with an F1 score of 0.2619 in English Sub-task B.</p>
        <p>We used ALBERT's hidden-layer states to obtain richer semantic features. As can be seen
from Table 3, our model achieved good results in English Sub-task B.</p>
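        <p>The macro-averaged F1 metric can be computed with scikit-learn; the labels below are hypothetical, not our actual predictions:</p>

```python
from sklearn.metrics import f1_score

# Hypothetical predictions for a 4-class Sub-task B style problem
# (NONE, HATE, OFFN, PRFN encoded as 0-3); not the real submission.
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2, 3, 0]

# Macro-averaging weights every class equally, so the rare classes
# matter as much as the majority class.
macro = f1_score(y_true, y_pred, average="macro")
print(round(macro, 4))
```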
        <p><sup>1</sup> https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>With the increasing popularity and influence of social media texts, analyzing the emotions
attached to texts becomes more and more important. In this paper, we presented the
ALBERT-DPCNN model to handle the HASOC tasks. Our model performed well and
ranked third in English Sub-task B. The results provide a solid basis for
further research on hate speech in multiple languages.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>I would like to thank the organizers for their hard work and Rui Fu for her help. Her advice
has been very helpful to me.</p>
      <p>[7] C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, Y. Chang, Abusive language detection in
online user content, in: Proceedings of the 25th International Conference on World Wide
Web, 2016, pp. 145–153.
[8] D. Vijay, A. Bohra, V. Singh, S. S. Akhtar, M. Shrivastava, Corpus creation and emotion
prediction for Hindi-English code-mixed social media text, in: Proceedings of the 2018
Conference of the North American Chapter of the Association for Computational
Linguistics: Student Research Workshop, 2018, pp. 128–135.
[9] P. Badjatiya, S. Gupta, M. Gupta, V. Varma, Deep learning for hate speech detection
in tweets, in: Proceedings of the 26th International Conference on World Wide Web
Companion, 2017, pp. 759–760.
[10] M. Choi, H. Kim, B. Han, N. Xu, K. M. Lee, Channel attention is all you need for video
frame interpolation, in: AAAI, 2020, pp. 10663–10671.
[11] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[12] R. Johnson, T. Zhang, Deep pyramid convolutional neural networks for text
categorization, in: Proceedings of the 55th Annual Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), 2017, pp. 562–570.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nandini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive Content Identification in Indo-European Languages</article-title>
          ,
          <source>in: Working Notes of FIRE 2020 - Forum for Information Retrieval Evaluation</source>
          ,
          CEUR
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Waseem</surname>
          </string-name>
          ,
          <article-title>Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter</article-title>
          ,
          <source>in: Proceedings of the first workshop on NLP and computational social science</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>138</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Badjatiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Varma</surname>
          </string-name>
          ,
          <article-title>Deep learning for hate speech detection in tweets</article-title>
          ,
          <source>in: Proceedings of the 26th International Conference on World Wide Web Companion</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>759</fpage>
          -
          <lpage>760</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Awekar</surname>
          </string-name>
          ,
          <article-title>Deep learning for detecting cyberbullying across multiple social media platforms</article-title>
          ,
          <source>in: European Conference on Information Retrieval</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>141</fpage>
          -
          <lpage>153</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Djuric</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Morris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grbovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Radosavljevic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bhamidipati</surname>
          </string-name>
          ,
          <article-title>Hate speech detection with comment embeddings</article-title>
          ,
          <source>in: Proceedings of the 24th international conference on world wide web</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Serra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Leontiadis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spathis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Stringhini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Blackburn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vakali</surname>
          </string-name>
          ,
          <article-title>Class-based prediction errors to detect hate speech with out-of-vocabulary words</article-title>
          ,
          <source>in: Proceedings of the First Workshop on Abusive Language Online</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>