<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>YUN_DE at HASOC2020 subtask A: Multi-Model Ensemble Learning for Identifying Hate Speech and Offensive Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zichen Zhang</string-name>
          <email>zczhang@mail.ynu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuhang Wu</string-name>
          <email>yuhangwu@mail.ynu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hao Wu</string-name>
          <email>haowu@ynu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FIRE '20, Forum for Information Retrieval Evaluation</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Information Science and Engineering, Yunnan University</institution>
          ,
          <addr-line>Chenggong Campus, Kunming</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>This paper describes our system for subtask A of HASOC2020: Hate Speech and Offensive Content Identification in Indo-European Languages. We propose a multi-model ensemble learning method that combines BERT, ON-LSTM, and TextCNN models. The ensemble aims to achieve better text classification results than any single model. Our system achieves a Macro average F1-score of 0.5017 and is ranked 11th among the 36 teams on the final leaderboard of the competition.</p>
      </abstract>
      <kwd-group>
        <kwd>Text Classification</kwd>
        <kwd>ON-LSTM</kwd>
        <kwd>TextCNN</kwd>
        <kwd>Ensemble Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the popularity of social media, much of the world now communicates through it; for example,
nearly a third of the world’s population is active on Facebook alone. Anyone can publish
content and access content of interest on these platforms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, this openness also provides space for
discourse that includes offensive content and hate speech, which is used to express hatred towards
a targeted group or to be derogatory to the members of that group [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Such language seriously
damages social harmony and can even lead to violence and conflict. Although many social media
platforms have responded by adopting their own provisions against hate speech, in line with
laws prohibiting it, they still need an efficient automatic detection system
to cope with the rapid spread of massive amounts of hate speech.
      </p>
      <p>
        In recent years, many researchers in industry and academia have developed natural
language processing (NLP) techniques for detecting hate speech and offensive content. The
QutNocturnal team used Convolutional Neural Networks (CNN) to identify whether tweets
are hate speech [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Manolescu et al. proposed a system based on Long Short-Term
Memory (LSTM) with an embedding layer for detecting hate speech against immigrants and
women on Twitter [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Ordered Neurons LSTM (ON-LSTM) integrates a hierarchical
(tree) structure into the LSTM, which allows the LSTM to learn
hierarchical structure information automatically [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]; building on it, Wang et al. proposed ON-LSTM with attention
for identifying hate speech and offensive language and used a K-fold ensemble approach to
enhance performance [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Mozafari et al. investigated the ability of BERT [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to capture
hateful context within social media content [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. These methods can detect hate
speech effectively, but each relies on the strengths of a single model, whose ability
is limited. A good solution is to integrate multiple models so that their strengths complement each other.
      </p>
      <p>
        HASOC2020 aims at identifying hate speech and offensive content in Indo-European
Languages [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and provides a forum and a data challenge for multilingual research on the
identification of problematic content. Subtask A requires classifying tweets into two
categories: hate and offensive (HOF) and non hate-offensive (NOT). We participated in this subtask for the
English language and proposed an ensemble learning method that combines BERT, ON-LSTM,
and TextCNN models. It integrates the advantages of each model to detect
hate speech and offensive content more effectively. The experimental results indicate that
our method performs better than any single model and achieves a Macro F1-score of 0.8896.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Method</title>
      <sec id="sec-2-1">
        <title>2.1. BERT model</title>
        <p>BERT is mainly composed of bidirectional Transformer blocks, which overcome the problem
of context-independent word vector representations in previous models: BERT can generate different word
vectors for the same word according to its context. It has achieved excellent results in
many natural language tasks in recent years.</p>
        <p>In Google’s research, a large amount of high-quality text is used for pre-training with
self-supervised methods. The language knowledge contained in the text is encoded into the
Bi-Transformer during training, and the pre-trained model parameters<sup>1</sup> are released. In most cases, we
only need to fine-tune the pre-trained model to handle a text classification task.</p>
        <p>Because the Bi-Transformer does not summarize the sequence into a single state in the way a
recurrent network does, we add the [CLS] flag to the head of the input text to mark it for a classification
task. We then extract the first element of each row of the BERT output, which corresponds to the [CLS]
position, and send it to a simple fully connected layer to output the classification result.</p>
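        <p>The classification head described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the BERT output is stood in for by a nested list shaped [batch][seq_len][hidden], and the names <monospace>cls_vectors</monospace> and <monospace>fully_connected</monospace>, the toy weights, and the two-class head are all hypothetical rather than taken from our implementation.</p>
<preformat>
```python
def cls_vectors(bert_output):
    """Take the first position ([CLS]) of every sequence in a batch.

    bert_output is a nested list shaped [batch][seq_len][hidden].
    """
    return [sequence[0] for sequence in bert_output]


def fully_connected(vector, weights, bias):
    """One linear layer: returns one logit per class."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


# Toy batch: 2 sequences, 3 positions each, hidden size 2 (made-up numbers).
bert_output = [
    [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]],
    [[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]],
]
weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2-class head
bias = [0.0, 0.5]

logits = [fully_connected(v, weights, bias) for v in cls_vectors(bert_output)]
print(logits)  # [[1.0, 2.5], [3.0, 4.5]]
```
</preformat>
        <p>In practice the logits would be passed through a softmax and trained jointly with the fine-tuned BERT parameters; the sketch only shows which slice of the output feeds the classifier.</p>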
      </sec>
      <sec id="sec-2-2">
        <title>2.2. TextCNN model</title>
        <p>TextCNN is a variant of the CNN architecture. First, the sentence is embedded as a sequence of word
vectors $[x_1, x_2, \dots, x_n]$, where $x_i$ is the $k$-dimensional word vector corresponding
to the $i$-th word in the sentence. A convolution operation involves a filter that is applied to
a window of $h$ words to produce a new feature. For example, a feature $c_i$ is generated from a
window of words $x_{i:i+h-1}$ as follows:
<disp-formula><label>(1)</label><tex-math>c_i = f(W \cdot x_{i:i+h-1} + b)</tex-math></disp-formula>
where $W$ is a weight matrix, $b$ is a bias vector, and $f(\cdot)$ is a non-linear function. This filter
is applied to the $n - h + 1$ possible windows to produce a feature map $c = [c_1, c_2, \dots, c_{n-h+1}]$.
A 1-max pooling layer then takes the maximum value $\hat{c}$ from $c$. We adopt $m$
filters with different window sizes to carry out the above process and combine the maximum
value of each filter into a vector $z = [\hat{c}_1, \hat{c}_2, \dots, \hat{c}_m]$. Finally, the model outputs the
probability of each category through a fully connected softmax layer.<fn><label>1</label><p>https://github.com/google-research/bert</p></fn></p>
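        <p>The convolution and 1-max pooling steps can be illustrated with a small pure-Python sketch. It assumes one scalar feature per window, ReLU as the non-linear function, and a filter stored as one weight row per word in the window; the function names and the toy numbers are hypothetical and chosen only for illustration.</p>
<preformat>
```python
def conv_feature_map(sentence, weights, bias):
    """Apply one filter to every window of h word vectors (ReLU assumed)."""
    h = len(weights)  # window size: one weight row per word in the window
    features = []
    for i in range(len(sentence) - h + 1):
        window = sentence[i:i + h]
        total = bias
        for w_row, x in zip(weights, window):
            total += sum(w * v for w, v in zip(w_row, x))
        features.append(max(0.0, total))
    return features


def textcnn_features(sentence, filters):
    """1-max pool each filter's feature map and concatenate the maxima."""
    return [max(conv_feature_map(sentence, w, b)) for w, b in filters]


# Toy sentence: n = 3 words, k = 2 dimensions (made-up numbers).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    ([[1.0, 0.0], [0.0, 1.0]], 0.0),  # window size h = 2
    ([[1.0, -1.0]], 0.5),             # window size h = 1
]
print(conv_feature_map(sentence, *filters[0]))  # [2.0, 1.0]
print(textcnn_features(sentence, filters))      # [2.0, 1.5]
```
</preformat>
        <p>The pooled vector has one entry per filter regardless of sentence length, which is what lets a fixed-size softmax layer follow it.</p>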
      </sec>
      <sec id="sec-2-3">
        <title>2.3. ON-LSTM model</title>
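        <p>ON-LSTM is a new variant of LSTM, whose unit architecture is shown in Figure 1. It uses an
architecture similar to the standard LSTM, so it is also composed of a forget gate $f_t$, an input gate $i_t$,
and an output gate $o_t$. The formulas are as follows:
<disp-formula><label>(2)</label><tex-math>\begin{gathered}
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\end{gathered}</tex-math></disp-formula>
where $x_t$ is the input variable at time $t$, $h_{t-1}$ corresponds to the hidden layer at time $t-1$, $W$
and $U$ are weight matrices, $b$ is the bias vector, and $\sigma(\cdot)$ is the sigmoid function.</p>
        <p>The difference between ON-LSTM and LSTM is that ON-LSTM enforces an order on the update
frequency of the neurons by introducing a new activation function. The specific formulas are as follows:
<disp-formula><label>(3)</label><tex-math>\begin{gathered}
\tilde{f}_t = \overrightarrow{cs}(\mathrm{softmax}(W_{\tilde{f}} x_t + U_{\tilde{f}} h_{t-1} + b_{\tilde{f}})), \\
\tilde{i}_t = 1 - \overrightarrow{cs}(\mathrm{softmax}(W_{\tilde{i}} x_t + U_{\tilde{i}} h_{t-1} + b_{\tilde{i}})), \\
\omega_t = \tilde{f}_t \circ \tilde{i}_t, \\
\hat{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t = \omega_t \circ (f_t \circ c_{t-1} + i_t \circ \hat{c}_t) + (\tilde{f}_t - \omega_t) \circ c_{t-1} + (\tilde{i}_t - \omega_t) \circ \hat{c}_t
\end{gathered}</tex-math></disp-formula>
where $\overrightarrow{cs}$ is an accumulative (cumulative sum) function, which represents the influence of historical information
and current information. $\tilde{f}_t$ and $\tilde{i}_t$ are called the master forget gate and the master input gate,
respectively. $\omega_t$ gives the vector where the intersection is 1 and the rest is 0. By
restructuring the update of the long-term memory $c_t$ in this way, the high-level information may be stored
for a long time, while the low-level information may be updated at each input step, so the
hierarchical structure is embedded through the information hierarchy.</p>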
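        <p>To make the master gates concrete, the following sketch computes the cumulative-softmax activation and the cell update for a single time step in plain Python. It assumes that the accumulative function is a cumulative sum over a softmax, as in the ON-LSTM formulation of [5]; the function names, the pre-computed gate values, and the toy logits are hypothetical.</p>
<preformat>
```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cumax(xs):
    """Cumulative sum of a softmax: non-decreasing values ending at 1."""
    out, running = [], 0.0
    for p in softmax(xs):
        running += p
        out.append(running)
    return out


def on_lstm_cell_update(f, i, c_prev, c_hat, f_master_logits, i_master_logits):
    """Combine standard gates f, i with the master gates for one time step."""
    f_m = cumax(f_master_logits)                     # master forget gate
    i_m = [1.0 - v for v in cumax(i_master_logits)]  # master input gate
    omega = [a * b for a, b in zip(f_m, i_m)]        # overlap of both gates
    return [
        w * (ft * cp + it * ch) + (fm - w) * cp + (im - w) * ch
        for w, ft, it, cp, ch, fm, im
        in zip(omega, f, i, c_prev, c_hat, f_m, i_m)
    ]


# Two neurons, made-up values: the second (higher-level) one keeps its memory.
c_new = on_lstm_cell_update(
    f=[1.0, 1.0], i=[1.0, 1.0],
    c_prev=[2.0, 2.0], c_hat=[4.0, 4.0],
    f_master_logits=[0.0, 0.0], i_master_logits=[0.0, 0.0],
)
print(c_new)  # [3.0, 2.0]
```
</preformat>
        <p>In the toy example the last neuron has a master forget value of 1 and a master input value of 0, so it copies its previous memory unchanged, which is the long-lived, high-level behaviour described above.</p>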
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Ensemble learning</title>
        <p>We use a multi-model ensemble learning approach to obtain a stable system that performs well
in all aspects. We use hard voting to determine the final category: each model votes for a text
with its own classification result, and the minority yields to the majority.
The system thus integrates the TextCNN, BERT, and ON-LSTM models through ensemble learning,
as shown in Figure 2.</p>
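        <p>A minimal sketch of the hard-voting rule used in this section is given here. It assumes that each model emits a 0/1 vote and that 1 encodes NOT and 0 encodes HOF, as implied by the scoring rule of this section; the function name and the label table are hypothetical.</p>
<preformat>
```python
LABELS = {1: "NOT", 0: "HOF"}  # assumed encoding: score >= 0.5 means NOT


def hard_vote(votes):
    """Average the 0/1 votes of all models and follow the majority."""
    score = sum(votes) / len(votes)
    return LABELS[1] if score >= 0.5 else LABELS[0]


# Votes from three models, e.g. BERT, ON-LSTM, and TextCNN.
print(hard_vote([1, 0, 1]))  # NOT
print(hard_vote([0, 0, 1]))  # HOF
```
</preformat>
        <p>With an odd number of models, as here, a tie cannot occur, so the threshold of 0.5 is equivalent to a strict majority.</p>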
        <p>Given an input text $[x_1, x_2, \dots, x_n]$, the output of each model is 0 or 1, which is
represented as a vote $v_i$. Averaging all the votes gives the final score of a
candidate, as follows:
<disp-formula><label>(4)</label><tex-math>S = \frac{1}{M} \sum_{i=0}^{M-1} v_i</tex-math></disp-formula>
where $S \geq 0.5$ shows that the text is NOT, otherwise HOF, and $M$ is the number of
models.</p>
      </sec>
    </sec>
    <sec id="sec-experiment">
      <title>3. Experiment</title>
      <sec id="sec-experiment-1">
        <title>3.1. Dataset</title>
        <p>We split the HASOC2020 English data into a training set and a validation set. Statistics of
the dataset are shown in Table 1; the data is relatively balanced, so no additional
distribution processing is needed.</p>
      </sec>
      <sec id="sec-2-5">
        <title>3.2. Implementation details</title>
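        <p>To achieve better results, we clean and preprocess the texts, mainly through the following
steps:</p>
        <list list-type="bullet">
          <list-item><p>Remove users and topics using regular expressions.</p></list-item>
          <list-item><p>Convert emoji into language expressions.</p></list-item>
          <list-item><p>Check the spelling of words.</p></list-item>
          <list-item><p>Restore abbreviations.</p></list-item>
          <list-item><p>Remove URL links.</p></list-item>
        </list>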
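        <p>The cleaning steps listed above might look roughly like the following sketch, which uses only the Python standard library. The regular expressions, the two small lookup tables, and the function name are hypothetical stand-ins, since the exact patterns and mappings are not given here; spell checking is omitted because it needs an external dictionary.</p>
<preformat>
```python
import re

# Hypothetical lookup tables, for illustration only.
EMOJI_TO_TEXT = {"\U0001F621": " angry face ", "\U0001F602": " face with tears of joy "}
ABBREVIATIONS = {"u": "you", "r": "are", "pls": "please"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_OR_TOPIC_RE = re.compile(r"[@#]\w+")


def preprocess(tweet):
    """Clean one tweet: URLs, users/topics, emoji, abbreviations."""
    text = URL_RE.sub(" ", tweet)                # remove URL links
    text = USER_OR_TOPIC_RE.sub(" ", text)       # remove @users and #topics
    for symbol, words in EMOJI_TO_TEXT.items():  # emoji to words
        text = text.replace(symbol, words)
    tokens = [ABBREVIATIONS.get(tok.lower(), tok) for tok in text.split()]
    return " ".join(tokens)                      # also collapses whitespace


print(preprocess("@bob u r rude \U0001F621 #fail https://t.co/x"))
# you are rude angry face
```
</preformat>
        <p>URLs are removed first so that a fragment such as "#section" inside a link is not mistaken for a topic.</p>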
        <p>To ensure the objectivity and fairness of all experiments, we train all models with
the same hyperparameters: the Adam optimizer, a learning rate of 5e-6, 15 epochs, and a batch size of 32. For the
TextCNN and ON-LSTM models, we use glove.twitter.27B.200d<sup>2</sup> for word embedding.</p>
        <p>For training all models, we adopt the classical cross-entropy loss function, which
measures how close the predicted data distribution is to the real data
distribution and helps the model learn quickly toward its best performance. The formula
is as follows:
<disp-formula><label>(5)</label><tex-math>L(y, p) = \frac{1}{N} \sum_{i=1}^{N} \left[ -y_i \log(p_i) - (1 - y_i) \log(1 - p_i) \right]</tex-math></disp-formula>
where $N$ is the number of samples, $y_i$ is the value of the label (0 or 1), and $p_i$ is the predicted
probability that the label is 1.</p>
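        <p>As a worked illustration of the binary cross-entropy loss, the sketch below evaluates it for a handful of labels and predicted probabilities. The function name and the clipping constant <monospace>eps</monospace> are assumptions added to keep the logarithm finite; they are not part of the formula.</p>
<preformat>
```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean of -y*log(p) - (1-y)*log(1-p) over all samples."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -y * math.log(p) - (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# An uninformative prediction of 0.5 costs log(2) per sample.
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
# Confident, correct predictions cost much less.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
```
</preformat>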
      </sec>
      <sec id="sec-2-6">
        <title>3.3. Result analysis</title>
        <p>To optimize the prediction results, we first trained each model and kept the parameters that gave
the best F1-score. The loss at each iteration of the training process is shown in Figure 3. We
found that a model's best performance did not necessarily coincide with its lowest loss value;
ON-LSTM in particular achieved its best performance in the second epoch. We therefore
saved the parameters of each model at its best-performing point to make ensemble
learning more effective.</p>
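        <p>Selecting the checkpoint by F1-score rather than by loss, as described above, can be sketched as follows. The Macro F1-score is computed from scratch so that the example is self-contained; the function names and the toy predictions are hypothetical, and a real run would compare validation predictions saved after each epoch.</p>
<preformat>
```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1-scores."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def best_epoch(y_true, preds_per_epoch):
    """Index of the epoch whose predictions give the highest Macro F1."""
    return max(range(len(preds_per_epoch)),
               key=lambda e: macro_f1(y_true, preds_per_epoch[e]))


y_val = [1, 1, 0, 0]
preds_per_epoch = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
print(best_epoch(y_val, preds_per_epoch))  # 1
```
</preformat>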
        <p>We then compared the multi-model ensemble learning approach with the single models,
using Macro F1-score, Precision, and Accuracy to evaluate the prediction
results, as shown in Table 2. Our multi-model ensemble method improves over the
single models on all three indicators; in particular, the Macro F1-score is improved by 5.4% compared
with TextCNN. This demonstrates that multi-model ensemble learning can integrate the strengths
of each model and reduce the negative impact of any single model on the results.<fn><label>2</label><p>https://github.com/stanfordnlp/GloVe</p></fn></p>
        <p>We also compared different ensemble approaches: our method based on
hard voting (Ensemble_hard_voting) versus a variant based on soft voting (Ensemble_soft_voting). We
found that hard voting performs better. A likely reason is that it only follows the majority of the models and,
unlike soft voting, does not need to combine the opinions of all models, which reduces the impact of
individual models on the prediction results.</p>
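        <p>The difference between the two voting schemes can be seen on a small made-up example. In the sketch below each model outputs the probability of class 1; the function names, the 0.5 threshold, and the probabilities are assumptions for illustration and are not taken from our experiments.</p>
<preformat>
```python
def hard_voting(probs, threshold=0.5):
    """Each model casts a 0/1 vote; the majority decides."""
    votes = [1 if p >= threshold else 0 for p in probs]
    return 1 if sum(votes) / len(votes) >= 0.5 else 0


def soft_voting(probs, threshold=0.5):
    """Average the probabilities of all models, then threshold once."""
    return 1 if sum(probs) / len(probs) >= threshold else 0


# Two models lean mildly towards class 1, one is very confident in class 0.
probs = [0.55, 0.60, 0.05]
print(hard_voting(probs))  # 1
print(soft_voting(probs))  # 0
```
</preformat>
        <p>Here a single very confident model overturns the soft-voting decision, while hard voting still follows the two-model majority.</p>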
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Conclusions</title>
      <p>In this paper, we presented a system for detecting hate speech and offensive language in
English, based on multi-model ensemble learning.
The ensemble achieved better results than any single model in subtask A for the English language. In
future research, we will consider a more efficient ensemble method to further enhance the
performance of the model.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work is supported by the National Natural Science Foundation of China (61962061) and partially
supported by the Yunnan Provincial Foundation for Leaders of Disciplines in Science and
Technology, the Top Young Talents of the "Ten Thousand Plan" in Yunnan Province, and the Program for
Excellent Young Talents of Yunnan University.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mondal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Benevenuto</surname>
          </string-name>
          ,
          <article-title>A measurement study of hate speech in social media</article-title>
          ,
          <source>in: Proceedings of the 28th ACM Conference on Hypertext and Social Media</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warmsley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Macy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <article-title>Automated hate speech detection and the problem of offensive language</article-title>
          ,
          <source>arXiv preprint arXiv:1703.04009</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Bashar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nayak</surname>
          </string-name>
          ,
          <article-title>QutNocturnal@HASOC'19: CNN for hate speech and offensive content identification in Hindi language</article-title>
          ,
          <source>arXiv preprint arXiv:2008.12448</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Manolescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Löfflad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N. M.</given-names>
            <surname>Saber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Tari</surname>
          </string-name>
          ,
          <article-title>TuEval at SemEval-2019 Task 5: LSTM approach to hate speech detection in English and Spanish</article-title>
          ,
          <source>in: Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>498</fpage>
          -
          <lpage>502</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sordoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <article-title>Ordered neurons: Integrating tree structures into recurrent neural networks</article-title>
          ,
          <source>arXiv preprint arXiv:1810.09536</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>YNU_WB at HASOC 2019: Ordered neurons LSTM with attention for identifying hate speech and offensive language</article-title>
          ,
          <source>in: FIRE (Working Notes)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>191</fpage>
          -
          <lpage>198</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mozafari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Farahbakhsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Crespi</surname>
          </string-name>
          ,
          <article-title>A bert-based transfer learning approach for hate speech detection in online social media</article-title>
          ,
          <source>in: International Conference on Complex Networks and Their Applications</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>928</fpage>
          -
          <lpage>940</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nandini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive Content Identification in Indo-European Languages</article-title>
          ,
          <source>in: Working Notes of FIRE 2020 - Forum for Information Retrieval Evaluation</source>
          ,
          <publisher-name>CEUR</publisher-name>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>