<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Self-Attention with K-Max Pooling for Discrimination between Hate, Profane and Offensive Posts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Huiping Shi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaobing Zhou</string-name>
          <email>zhouxb@ynu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FIRE'20, Forum for Information Retrieval Evaluation</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Information Science and Engineering, Yunnan University</institution>
          ,
          <addr-line>Yunnan, P.R. China</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>This paper describes our system submitted to HASOC 2020: Hate Speech and Offensive Content Identification. The purpose of the task is to identify offensive language in social media. We participated in subtasks A and B for English and German. Subtask A is to identify hate speech and offensive content; subtask B is to distinguish between hate speech, offensive speech, and profane speech from a fine-grained perspective. To accomplish these subtasks, we propose a system based on a Multi-Top Self-Attention and K-Max pooling model and use the K-fold method for training. Our model achieves a macro F1-score of 0.5042 on subtask A in English, 0.2396 on subtask B in English, 0.5121 on subtask A in German, and 0.2736 on subtask B in German.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the development of information technology, social media provides people with ever more
convenient forms of communication. People can freely express their opinions on social
networks. At the same time, some people use social networks to release one-sided
emotions, guiding the public to attack the innocent, undermining objective discussion, and
intensifying conflicts. A wholesome language environment is the foundation of a harmonious
atmosphere and the cornerstone of universal progress [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To purify the Internet environment,
we need to identify negative emotions on the Internet. Identifying hate speech, insulting,
derogatory or obscene content, and other negative emotional speech on social networks
belongs to the research direction of emotion classification in natural language processing.
Sentiment classification [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is one of the main classification tasks in natural language
processing, and it has been a research hot spot at home and abroad in recent years. The
task of sentiment classification is to help researchers quickly obtain, organize, and analyze
relevant text information, and to analyze, summarize, and infer the emotion contained in the text.
Traditionally, regular text is more amenable to the analysis, processing, induction,
and reasoning of natural language processing: because regular text conforms
to specific rules, natural language processing can find the rules in a document.
In sentiment classification, however, the language used is more perceptual than rational,
and many sensitive texts differ somewhat from regular texts. Most of the language used
on social networks is related to personal life experience, and these texts are even affected by
the writer's native language in different ways of expression. Although the texts on social
media are in the same language, their meanings can be quite different. Besides, the language
on social media is updated very quickly, because frequently updated hot events
make netizens reach a consensus on a certain event. Facing this ultra-fast updating of social media
texts, scholars of natural language processing still persist in studying the characteristics of
hate texts. From initial manual feature extraction, to the rule-based feature extraction of
machine learning, and now to the feature extraction of neural networks, feature extraction
has gone through a long and complicated process of text research.
      </p>
      <p>
        The HASOC competition is an evaluation task for identifying hate speech and offensive
content in Indo-European languages [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. This year, HASOC provides two subtasks for each
language. Subtask A: coarse-grained classification of hate, offensive, and profane content. Subtask B:
fine-grained classification, distinguishing between profane and offensive posts.
      </p>
      <p>We participated in subtasks A and B in English and German, and the datasets discussed in
this article come from HASOC. Based on deep learning methods, we developed
an end-to-end neural network model, which takes Multi-Top Self-Attention as its core and
adds K-Max pooling. During training, we use K-fold cross-validation, which can alleviate data
imbalance and overfitting, together with generator-based fitting for batch training. This model
achieved good results on subtasks A and B in English and German of HASOC 2020.</p>
      <p>The structure of this paper is as follows: in Section 2, we introduce related work on
identifying hate speech and offensive language; in Section 3, we describe the dataset and the
model structure in detail; in Section 4, we present the experimental results and data analysis.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The research of natural language processing on hate speech on social networks can be traced
back to 2010. Gries et al. collected and annotated text on social media (tweets), completed
a love-hate dataset about social media speech, and provided these datasets to scholars who
needed them for research [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The
scholars who built the dataset also held regular competitions and updated the dataset regularly to
encourage more people to participate in this task. Dhillon et al. proposed the SAS model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. SAS
provides a direction for studying computer network methods, technologies, and systems. SAS
uses natural language processing technology to identify sentence components (such as subject,
verb, and object), disambiguate, and identify entities, so that SAS can recognize whether a
basic relationship exists. SAS provides one or more user interface tools for sentiment analysis. In
1997, Hochreiter et al. proposed Long Short-Term Memory (LSTM) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The emergence of LSTM
was an epoch-making milestone for the technology of natural language processing. Malmasi
et al. combined feature representations (character n-grams, word n-grams, and
word skip-grams) with a Support Vector Machine (SVM) classifier to distinguish hate speech
on social media from general profanity, obtaining an accuracy of 0.78 [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]. Feature
representation plays an important role in natural language processing. In 2018, Malmasi et al.
built on this research and, based on the feature representation model, tried
to integrate multiple natural language classification models, achieving an accuracy of 0.80
on the three-class classification task [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        The study of hate speech is not limited to feature representation [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. Beyond
negative polarity and emotional intensity [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ], research on hate classification features
[
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ] has also made great progress. In deep learning, there are some similarities between
natural language processing and image processing. Some neural network researchers apply
the attention mechanism, originally used for image processing, to natural language processing,
as in [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ]. It turns out that the attention mechanism is highly adaptive in natural language
processing, can be applied to various tasks, and achieves better results than the neural network
models originally designed for natural language processing. In the attention mechanism, in addition
to determining the ’value’, the initial values of the ’key’ and the ’query’ must be set manually.
Such manual settings increase experimental error. In 2017, the self-attention
mechanism was first proposed by Lin et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], who set the value, key, and query to the
same input, thereby reducing external influence and making the model self-optimizing. The
self-attention mechanism is now used in a wide variety of tasks. In 2019, building on the study of the self-attention
mechanism, Child et al. proposed a sparse self-attention mechanism [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. While reducing
the computation time of the algorithm, their model uses a filtering function to refine
self-attention and maintains a good effect on various tasks.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>We preprocess the dataset using a word representation model (FastText) to convert the text into vectors
that the computer can process. Two Multi-Top Self-Attention blocks are applied to refine the
representative features of the text. The model then obtains text features through K-Max pooling
and the tanh function; finally, the tanh and softmax functions are used to calculate the score of each
class. The model is shown in Figure 1.</p>
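As an illustration only, this pipeline can be sketched as follows in numpy; the dimensions, the value of k, and the final dense layer are our own placeholder assumptions, not the actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_block(x, k=4):
    # Hypothetical sketch of one Multi-Top Self-Attention block: scaled
    # dot-product scores, then only the k largest scores per row are kept.
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    kth = np.sort(scores, axis=-1)[:, -k][:, None]   # k-th largest per row
    masked = np.where(scores >= kth, scores, -np.inf)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # row-wise softmax
    return w @ x

def k_max_pool(x, k=2):
    # Hypothetical K-Max pooling: the k largest values of each feature column.
    return np.sort(x, axis=0)[-k:]

tokens = rng.standard_normal((6, 8))          # 6 words, 8-dim embeddings
h = attention_block(attention_block(tokens))  # two stacked attention blocks
pooled = k_max_pool(np.tanh(h)).ravel()       # tanh, then K-Max pooling
W = rng.standard_normal((pooled.size, 2))     # placeholder dense layer, 2 classes
logits = pooled @ W
probs = np.exp(logits - logits.max())
probs /= probs.sum()                          # class probabilities via softmax
```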
      <sec id="sec-3-1">
        <title>3.1. Input Layer</title>
        <p>The input layer accepts the preprocessed text data. The sorted data are fed into the model
in the format expected by the model.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Embedding Layer</title>
        <p>This layer accepts the output of the Input layer and vectorizes the words in the existing dictionary
using the pre-trained vector model.</p>
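A minimal sketch of such a lookup follows; the vocabulary and vector values are invented placeholders standing in for the pre-trained FastText vectors:

```python
import numpy as np

# Toy stand-ins for pre-trained word vectors (assumed values, 3 dimensions).
pretrained = {
    "hate": np.array([0.9, -0.2, 0.1]),
    "speech": np.array([0.3, 0.8, -0.5]),
}
unk = np.zeros(3)  # words outside the dictionary fall back to a zero vector

def embed(tokens):
    # Vectorize each token coming from the Input layer.
    return np.stack([pretrained.get(t, unk) for t in tokens])

matrix = embed(["hate", "speech", "xyzzy"])  # "xyzzy" is out-of-vocabulary
```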
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Encode layer</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3.3.1. Multi-Top Self-Attention</title>
      <p>
        In 2019, the results of Transformer [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] are obvious to all, but the problems of the Transformer
are also very obvious. The Transformer takes a non-discriminatory approach to feature
representation. As a result, it extracts some meaningless features, preventing
further improvement. To solve this problem, we propose Multi-Top Self-Attention, as shown
in Figure 2.
      </p>
      <p>
        Multi-Top Self-Attention adds a filtering mechanism based on the Transformer.
Multi-Top Self-Attention screens out the top-k features that are important to the text and
sets the unimportant features to −∞ [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The Attention block is shown in Figure 3.
As depicted in Figure 3, we initialize Q = XW_Q, K = XW_K, V = XW_V, and then calculate the score matrix S:
      </p>
      <p>S = QK^T / √d (1)
Here d is the first dimension of the embedding. Q, K, and V all come from the input, but the weights of
their linear transformation matrices are different. We multiply Q and K by dot product
to get the dependency between each input word and the others. Finally, all values are mapped
to a space of dimension d. We obtain the score matrix S of the attention mechanism, and S contains all
the feature merit scores. Up to this point, our model is the same as the Transformer.
Then we start to filter the feature scores in S.</p>
      <p>
        Here S′ is the filtered matrix. We use the heap algorithm [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]: we maintain a min-heap
of size k and put the scores into the heap in turn. Once the heap is full, we only need to
compare the top element with each next number; if the new number is larger than the top element, the current
top element is discarded and the new number is inserted into the heap. Finally, the top-k scores are in the heap.
The purpose of this top-k operation is simply to
mark the top-k elements in S. We maintain a mask matrix called M.
      </p>
      <p>S′ = top-k(S) (2)</p>
      <p>M(i, j) = 1 if S(i, j) ∈ S′, −∞ otherwise (3)</p>
      <p>Ŝ(i, j) = S(i, j) if M(i, j) = 1, −∞ otherwise (4)</p>
      <p>A = softmax(Ŝ) (5)</p>
      <p>Z = A × V (6)</p>
      <p>That is, when S(i, j) is among the top-k values, M(i, j) is 1, and the reverse is negative infinity. We then
use M to filter out the unimportant values of S, normalize the filtered scores with the softmax function,
and multiply by V to obtain the output Z of the Attention block. We use the output of one Attention block
as the input of the next; finally, all values are combined.</p>
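Putting the filtering and normalization steps together, a numpy sketch (dimensions and k are invented; np.where plays the role of the mask M):

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.standard_normal((4, 4))   # score matrix from equation (1)
V = rng.standard_normal((4, 8))
k = 2

# Mark the top-k scores of each row; everything else becomes -inf.
kth = np.sort(S, axis=-1)[:, -k][:, None]
S_hat = np.where(S >= kth, S, -np.inf)

# Softmax over the filtered scores, then weight V.
A = np.exp(S_hat - S_hat.max(axis=-1, keepdims=True))
A /= A.sum(axis=-1, keepdims=True)
Z = A @ V                         # output of the Attention block
```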
      <sec id="sec-4-1">
        <title>3.3.2. K-Max pooling</title>
        <p>
          In the previous operation, we connected the parameters from the two Multi-Top
Self-Attention blocks. At this point, we take advantage of the top-k filtering mechanism again.
The difference is that here we use the merge algorithm [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] and take the first k values. The idea of the merge algorithm is to combine and sort the
parameters along the dimension using the division method.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>3.4. Output Layer</title>
        <p>Concerning the output layer, in order to improve training efficiency, we first use the tanh
function and then the softmax function to calculate the probability output of each class.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Experiment and Result</title>
      <sec id="sec-5-1">
        <title>4.1. Result</title>
        <p>
          In this experiment, we took part in subtasks A and B in English and German as
provided by HASOC. We completed a total of four subtasks. We submitted the results on the
platform given by HASOC [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], which uses macro F1 as the evaluation criterion. In these tasks, our
model achieves a macro F1-score of 0.5042 on subtask A in English, 0.2396 on subtask B in
English, 0.5121 on subtask A in German, and 0.2736 on subtask B in German. The results and rankings
are shown in Table 1.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Data distribution</title>
        <p>
          The distribution of the data is shown in Table 2. All data come from the
HASOC platform. Subtask A in English and German is the identification of hate versus non-hate posts;
hate speech and offensive posts in subtask A fall into two categories: hate speech (hate) and
normal text (NOT). Subtask B in English and German is the identification of hate, profane, and offensive
posts. This subtask is a fine-grained classification of the English, German, and Hindi datasets. Hate
speech and offensive posts in subtask B fall into four categories: hate speech (hateful), offensive
(OFFN), profanity (PRFN), and normal text (NONE) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>We can see that the total amount of data is small and unevenly distributed, which can easily
lead to model overfitting. Because of the small amount of data, we obtain too few representative
features during training, so the model learns weak representations or features that belong
only to individual texts. These features are often irrelevant to the expected
features that need to be extracted. To address the irregular amount and distribution of data, we
use cross-validation. We shuffle the dataset and split it into five parts, validating five times, each time
taking 1/5 of the dataset as the validation set.</p>
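This split corresponds to standard 5-fold cross-validation and can be sketched in plain Python; the index helper below is our own illustration, not the competition code:

```python
import random

def five_fold(indices, seed=42):
    # Shuffle the dataset indices, cut them into five parts, and yield
    # (train, validation) pairs, each validation fold holding 1/5 of the data.
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    fold = len(idx) // 5
    for i in range(5):
        val = idx[i * fold:(i + 1) * fold]
        train = idx[:i * fold] + idx[(i + 1) * fold:]
        yield train, val

folds = list(five_fold(range(100)))   # 5 splits of an imaginary 100-sample set
```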
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Conclusion</title>
        <p>In this paper, we describe an attention model based on Multi-Top Self-Attention
and K-Max pooling, which was applied to subtasks A and B in English and German of HASOC 2020.
In these subtasks, we achieved good results. Unbalanced data distribution leads
to poor model generalization and easy over-fitting. Therefore, in the future, we will focus
on research into handling data imbalance.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Ynu_wb at hasoc 2019:
          <article-title>Ordered neurons lstm with attention for identifying hate speech and offensive language</article-title>
          ,
          <source>in: FIRE (Working Notes)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>191</fpage>
          -
          <lpage>198</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kanakaraj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M. R.</given-names>
            <surname>Guddeti</surname>
          </string-name>
          ,
          <article-title>Nlp based sentiment analysis on twitter data using ensemble classifiers</article-title>
          ,
          <source>in: 2015 3rd International Conference on Signal Processing, Communication and Networking (ICSCN)</source>
          , IEEE,
          <year>2015</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <article-title>Overview of the hasoc track at fire 2019: Hate speech and offensive content identification in indo-european languages (</article-title>
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Modha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Shahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nandini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <article-title>Overview of the HASOC track at FIRE 2020: Hate Speech and Offensive Content Identification in Indo-European Languages</article-title>
          ,
          <source>in: Working Notes of FIRE 2020 - Forum for Information Retrieval Evaluation</source>
          ,
          CEUR
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Gries</surname>
          </string-name>
          ,
          <article-title>Corpus linguistics and theoretical linguistics: A love-hate relationship? not necessarily</article-title>
          ,
          <source>International Journal of Corpus Linguistics</source>
          <volume>15</volume>
          (
          <year>2010</year>
          )
          <fpage>327</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Dhillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <source>Nlp-based sentiment analysis</source>
          ,
          <source>2014. US Patent 8</source>
          ,
          <issue>838</issue>
          ,
          <fpage>633</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>Long short-term memory</article-title>
          ,
          <source>Neural computation 9</source>
          (
          <year>1997</year>
          )
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Detecting hate speech in social media</article-title>
          ,
          <source>arXiv preprint arXiv:1712.06427</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hovy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <article-title>Visualizing and understanding neural models in nlp</article-title>
          ,
          <source>arXiv preprint arXiv:1506.01066</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Malmasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zampieri</surname>
          </string-name>
          ,
          <article-title>Challenges in discriminating profanity from hate speech</article-title>
          ,
          <source>Journal of Experimental &amp; Theoretical Artificial Intelligence</source>
          <volume>30</volume>
          (
          <year>2018</year>
          )
          <fpage>187</fpage>
          -
          <lpage>202</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mehdad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tetreault</surname>
          </string-name>
          ,
          <article-title>Do characters abuse more than words?</article-title>
          ,
          <source>in: Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>299</fpage>
          -
          <lpage>303</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Nobata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tetreault</surname>
          </string-name>
          , A. Thomas,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mehdad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>Abusive language detection in online user content</article-title>
          ,
          <source>in: Proceedings of the 25th international conference on world wide web</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>145</fpage>
          -
          <lpage>153</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Dinakar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Havasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lieberman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Picard</surname>
          </string-name>
          ,
          <article-title>Common sense reasoning for detection, prevention, and mitigation of cyberbullying</article-title>
          ,
          <source>ACM Transactions on Interactive Intelligent Systems (TiiS) 2</source>
          (
          <year>2012</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S. O.</given-names>
            <surname>Sood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. F.</given-names>
            <surname>Churchill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Antin</surname>
          </string-name>
          ,
          <article-title>Automatic identification of personal insults on social news sites</article-title>
          ,
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>63</volume>
          (
          <year>2012</year>
          )
          <fpage>270</fpage>
          -
          <lpage>285</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Van Hee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lefever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Verhoeven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mennes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Desmet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Pauw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Daelemans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Hoste</surname>
          </string-name>
          ,
          <article-title>Detection and fine-grained classification of cyberbullying events</article-title>
          ,
          <source>in: International Conference Recent Advances in Natural Language Processing (RANLP)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>672</fpage>
          -
          <lpage>680</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Burnap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. F.</given-names>
            <surname>Rana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Avis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Housley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Edwards</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Morgan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sloan</surname>
          </string-name>
          ,
          <article-title>Detecting tension in online communities with computational twitter analysis</article-title>
          ,
          <source>Technological Forecasting and Social Change</source>
          <volume>95</volume>
          (
          <year>2015</year>
          )
          <fpage>96</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Quan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>A capsule network for recommendation and explaining what you like and dislike</article-title>
          ,
          <source>in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>275</fpage>
          -
          <lpage>284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>Generating long sequences with sparse transformers</article-title>
          ,
          <source>arXiv preprint arXiv:1904.10509</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. N. d.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>A structured self-attentive sentence embedding</article-title>
          ,
          <source>arXiv preprint arXiv:1703.03130</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>in: Advances in neural information processing systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kurita</surname>
          </string-name>
          ,
          <article-title>An efficient agglomerative clustering algorithm using a heap</article-title>
          ,
          <source>Pattern Recognition</source>
          <volume>24</volume>
          (
          <year>1991</year>
          )
          <fpage>205</fpage>
          -
          <lpage>209</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pavlidis</surname>
          </string-name>
          ,
          <article-title>Segmentation by texture using a co-occurrence matrix and a split-and-merge algorithm</article-title>
          ,
          <source>Computer Graphics and Image Processing</source>
          <volume>10</volume>
          (
          <year>1979</year>
          )
          <fpage>172</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>