<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Aggression Identification in Posts: two machine learning approaches</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Faneva Ramiandrisoa</string-name>
          <email>faneva.ramiandrisoa@irit.fr</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IRIT, Université de Toulouse, France</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université d'Antananarivo</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Social media have changed the way people communicate. One of these aspects is cyber-aggression and interpersonal aggression, which can be catalyzed by perceived anonymity. Automatically monitoring user-generated content in order to help moderate it is thus a hot topic. In this paper, we present and evaluate two supervised machine learning models to identify aggressive content and the level of aggressiveness. The first model uses random forest and logistic regression while the second model uses deep learning techniques.</p>
      </abstract>
      <kwd-group>
        <kwd>Social media</kwd>
        <kwd>Social media analysis</kwd>
        <kwd>Cyber-aggression</kwd>
        <kwd>TRAC</kwd>
        <kwd>Trolling, Aggression and Cyberbullying</kwd>
        <kwd>Machine learning based model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Social media have changed the way people communicate [
        <xref ref-type="bibr" rid="ref13 ref14 ref3 ref5">3,13,14,5</xref>
        ]. One of these
aspects is cyber-aggression and interpersonal aggression that can be catalyzed by
perceived anonymity [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Automatically monitoring user-generated content in order to
help moderate social media is thus an important although difficult task [
        <xref ref-type="bibr" rid="ref17 ref4">4,17</xref>
        ].
      </p>
      <p>
        In 2018, the Shared Task on Aggression Identification was organised as part of
the First Workshop on Trolling, Aggression and Cyberbullying (TRAC - 1) at
COLING 2018 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The objective of this task is to detect aggressive content and the level
of aggressiveness. Thirty teams submitted their test runs. The best system obtained a
weighted F-score of 0.64 on a data set composed of annotated Facebook comments.
      </p>
      <p>In this paper, we report on the two models we developed to address the aggression
identification task. The first model uses random forest and logistic regression, which can
be considered relatively mature approaches, while the second model combines CNN
and LSTM, two more recent deep learning techniques. No strong conclusion could be drawn on the
superiority of one model over the other since it depends on the collection.</p>
      <p>This paper is organized as follows: Section 2 reports related work, Section 3
describes our two approaches, Section 4 describes the dataset used in this work, reports
the results and discusses them, while Section 5 concludes this paper and presents future
work.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related work</title>
      <p>
        Approaches based on features and supervised classifiers such as Support Vector
Machines (SVM) are often used in order to learn to detect whether a text contains
aggressiveness [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]; in recent years, deep learning has been also employed for this task
[
        <xref ref-type="bibr" rid="ref19 ref2">19,2</xref>
        ].
      </p>
      <p>
        Deep learning has also been used by TRAC challenge participants. The TRAC [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
challenge is the first to focus on detecting aggressive texts. The task training set is
composed of Facebook posts/comments; there are also two kinds of test sets: one from
Facebook and another from Twitter.
      </p>
      <p>
        Among the thirty participants, Aroyehun [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] obtained the best results. The authors
investigated the efficacy of deep neural networks by experimenting with different models:
CNN, LSTM, BiLSTM, and combinations thereof. In their experiments, they used a
translation technique to enlarge the training set and added an external dataset on hate
speech (https://github.com/ZeerakW/hatespeech, accessed on January 10, 2020). The LSTM
model, which was trained on the augmented training set only, achieved the best weighted
F1 score of 0.6425 on the Facebook test set; it is the first-ranked system in the TRAC
challenge; the same system did not perform as well on the Twitter data set. The other
system of the same team, which implements a combination of CNN and LSTM and was
trained on the augmented training set and the additional dataset, achieved a weighted F1
score of 0.5920 and the third rank on the Twitter test set.
      </p>
      <p>
        Raiyani et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], meanwhile, tested different models for text classification in
TRAC, from classic machine learning models to deep learning models. In the end, they
kept three models: a FastText model, dense neural networks, and a voting combination of
the two. The dense neural networks gave better performance than the two others and
achieved a weighted F1 score of 0.5813 on the Facebook test set, the fourteenth rank in
the TRAC challenge. They achieved the best weighted F1 score of 0.6009 and the first rank
on the Twitter test set, although trained on a Facebook dataset.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Machine learning based models</title>
      <p>
        We developed two supervised machine learning based models that we evaluate in this
paper. The first method combines random forest and logistic regression while the second
approach is deep learning based. We also developed a model based on CNN only, for
which results can be found in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]; it performs in between the two models reported in
this paper.
      </p>
      <sec id="sec-3-1">
        <title>3.1 Trac-RF_LR: combination of two classifiers</title>
        <p>
          In this model, we combined a random forest (RF) based on surface and linguistic
features with a logistic regression (LR) based on document vectorization. We chose this
combination because combinations of multiple machine learning models placed first
in many prestigious machine learning competitions [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], such as the Netflix Prize and
Kaggle competitions. Moreover, when using the non-combined models on the training
dataset, the results were lower in the case of TRAC as well, and this was confirmed on the
test set (see section 4.3).
        </p>
        <p>
          RF Classifier. The random forest model uses different features extracted from the
comments, as presented in Table 1. Some are adapted from [
          <xref ref-type="bibr" rid="ref1 ref22">1,22</xref>
          ], where the authors tried
to detect depression from texts; another source of inspiration is [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], where the authors
suggested an information nutritional label describing text qualities.
        </p>
        <p>Part-of-speech frequency: normalized frequencies of each tag: adjectives, verbs, nouns
and adverbs (four features).</p>
        <p>Negation: normalized frequencies of negative words like no, not, didn't,
can't, ... The idea behind is to detect non-direct aggressiveness.</p>
        <p>Capitalized: the idea behind is that aggressive texts tend to put emphasis on the
target they mention. It can indicate feelings or speaking volume.</p>
        <p>Punctuation marks: ! or ? or any combination of both can emphasize the
offensiveness of texts.</p>
        <p>Emoticons: another way to express sentiment or feeling.</p>
        <p>Sentiment: use of the NRC-Sentiment-Emotion-Lexicons
(http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm, accessed on 2017-02-23)
to trace the polarity in the text.</p>
        <p>Emotions: frequency of emotions from specific categories: anger, fear, surprise,
sadness and disgust. The idea behind is to check the categories related to
aggressiveness.</p>
        <p>Gunning Fog Index: estimate of the years of education that a person needs to
understand the text at first reading.</p>
        <p>Flesch Reading Ease: measure of how difficult to understand a text is.</p>
        <p>Linsear Write Formula: developed for the U.S. Air Force to calculate the readability
of their technical manuals (http://www.streetdirectory.com/travel_guide/15675/writing/how_to_choose_
the_best_readability_formula_for_your_document.html, accessed on 2018-02-25).</p>
        <p>New Dale-Chall Readability: measure of the difficulty of comprehension that persons
encounter when reading a text. It is inspired from the Flesch Reading Ease measure.</p>
        <p>Swear words: the intuition behind is that texts containing insults are often
aggressive.</p>
        <p>Lexical analysis with the Python library Empath: Empath is a tool for analyzing text
across lexical categories. By default, it has 194 lexical categories and each category is
considered as a feature.</p>
        <p>Table 1: List of features used in RF to represent texts (Facebook comments or tweets).</p>
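        <p>Several of the surface and readability features above can be illustrated with a small sketch. This is our own simplified heuristic implementation (the tokenizers and the syllable counter are rough approximations, not the exact tools the paper used):</p>

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: number of vowel groups (heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text: str) -> float:
    """Gunning Fog Index: 0.4 * (words per sentence + 100 * complex-word ratio),
    where a 'complex' word has three or more syllables."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))

def surface_features(text: str) -> dict:
    """Capitalized-word and !/? punctuation frequencies, normalized by word count,
    plus the Gunning Fog readability estimate."""
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "capitalized": sum(w.isupper() and len(w) > 1 for w in words) / len(words),
        "punctuation": len(re.findall(r"[!?]", text)) / len(words),
        "gunning_fog": gunning_fog(text),
    }
```

        <p>In the actual model, such values form part of the feature vector given to the RF classifier, alongside the POS, sentiment, emotion and Empath features.</p>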
        <p>Some of these features are used for abusive language detection, hate speech and
cyberbullying, while the others are used for sentiment or personality analysis, which we
judged also useful for aggression detection.</p>
        <p>An RF classifier was trained on the train and validation sets by representing each text
(Facebook comment or tweet) with a vector composed of the features we mentioned in
Table 1.</p>
        <p>The following parameters were used during training: class_weight="balanced",
max_features="sqrt", n_estimators=60, min_weight_fraction_leaf=0.0,
criterion="entropy", random_state=2.</p>
        <p>At prediction time, a text from the test set is represented with its features and then
given to the trained model. The output is the estimated probability of each of the three
classes (overtly aggressive, covertly aggressive and non-aggressive).</p>
        <p>
          LR Classifier. This model is based on document vectorization using Doc2vec [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
Doc2vec is used to represent sentences, paragraphs, or whole documents as vectors and
it can be trained on small corpora, which is the case of the task datasets.
        </p>
        <p>
          Before building the LR classifier, we first trained two separate Doc2vec models: a
Distributed Bag of Words model and a Distributed Memory model [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. For the training, we
used the same configuration as in [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] for representing a user's text. The two Doc2vec
models were trained on the train and validation sets. We used the Python package
gensim (https://radimrehurek.com/gensim/index.html) [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. We also concatenated the output vectors of these two models, as done in [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ],
resulting in a 200-dimension vector per text.
        </p>
        <p>Then a logistic regression classifier was trained on the vectors of both the train and
validation sets with the following parameters: class_weight="balanced", random_state=1,
max_iter=100, solver="liblinear".</p>
        <p>At prediction time, the texts from the test set were vectorized by using the two
Doc2vec models and the 200-dimension vectors were given as input to the trained classifier.
The output is also a set of class probabilities.</p>
        <p>Combination of the two classifiers. The class probabilities obtained from the RF
classifier and the LR classifier were averaged, and finally the class with the highest
probability was considered as the class the text belongs to. We also tested different ways
to combine the output probabilities obtained from the two classifiers, such as the maximum,
the minimum, etc., but the average gave the best results.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Trac-CNN_LSTM: combination of CNN and LSTM</title>
        <p>
          This model combines two deep learning techniques: CNN and LSTM. The main idea
is to pass the input representation (sentence matrix in Figure 1) to the CNN and pass the
local features learnt by the CNN (concatenated vectors in Figure 1) to the LSTM.
Indeed, CNN and LSTM are complementary due to the fact that each of them captures
information at a different scale [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>The architecture of our combined model is illustrated in Figure 1. It is as follows:
first, we convert sentences/texts into a sentence matrix of dimension l × d, where l is the
length of the longest text/sentence in the dataset and d is the dimension of the word vector
representation; each row is the vector representation of a word, obtained with a word2vec
model [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] trained on the training and validation sets. Then, convolutions are applied on the
sentence matrix, where we used three filter region sizes: bigrams (height = 2), trigrams
(height = 3) and fourgrams (height = 4). Each region has 100 filters; thus, in total there
are 300 filters. The results of the convolutions are called feature maps: vectors of
variable length according to the filter region; each filter region has 100 feature maps.
Afterwards, a 1-max pooling is performed over the feature maps. More precisely, for each
region the largest number from each feature map is kept and then concatenated to form
a vector. As a result, we obtain one vector of size 100 per filter region, because there are
100 feature maps. Then, these three vectors are concatenated to form a feature vector and
a dropout is applied on this feature vector. The concatenated feature vector is passed to
the LSTM layer. Then, we added one fully connected hidden layer to reduce the dimension
of the concatenated vector, followed by a dropout. Finally, an output layer, which is also
a fully connected layer with three possible output states, is added. On the output layer,
the activation function used is the softmax function.</p>
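        <p>The shape bookkeeping of the CNN front end (three filter regions, 100 filters each, 1-max pooling, concatenation into a 300-dimension feature vector) can be sketched with NumPy; the random filters and the tanh nonlinearity are stand-ins for learned parameters, not the paper's exact configuration:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
l, d, n_filters = 20, 50, 100          # sentence length, word-vector dim, filters per region

# One row per word; in the real model these rows are word2vec vectors.
sentence_matrix = rng.standard_normal((l, d))

region_vectors = []
for h in (2, 3, 4):                    # bigram, trigram and fourgram filter heights
    filters = rng.standard_normal((n_filters, h, d))
    # Feature maps: each filter slides vertically over the sentence matrix,
    # producing one vector of length l - h + 1 per filter.
    maps = np.array([
        [np.tanh(np.sum(filters[k] * sentence_matrix[i:i + h]))
         for i in range(l - h + 1)]
        for k in range(n_filters)
    ])
    # 1-max pooling: keep only the largest value of each feature map.
    region_vectors.append(maps.max(axis=1))

# Concatenation: 3 regions x 100 filters = 300-dimension feature vector,
# which the full model then feeds to the LSTM layer.
feature_vector = np.concatenate(region_vectors)
```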
        <p>
          The architecture of our model is inspired by the CNN architecture Zhang et
al. [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] proposed for sentence classification. In that task, their CNN
architecture outperforms baseline methods that use SVM as well as the one that used a
CNN in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Evaluation</title>
      <sec id="sec-4-1">
        <title>4.1 Data set</title>
        <p>
          The evaluation is based on the TRAC 2018 shared task [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The task dataset is a subset
of Kumar et al.'s corpus [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and consists of randomly sampled English and Hindi Facebook
comments. In this study, we focused on the English part of the dataset, which is detailed
in Table 2. It is composed of (a) 11,999 Facebook comments for training and 3,001
comments for validation, annotated with 3 levels of aggression: Overtly
Aggressive (OAG), Covertly Aggressive (CAG) and Non-Aggressive (NAG); and (b) 916 English
comments for test. Additionally, 1,257 English tweets were given as a second test set.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Evaluation measure</title>
        <p>
          The evaluation metric used in this paper is the weighted F1, which was also used in
the TRAC shared task. The weighted F1 is equal to the average, weighted by the number
of instances of each label, of the F1 of each class label, given by equation (1):

F1 = (2 × P × R) / (P + R)    (1)

where P = tp / (tp + fp) is the precision, R = tp / (tp + fn) is the recall, tp denotes the true
positives, fp the false positives, and fn the false negatives.
        </p>
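        <p>The weighted F1 can be computed directly from this definition:</p>

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted F1: per-class F1 = 2PR/(P+R), averaged with weights equal to
    the number of true instances (support) of each class."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += n * f1
    return total / len(y_true)
```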
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Results</title>
        <p>Table 3 reports the results we obtained with the two models presented above. For
comparison, we also report the results obtained with the RF classifier only and with the LR
classifier only. The baseline mentioned in the first row was given by the TRAC shared
task organizers, while the second row is the best result from participants in the TRAC
workshop.</p>
        <p>Table 3: Weighted F1 of the Random Baseline, Aroyehun [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Trac-RF_LR, Trac-CNN_LSTM, Trac-RF only and Trac-LR only.</p>
        <p>
          We can see that our two models outperform the baseline on both the Facebook and
Twitter subsets. Trac-RF_LR is better than Trac-CNN_LSTM on the Facebook
collection while it is the opposite on the Twitter collection. This could be due to the training
dataset, which is only composed of texts crawled from Facebook. Indeed, we can
observe the same behaviour for the other systems that participated in the challenge [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
The only exception is the Aroyehun [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] system, which performs better on the Twitter
dataset.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion</title>
      <p>In this paper, we presented two different supervised machine learning approaches for
aggression identification on the TRAC 2018 English collections (Facebook and Twitter
based). The combination of random forest and logistic regression classifiers, based on
a set of surface features and document vectorization, led to the sixteenth-ranked
system out of thirty on the Facebook collection. The combination of CNN and Long
Short-Term Memory was ranked fifteenth out of thirty systems.</p>
      <p>
        To extend this work, we plan to update our models by adding new features such as
bags of words or features more specific to aggression. We also plan to apply feature
engineering to the features used in this paper in order to see which ones are the
most useful. On the other hand, feature selection could also be applied to build models
that use as few features as possible [
        <xref ref-type="bibr" rid="ref11 ref6">11,6</xref>
        ]. Finally, an investigation of deep learning
models will be conducted by using different architectures such as hierarchical attention
networks. We do believe that these directions can help design better-performing models.
      </p>
      <p>Ethical issues. While the TRAC challenge has its own ethical policies, detecting
aggressive content from users' posts raises ethical issues that are beyond the scope of this
paper.</p>
      <p>Acknowledgement. This work has been partially funded by the European Union's
Horizon 2020 H2020-SU-SEC-2018 programme under Grant Agreement n°833115
(PREVISION project). This work has also been partially supported by the Ministère des
Affaires étrangères et du Développement international under the scholarship
EIFFEL-DOCTORAT 2017 / n°P707544H for Faneva Ramiandrisoa's PhD thesis.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Abdou</given-names>
            <surname>Malam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            ,
            <surname>Arziki</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nezar</given-names>
            <surname>Bellazrak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Benamara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>El Kaidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Es-Saghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            ,
            <surname>Housni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Moriceau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Mothe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Ramiandrisoa</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          :
          <article-title>IRIT at e-Risk (regular paper)</article-title>
          .
          <source>In: International Conference of the CLEF Association, CLEF 2017 Labs Working Notes. ISSN 1613-0073</source>
          , vol.
          <year>1866</year>
          . CEUR Workshop Proceedings, http://CEUR-WS.org (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Aroyehun</surname>
            ,
            <given-names>S.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gelbukh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Aggression detection in social media: Using deep neural networks, data augmentation, and pseudo labeling</article-title>
          .
          <source>In: Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018)</source>
          . pp.
          <fpage>90</fpage>
          -
          <lpage>97</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Caron</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Light</surname>
          </string-name>
          , J.:
          <article-title>“social media has opened a world of 'open communication:'” experiences of adults with cerebral palsy who use augmentative and alternative communication and social media</article-title>
          .
          <source>Augmentative and Alternative Communication</source>
          <volume>32</volume>
          (
          <issue>1</issue>
          ),
          <fpage>25</fpage>
          -
          <lpage>40</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Whinston</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          :
          <article-title>Moderated online communities and quality of usergenerated content</article-title>
          .
          <source>Journal of Management Information Systems</source>
          <volume>28</volume>
          (
          <issue>2</issue>
          ),
          <fpage>237</fpage>
          -
          <lpage>268</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Décieux,
          <string-name>
            <given-names>J.P.</given-names>
            ,
            <surname>Heinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Willems</surname>
          </string-name>
          , H.:
          <article-title>Social media and its role in friendship-driven interactions among young people: A mixed methods study</article-title>
          .
          <source>YOUNG</source>
          <volume>27</volume>
          (
          <issue>1</issue>
          ),
          <fpage>18</fpage>
          -
          <lpage>31</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Déjean,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Ionescu</surname>
          </string-name>
          , R.T.,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullah</surname>
            ,
            <given-names>M.Z.</given-names>
          </string-name>
          :
          <article-title>Forward and Backward Feature Selection for Query Performance Prediction</article-title>
          .
          <source>In: ACM Symposium on Applied Computing (SAC)</source>
          .
          <article-title>ACM : Association for Computing Machinery (</article-title>
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fuhr</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giachanou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grefenstette</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanselowski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jarvelin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Liu,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Mothe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Nejdl</surname>
          </string-name>
          ,
          <string-name>
            <surname>W.</surname>
          </string-name>
          , et al.:
          <article-title>An information nutritional label for online documents</article-title>
          .
          <source>In: ACM SIGIR Forum</source>
          . vol.
          <volume>51</volume>
          , pp.
          <fpage>46</fpage>
          -
          <lpage>66</lpage>
          . ACM (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>CoRR abs/1408</source>
          .5882 (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ojha</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malmasi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Benchmarking Aggression Identification in Social Media</article-title>
          .
          <source>In: Proceedings of the First Workshop on Trolling, Aggression and Cyberbulling (TRAC)</source>
          .
          <source>Santa Fe</source>
          , USA (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reganti</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhatia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maheshwari</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Aggression-annotated corpus of hindienglish code-mixed data</article-title>
          .
          <source>arXiv preprint arXiv:1803</source>
          .
          <volume>09402</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Laporte</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flamary</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Canu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Déjean,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Mothe</surname>
          </string-name>
          , J.:
          <article-title>Non-convex Regularizations for Feature Selection in Ranking with Sparse SVM</article-title>
          .
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          <volume>25</volume>
          (
          <issue>6</issue>
          ),
          <fpage>1118</fpage>
          -
          <lpage>1130</lpage>
          (June
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of sentences and documents</article-title>
          .
          <source>In: Proceedings of the 31st International Conference on Machine Learning, ICML 2014</source>
          , Beijing, China,
          <fpage>21</fpage>
          -
          <lpage>26</lpage>
          June 2014. pp.
          <fpage>1188</fpage>
          -
          <lpage>1196</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lipschultz</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          :
          <article-title>Social media communication: Concepts, practices, data, law and ethics</article-title>
          .
          <source>Routledge</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Marganski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melander</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Intimate partner violence victimization in the cyber and real world: Examining the extent of cyber aggression experiences and its association with inperson dating violence</article-title>
          .
          <source>Journal of Interpersonal Violence</source>
          <volume>33</volume>
          (
          <issue>7</issue>
          )
          ,
          <fpage>1071</fpage>
          -
          <lpage>1095</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8</source>
          ,
          <year>2013</year>
          ,
          Lake Tahoe, Nevada, United States. pp.
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Mishna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Regehr</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lacombe-Duncan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daciuk</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fearing</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Wert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Social media, cyber-aggression and student mental health on a university campus</article-title>
          .
          <source>Journal of Mental Health</source>
          <volume>27</volume>
          (
          <issue>3</issue>
          )
          ,
          <fpage>222</fpage>
          -
          <lpage>229</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Myers West</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Censored, suspended, shadowbanned: User interpretations of content moderation on social media platforms</article-title>
          .
          <source>New Media &amp; Society</source>
          <volume>20</volume>
          (
          <issue>11</issue>
          ),
          <fpage>4366</fpage>
          -
          <lpage>4383</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Osama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>El-Beltagy</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          :
          <article-title>A transfer learning approach for emotion intensity prediction in microblog text</article-title>
          .
          <source>In: Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2019, AISI 2019</source>
          , Cairo, Egypt, 26-28 October
          <year>2019</year>
          . pp.
          <fpage>512</fpage>
          -
          <lpage>522</lpage>
          (
          <year>2019</year>
          ). https://doi.org/10.1007/978-3-030-31129-2_47
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Priyadharshini</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>A pragmatic supervised learning methodology of hate speech detection in social media</article-title>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Raiyani</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonçalves</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quaresma</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nogueira</surname>
            ,
            <given-names>V.B.</given-names>
          </string-name>
          :
          <article-title>Fully connected neural network with advance preprocessor to identify aggression over Facebook and Twitter</article-title>
          .
          <source>In: Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying</source>
          , TRAC@COLING,
          Santa Fe, New Mexico, USA. pp.
          <fpage>28</fpage>
          -
          <lpage>41</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Ramiandrisoa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>IRIT at TRAC 2018</article-title>
          .
          <source>In: Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying</source>
          , TRAC@COLING,
          Santa Fe, New Mexico, USA. pp.
          <fpage>19</fpage>
          -
          <lpage>27</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Ramiandrisoa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benamara</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moriceau</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>IRIT at e-Risk 2018 (regular paper)</article-title>
          .
          <source>In: Conference and Labs of the Evaluation Forum, Living Labs (CLEF</source>
          <year>2018</year>
          ), Avignon, France,
          10/09/2018-14/09/2018. p.
          <source>(on line)</source>
          .
          <source>CEUR-WS : Workshop proceedings</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Rehurek</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sojka</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Software framework for topic modelling with large corpora</article-title>
          .
          <source>In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. Citeseer</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegand</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A Survey on Hate Speech Detection Using Natural Language Processing</article-title>
          .
          <source>In: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media. SocialNLP@EACL 2017</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . Valencia,
          Spain
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Trotzek</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koitka</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          :
          <article-title>Linguistic metadata augmented classifiers at the CLEF 2017 task for early detection of depression</article-title>
          .
          <source>In: Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum</source>
          , Dublin, Ireland, September 11-14 (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallace</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          :
          <article-title>A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification</article-title>
          .
          <source>CoRR abs/1510.03820</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>