<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning Deep on Cyberbullying is Always Better Than Brute Force</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michal Ptaszynski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fumito Masui</string-name>
          <email>f-masuig@cs.kitami-it.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kitami Institute of Technology</institution>
          ,
          <addr-line>Kitami</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Tampere University of Technology</institution>
          ,
          <addr-line>Tampere</addr-line>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <fpage>3</fpage>
      <lpage>10</lpage>
      <abstract>
        <p>In this paper we present our research on the detection of cyberbullying (CB), which stands for humiliating other people through the Internet. CB has become recognized as a social problem, and its mostly juvenile victims usually fall into depression, self-mutilate, or even commit suicide. To deal with the problem, school personnel perform Internet Patrol (IP), reading through the available Web contents to spot harmful entries. It is crucial to help IP members detect malicious contents more efficiently. A number of studies have tackled the problem in recent years; however, due to the complexity of the language used in cyberbullying, the results have remained only mildly satisfying. We propose a novel method for automatic cyberbullying detection based on Convolutional Neural Networks and increased Feature Density. Experiments performed on actual cyberbullying data showed a major advantage of our approach over all previous methods, including the best-performing method so far, based on a Brute-Force Search algorithm.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Recent years brought to light the problem of cyberbullying
(CB), defined as exploitation of open online means of
communication, such as Internet forum boards, or social network
services (SNS) to convey harmful and disturbing information
about private individuals, often children and students [Patchin
and Hinduja, 2006]. The problem was further exacerbated
by the popularization of smartphones and tablet computers,
which allow nearly constant use of SNS at home, work/school
or in motion [Bull, 2010].</p>
      <p>Cyberbullying messages commonly ridicule someone’s
personality, appearance or spread rumors. It can lead its
victims to self mutilation, even suicides, or on the opposite, to
attacking their offenders in revenge [Hinduja and Patchin,
2010]. Global increase of cyberbullying cases opened a
public debate on whether such messages could be spotted earlier
to prevent the tragedies, and on the freedom of speech on the
Internet in general.</p>
      <p>In some countries, such as Japan, the problem has become
serious enough to be noticed on a ministerial level [MEXT,
2008]. As one of the ways to deal with the problem school
personnel have started Internet Patrol (IP) to detect Web
forum sites and SNS containing cyberbullying contents.
Unfortunately, as IP is performed manually, reading through
countless amounts of Websites makes it an uphill struggle.</p>
      <p>Some research have started developing methods for
automatic detection of CB to help in this struggle [Ptaszynski
et al., 2010; Dinakar et al., 2012; Ptaszynski et al., 2016].
Unfortunately, even with multiple improvements, the results
have remained merely partially satisfying. This is caused by
a multitude of language ambiguities and styles used in CB.</p>
      <p>In this paper we propose a novel, Convolutional Neural
Networks (CNN) approach to automatic cyberbullying
detection. Moreover, based on the analysis of the characteristics
of CNN and the initial results, we propose an optimization of
CNN by increasing Feature Density of training data.</p>
      <p>The rest of the paper is organized in the following way.
Firstly, we describe the problem of cyberbullying and present
some of the previous research related to ours. Next, we
describe the proposed method and other methods used for
comparison. Further, we present the dataset used in this research,
and explain the evaluation settings, followed by an analysis of
the experiment results and a discussion.</p>
    </sec>
    <sec id="sec-2">
      <title>Research Background</title>
      <sec id="sec-2-1">
        <title>Cyberbullying: Description of a Problem</title>
        <p>The choice of media used in communication can cause
increased psychological distance between interlocutors [Rutter,
1987], which can lead to an empathy deficit, especially in
Internet behavior [Zheng, 2012]. This is one of the reasons
offensive messages have existed for many years on the Internet.
With the increase of our dependence on technology in
everyday life, the problem gained in seriousness, and
conceptualized itself in the form of online harassment, or cyberbullying
(CB) [Patchin and Hinduja, 2006; Hinduja and Patchin, 2010;
Pyżalski, 2012; Lazuras et al., 2012].</p>
        <p>Some of the first research on CB, based on numerous
surveys [Patchin and Hinduja, 2006], revealed that such
harmful information may include threats, sexual remarks,
pejorative labels, or false statements aimed to humiliate others.
When posted on a social network, such as Facebook or
Twitter, it can disclose humiliating private information about the
victim, defaming and ridiculing them publicly. Some studies
reported that CB happens to up to eight percent of children
in schools in Australia [Cross et al., 2009], the United States
[Kowalski and Limber, 2007], or Finland [Sourander et al.,
2010]. Studies on CB across Europe indicate that as many as one
in five young people (not limited to the school environment)
could be exposed to cyberbullying [Hasebrink et al., 2008;
Pyżalski, 2012]. As of 2015 the urgent need to deal with CB
has even made insurance companies offer policies covering costs
that could be incurred as a result of cyberbullying1.</p>
        <p>In Japan, after several suicide cases of CB victims, the
Ministry of Education, Culture, Sports, Science and Technology
(MEXT) increased the priority of the problem, provided a
yearly updated manual for handling CB cases, and
incorporated it in the school staff education program [MEXT, 2008].</p>
        <p>To actively deal with the problem, school staff are engaged
in Internet Patrol (IP). Based on the MEXT definition of CB,
they read through all Internet contents, and when they find a
harmful entry they send a deletion request to the Web page
administrator and report the event to the police. Unfortunately,
since IP has been performed manually as voluntary work,
and the amount of Internet fora and SNS to read through
grows exponentially, manual Web surveillance has been an
uphill task, and a psychological burden for the IP members.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Previous Research on Cyberbullying Detection</title>
        <p>Although the problem of CB has been studied in the social
sciences and child psychology for over ten years [Patchin and
Hinduja, 2006; Pyżalski, 2012], only a few attempts have been made
so far to detect and study the problem with the help of
information technology. Below we present the research most relevant
to this day (also summarized in Table 1).</p>
        <p>As the first recorded study, [Ptaszynski et al., 2010]
performed affect analysis on a small dataset of CB entries and found
that the most distinctive features of CB were vulgar words. They
applied a lexicon of such words to train an SVM classifier.
With a number of optimizations they were able to detect
cyberbullying with an F-score of 88.2%. However, increasing the
data caused a decrease in results, which made them abandon
SVM as not ideal for the language ambiguities typical of CB.</p>
        <p>[Sood et al., 2012] focused on the detection of personal
insults, the negative influence of which could at worst cause an
Internet community to fall into recession. In their research they
used single words and bigrams as features, weighted them
using either presence (1/0), term frequency or tf-idf, and used
them to train an SVM classifier. As a dataset they used a
corpus of six thousand entries they collected from various online
fora. To prepare gold standard for their experiments they used
a crowd-sourcing approach with untrained layperson
annotators hired for a classification task through Mechanical Turk.</p>
        <p>Later, [Dinakar et al., 2012] proposed their approach to the
detection and mitigation of cyberbullying. An improvement
of this work over previous research was its wider
perspective: they did not only focus on detection,
but also proposed some ways of mitigation. The classifiers
they used scored F-scores of up to 58-77%, depending on the
kind of detected harassment. Their best-performing classifier
was SVM, which confirmed the considerably high effectiveness
of SVM for cyberbullying in English, similarly to the research
done by Ptaszynski et al. for Japanese in 2010.</p>
        <p>An interesting work was done by [Kontostathis et al.,
2013], who performed a thorough analysis of cyberbullying
entries on Formspring.me. They were able to identify
common cyberbullying terms, and applied them in classification
with the use of a machine learning method based on Essential
Dimensions of Latent Semantic Indexing (EDLSI).</p>
        <p>[Cano Basave et al., 2013] proposed Violence
Detection Model (VDM), a weekly supervised Bayesian model.
They did not however focused strictly on cyberbullying, but
widened their scope to more generally understood “violence,”
which made the problem more understandable, and thus
feasible for untrained annotators. The datasets were extracted
from violence-related topics on Twiter and DBPedia.</p>
        <p>[Nitta et al., 2013] proposed a method to automatically
detect harmful entries with an extended SO-PMI-IR score
[Turney, 2002] to calculate the relevance of a document with
harmful contents. They also grouped the seed words into
three categories (abusive, violent, obscene) and maximized
the relevance of the categories. Their method was evaluated
comparatively highly, with the best achieved Precision around 91%
(although with Recall of less than 10%).</p>
        <p>Unfortunately, a re-evaluation of their method done by
[Ptaszynski et al., 2016] two years later indicated that the
method had lost most of its Precision (an over 30 percentage-point
drop) in that time. They hypothesized that this was caused
by external factors such as Web page re-ranking, or changes
in SNS user policies, etc. They improved the method by
automatically acquiring and filtering new harmful seed words
with some success (P=76%). Unfortunately, they were
unable to restore the method to its original performance.</p>
        <p>[Sarna and Bhatia, 2015] based their method on a set of
features like “bad words”, positive/negative sentiment words,
and other common features like pronouns, etc., to estimate
user credibility. They applied those features to four
standard classifiers (Naïve Bayes, kNN, Decision Trees, SVM).
The results of the classification were further used in User
Behavior Analysis model (BAU), and User Credibility
Analysis (CAU) model. Unfortunately, although their approach
suggested inclusion of phenomena such as irony, or rumors,
in practice they only focused on messages containing “bad
words.” Moreover, neither these words, the dataset, nor its
annotation schema were sufficiently described in the paper.</p>
        <p>Finally, [Ptaszynski et al., 2015a] proposed a method of
pattern-based language modeling. The patterns, defined as
ordered combinations of sentence elements, were extracted
with a Brute-Force search algorithm and used in
classification. They reported encouraging initial results, and further
improved the method by applying multiple data
preprocessing techniques [Ptaszynski et al., 2015b]. At present their
method is considered the best-performing one, thus we
will use it for comparison with the method proposed herein.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Research Gaps</title>
        <p>Dataset preparation Some of the above-mentioned
methods suffer from subjective data preparation. In [Cano Basave
et al., 2013] or [Dinakar et al., 2012], the problem was not
defined strictly enough and was annotated by laypeople, while CB
is a complex social phenomenon and needs to be handled by
experts. [Sood et al., 2012; Cano Basave et al., 2013]
reformulated the problem to be feasible for laypeople. [Dinakar
et al., 2012] focused on overlapping concepts like sexual or
racial harassment. Finally, [Sarna and Bhatia, 2015] collected
the datasets with no specific standard.</p>
        <p>Feature selection Previous research included as features
mostly words, or simple n-grams (bigrams). Some [Nitta
et al., 2013] applied only a small number of features, while
others [Dinakar et al., 2012] built up more complex
models, however still based mostly on words. Moreover,
using only top-down selected features [Nitta et al., 2013;
Sarna and Bhatia, 2015], while somewhat reasonable (e.g.,
violent or obscene words), requires human workload and
background knowledge of the dataset, thus being inefficient.
Classification methods Although various classifiers have
been tested (SVM, Naive Bayes, or Decision Trees), usually
SVM reached the highest, though only mildly satisfying, scores. To
overcome the performance of previous methods, we apply
Convolutional Neural Networks in classification and optimize
them by studying the correlation of results with Feature Density.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Proposed Methods</title>
      <sec id="sec-3-1">
        <title>Data Preprocessing</title>
        <p>The dataset used in this research (see sect. 4.1) was in
Japanese. In the transcription of the Japanese language, spaces (“ ”)
are not used. Therefore we needed to preprocess the dataset
and make the sentences separable into elements for feature
extraction. We used MeCab (http://taku910.github.io/mecab/), a Japanese
morphological analyzer, and CaboCha (https://taku910.github.io/cabocha/),
a Japanese dependency structure analyzer, to preprocess the
dataset in the following ways.
• Tokenization: All words, punctuation marks, etc. are
separated by spaces (later: TOK).
• Lemmatization: Like the above but the words are
represented in their generic (dictionary) forms, or “lemmas”
(later: LEM).
• Parts of speech: Words are replaced with their
representative parts of speech (later: POS).
• Tokens with POS: Both words and POS information is
included in one element (later: TOK+POS).
• Lemmas with POS: Like the above but with lemmas
instead of words (later: LEM+POS).
• Tokens with Named Entity Recognition: Words
encoded together with information on what named
entities (private name of a person, organization,
numericals, etc.) appear in the sentence. The NER information
is annotated by CaboCha (later: TOK+NER).
• Lemmas with NER: Like the above but with lemmas
(later: LEM+NER).
• Chunking: Larger sub-parts of sentences separated
syntactically, such as noun phrase, verb phrase, predicates,
etc., but without dependency relations (later: CHNK).
• Dependency structure: Same as above, but with
information regarding syntactical relations between chunks
(later: DEP).
• Chunking with NER: Information on named entities is
encoded in chunks (later: CHNK+NER).
• Dependency structure with Named Entities: Both
dependency relations and named entities are included in
each element (later: DEP+NER).</p>
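        <p>The representations above can be illustrated with a minimal Python sketch. It assumes a sentence is already available as a list of (token, lemma, POS) triples, as produced by a morphological analyzer such as MeCab; the helper names and the token/POS fusing convention are ours, for illustration only.</p>

```python
# Build several of the dataset representations (TOK, LEM, POS, TOK+POS)
# from morphologically analyzed text. The input is a list of
# (surface token, lemma, part-of-speech) triples; the function names
# and the "_" fusing convention are illustrative, not MeCab's API.

def tok(parsed):
    """Tokenization: surface forms separated by spaces (TOK)."""
    return " ".join(t for t, _, _ in parsed)

def lem(parsed):
    """Lemmatization: dictionary forms separated by spaces (LEM)."""
    return " ".join(l for _, l, _ in parsed)

def pos(parsed):
    """Parts of speech only (POS)."""
    return " ".join(p for _, _, p in parsed)

def tok_pos(parsed):
    """Tokens fused with their POS tags in one element (TOK+POS)."""
    return " ".join(f"{t}_{p}" for t, _, p in parsed)

# Simplified analysis of the Table 2 sentence "Kyō wa nante kimochi ii hi nanda!"
parsed = [("Kyō", "kyō", "N"), ("wa", "wa", "PP"), ("nante", "nante", "ADV"),
          ("kimochi ii", "kimochi ii", "ADJ"), ("hi", "hi", "N"),
          ("nanda", "da", "AUX"), ("!", "!", "SYM")]

print(pos(parsed))  # N PP ADV ADJ N AUX SYM
```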
        <p>Five examples of preprocessing are represented in Table
2. Theoretically, the more generalized a sentence is, the fewer
unique patterns it will contain, but the produced
patterns will be more frequent (e.g., there are more ADJ N
patterns than “pleasant day”). We compared the results for
different preprocessing methods to find out whether it is
better to represent sentences as more generalized or more specific.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Feature Extraction</title>
        <p>From each of the eleven dataset versions a Bag-of-Words
language model is generated, producing eleven different models
(Bag-of-Words/Tokens, Bag-of-Lemmas, Bag-of-POS,
Bag-of-Chunks, etc.). Sentences from the dataset processed with
those models are used later in the input layer of
classification. We also applied a traditional weight calculation scheme,
namely term frequency with inverse document frequency
(tf*idf). Term frequency tf(t, d) refers here to the traditional
raw frequency, meaning the number of times a term t (word,
token) occurs in a document d. Inverse document frequency
idf(t, D) is the logarithm of the total number of documents
|D| in the corpus divided by the number of documents
containing the term, n_t. Finally, tf*idf refers to term frequency
multiplied by inverse document frequency, as in equation 1.</p>
        <p>idf(t, D) = log( |D| / n_t )   (1)</p>
        <p>The performance of MeCab is reported at around 95-97% [Mori and
Neubig, 2014], and that of CaboCha at around 90% [Taku Kudo, 2002] for
normal language. Although we acknowledge that in some cases the
language of the Web could cause errors in POS tagging and word
segmentation, we did not want to retrain the basic tools to fit our
data, because we wanted the method to work using widely available
resources, so that it would be easily reproducible. Also, we assumed that even
if such errors occur, as long as they are systematic, they will not
cause trouble.</p>
        <p>Examples from Table 2 (preprocessing of “Kyō wa nante kimochi ii hi nanda!”):
– TOK: Kyō | wa | nante | kimochi ii | hi | nanda | !
– POS: N | PP | ADV | ADJ | N | AUX | SYM
– TOK+POS: Kyō N | wa PP | nante ADV | kimochi ii ADJ | hi N | nanda AUX | ! SYM
– CHNK: Kyō wa | nante | kimochi ii | hi nanda!
– DEP: *0 3D Kyō wa | *1 2D nante | *2 3D kimochi ii | *3 -1D hi nanda!</p>
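        <p>The weighting scheme of equation 1 can be sketched directly; a minimal illustration in Python, assuming the corpus is given as a list of tokenized documents (the toy documents below are ours):</p>

```python
import math

def tf(term, doc):
    """Raw term frequency: number of times term t occurs in document d."""
    return doc.count(term)

def idf(term, docs):
    """Inverse document frequency: log(|D| / n_t), where n_t is the
    number of documents containing the term."""
    n_t = sum(1 for d in docs if term in d)
    return math.log(len(docs) / n_t)

def tf_idf(term, doc, docs):
    """Term frequency multiplied by inverse document frequency (eq. 1)."""
    return tf(term, doc) * idf(term, docs)

# Toy corpus of three tokenized documents (illustrative only)
docs = [["omae", "wa", "busu"], ["kyō", "wa", "ii", "hi"], ["busu", "shine"]]
print(round(idf("shine", docs), 3))  # log(3/1) = 1.099
```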
      </sec>
      <sec id="sec-3-3">
        <title>Classification methods</title>
        <p>
          SVM or Support-vector machines [Cortes and Vapnik,
1995] are a set of classifiers well established in AI and
NLP. SVM represent data, belonging to specified categories,
as points in space, and find an optimal hyperplane to
separate the examples from each category. SVM were often
used in cyberbullying detection (see Table 1). We used
four types of SVM functions, namely, linear - the original
function which finds the maximum-margin hyperplane
dividing the samples; polynomial kernel, in which training
samples are represented in a feature space over polynomials of
the original variables
          <xref ref-type="bibr" rid="ref12 ref16 ref17 ref34 ref7">(also used in [Dinakar et al., 2012])</xref>
          ;
radial basis function (RBF) kernel, which approximates
multivariate functions with a single univariate function, further
radialised to be used in higher dimensions; and sigmoid, i.e.,
hyperbolic tangent function [Lin and Lin, 2003].
        </p>
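        <p>The four kernel functions can be written down compactly. A sketch in plain Python (gamma, r and d are the usual kernel hyperparameters; the default values below are illustrative, not the settings used in the experiments):</p>

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear(x, y):
    """Linear kernel: plain dot product."""
    return dot(x, y)

def polynomial(x, y, gamma=1.0, r=0.0, d=3):
    """Polynomial kernel: (gamma * x.y + r) ** d."""
    return (gamma * dot(x, y) + r) ** d

def rbf(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def sigmoid(x, y, gamma=0.1, r=0.0):
    """Sigmoid kernel: hyperbolic tangent of gamma * x.y + r."""
    return math.tanh(gamma * dot(x, y) + r)

x, y = [1.0, 0.0, 2.0], [0.5, 1.0, 1.0]
print(linear(x, y))  # 2.5
```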
        <p>Naïve Bayes classifier is a supervised learning algorithm
applying Bayes’ theorem with the assumption of strong
(naive) independence between pairs of features, traditionally
used as a baseline in text classification tasks.
kNN or the k-Nearest Neighbors classifier takes as input the
k closest training samples with assigned classes and classifies the
input sample into a class by a majority vote. It is often applied
as a baseline, next to Naïve Bayes. Here, we used the k=1 setting,
in which the input sample is assigned to the class of the first
nearest neighbor.</p>
        <p>JRip also known as Repeated Incremental Pruning to
Produce Error Reduction (RIPPER) [Cohen, 1995], which learns
rules incrementally to further optimize them. It is efficient
in classification of noisy text [Sasaki and Kita, 1998] and for
this purpose was used in cyberbullying detection previously
[Dinakar et al., 2012].</p>
        <p>J48 is an implementation of the C4.5 decision tree
algorithm [Quinlan, 1993], which firstly builds decision trees
from a labeled dataset, with each tree node selecting the optimal
splitting criterion further used to make the decision.
Random Forest in the training phase creates multiple decision
trees to output the optimal class (mode of classes) in the
classification phase [Breiman, 2001]. An improvement of RF over
standard decision trees is its ability to correct the overfitting to
the training set common in decision trees [Hastie et al., 2013].
SPEC or Sentence Pattern Extraction arChitecture
[Ptaszynski et al., 2015a] is a custom feature extraction and
classification system. The features are defined as ordered
combinations of sentence elements and contain patterns of
tokens and n-grams with disjoint elements. The way the
features are extracted (combinatorial approach) resembles
brute-force search algorithms. Pattern occurrences on each
side of the binary-class dataset are used to calculate the normalized
weights of patterns. Next, the score of a sentence is calculated
as the sum of the weights of patterns found in the input sentence.
With multiple modifications, such as deletion of ambiguous
patterns, or various dataset preprocessing, [Ptaszynski et
al., 2015b] were able to optimize the method to achieve
reasonably high results, and it has been considered the
best-performing method so far for cyberbullying detection.</p>
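        <p>The brute-force (combinatorial) pattern extraction underlying SPEC can be sketched as follows. This is a simplification of [Ptaszynski et al., 2015a]: it generates all ordered combinations of sentence elements up to a given length, with a "*" wildcard (a marker of our choosing) standing in for skipped, disjoint elements.</p>

```python
from itertools import combinations

def extract_patterns(tokens, max_len=3):
    """All ordered combinations of sentence elements up to max_len
    elements, keeping a '*' wildcard wherever intermediate elements
    were skipped (patterns with disjoint elements)."""
    patterns = set()
    for length in range(1, max_len + 1):
        for idxs in combinations(range(len(tokens)), length):
            parts, prev = [], None
            for i in idxs:
                if prev is not None and i > prev + 1:
                    parts.append("*")  # disjoint elements were skipped here
                parts.append(tokens[i])
                prev = i
            patterns.add(" ".join(parts))
    return patterns

pats = extract_patterns(["kimochi", "ii", "hi"])
# contains contiguous n-grams ("kimochi ii") and skip patterns ("kimochi * hi")
```

        <p>In the full method, the occurrence counts of each such pattern on the harmful and non-harmful sides of the dataset are then turned into normalized weights, and a sentence is scored by summing the weights of the patterns it contains.</p>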
        <p>In comparison we used their results optimized either for F1
or BEP (break-even point of Precision and Recall).
CNN or Convolutional Neural Networks are a type of
feedforward artificial neural network, an improved
multilayer perceptron model. Although CNN were
originally designed for image recognition, their performance
has been confirmed in many tasks, including NLP [Collobert
and Weston, 2008] and sentence classification [Kim, 2014].</p>
        <p>We applied a Convolutional Neural Network
implementation with Rectified Linear Units (ReLU) as a neuron
activation function [Nair and Hinton, 2010], and max pooling
[Scherer et al., 2010], which applies a max filter to
non-overlapping sub-parts of the input to reduce dimensionality and
in effect counteract over-fitting. We also applied dropout
regularization on the penultimate layer, which prevents co-adaptation
of hidden units by randomly omitting (dropping out) some of
the hidden units during training [Hinton et al., 2012].</p>
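        <p>The three building blocks mentioned above can be sketched in plain Python on a small 2D feature map; this illustrates the operations only, and is not the implementation used in the experiments:</p>

```python
import random

def relu(x):
    """Rectified Linear Unit: max(0, v) applied element-wise."""
    return [[max(0.0, v) for v in row] for row in x]

def max_pool_2x2(x):
    """Max filter over non-overlapping 2x2 sub-parts of a feature map,
    halving each spatial dimension (dimensions assumed even)."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]

def dropout(x, p=0.5, rng=random.Random(0)):
    """Inverted dropout: randomly omit units with probability p during
    training, scaling the surviving units by 1/(1-p)."""
    return [[v / (1.0 - p) if rng.random() >= p else 0.0 for v in row]
            for row in x]

a = [[1.0, -2.0, 3.0, 0.0],
     [0.0, 5.0, -1.0, 2.0],
     [-3.0, 1.0, 4.0, 4.0],
     [2.0, 2.0, 0.0, -6.0]]
print(max_pool_2x2(relu(a)))  # [[5.0, 3.0], [2.0, 4.0]]
```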
        <p>We applied two versions of CNN. The first, with one hidden
convolutional layer containing 100 units, was applied as a
proposed baseline. The second, the final proposed method, consisted
of two hidden convolutional layers, containing 20 and 100
feature maps, respectively, both layers with a 5x5 patch size
and 2x2 max-pooling, trained with Stochastic Gradient Descent
[LeCun et al., 2012].</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation Experiments</title>
      <p>As the dataset for experiments we used the one created
originally by [Ptaszynski et al., 2010], and also widely used
by [Nitta et al., 2013; Ptaszynski et al., 2015a; 2015b;
2016]. It contains 1,490 harmful and 1,508 non-harmful
entries in Japanese collected from unofficial school Web sites
and fora. The original data was provided by the Human
Rights Research Institute Against All Forms of
Discrimination and Racism in Mie Prefecture, Japan5. The harmful and
non-harmful sentences were collected and manually labeled
by Internet Patrol members (expert annotators) according to
instructions included in the governmental manual for dealing
with cyberbullying [MEXT, 2008]. Some of those
instructions are explained shortly below.</p>
      <p>The MEXT definition assumes that cyberbullying happens
when a person is personally offended on the Web. This
includes disclosing the person’s name, personal information
and other areas of privacy. Therefore, as the first features
distinguishable for cyberbullying, MEXT defines private names
(also initials and nicknames), names of institutions and
affiliations, and private information (addresses, phone numbers, entries
revealing personal information, etc.).</p>
      <p>Moreover, literature on cyberbullying indicates
vulgarities as one of the most distinctive features of
cyberbullying [Patchin and Hinduja, 2006; Hinduja and Patchin, 2014;
Ptaszynski et al., 2010]. Also according to MEXT vulgar
language is distinguishable for cyberbullying, due to its ability
to convey offenses against particular persons. In the prepared
dataset all entries containing any of the above information
were classified as harmful. Some examples from the dataset
are represented in Table 3.</p>
      <sec id="sec-4-1">
        <title>Experiment Setup</title>
        <p>The preprocessed original dataset provides eleven separate
datasets for the experiment (see sect. 3.1 for details). Thus the
experiment was performed eleven times, once for each
kind of preprocessing. Each of the classifiers (sect. 3.3) was
tested on each version of the dataset in a 10-fold cross
validation procedure. The results were calculated using standard
Precision (P), Recall (R), balanced F-score (F1) and
Accuracy (A). As for the winning condition, we looked at which
classifier achieved the highest balanced F-score.</p>
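        <p>The evaluation measures are the standard ones and can be sketched from the confusion-matrix counts (the counts below are illustrative, not our actual results):</p>

```python
def scores(tp, fp, fn, tn):
    """Standard Precision, Recall, balanced F-score and Accuracy
    computed from true/false positive and negative counts."""
    p = tp / (tp + fp)                    # Precision
    r = tp / (tp + fn)                    # Recall
    f1 = 2 * p * r / (p + r)              # balanced F-score
    a = (tp + tn) / (tp + fp + fn + tn)   # Accuracy
    return p, r, f1, a

# e.g., 140 harmful entries detected, 20 false alarms, 10 harmful
# entries missed, 130 non-harmful entries kept (illustrative counts)
p, r, f1, a = scores(tp=140, fp=20, fn=10, tn=130)
print(round(p, 3), round(r, 3), round(f1, 3), round(a, 3))  # 0.875 0.933 0.903 0.9
```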
      </sec>
      <sec id="sec-4-2">
        <title>Feature Density</title>
        <p>To get a better grasp of the results we also analyzed the
influence of how a dataset was preprocessed on the results.
The more generalized a dataset is, the fewer
frequently appearing unique features it produces. Therefore, to
estimate the dataset generalization level we applied the notion of
Lexical Density (LD) [Ure, 1971]. It is a score representing
an estimated measure of content per lexical unit for a given
corpus, calculated as the number of all unique words divided
by the number of all words in the corpus. Since in our
research we use a variety of different features, not only words,
we will further call this measure Feature Density (FD).</p>
        <p>After calculating FD for all used datasets we calculated
Pearson’s correlation coefficient (r-value) to see if there is
any correlation between dataset generalization (FD) and the
results (F-scores).</p>
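        <p>Both quantities used in this analysis can be sketched directly: Feature Density as unique features over all feature occurrences, and Pearson's r between the densities and the obtained F-scores (the toy inputs below are illustrative):</p>

```python
import math

def feature_density(features):
    """Feature Density (FD): number of unique features divided by the
    total number of feature occurrences in the corpus."""
    return len(set(features)) / len(features)

def pearson_r(xs, ys):
    """Pearson's correlation coefficient (r-value) between paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

feats = ["busu", "wa", "busu", "hi", "wa", "shine"]  # toy feature list
print(round(feature_density(feats), 3))           # 4 unique / 6 total = 0.667
print(round(pearson_r([1, 2, 3], [2, 4, 6]), 6))  # 1.0 (perfect correlation)
```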
      </sec>
      <sec id="sec-4-3">
        <title>Results and Discussion</title>
        <p>All results are summarized in Table 4. The results of the
baselines (kNN, Naïve Bayes) were low, as expected.
Although these classifiers can be tuned to high scores in typical
sentiment analysis, they were not able to grasp the noisy
language used in cyberbullying. However, it must be noted
that, especially with the help of named entities (NER), NB
performed rather well, comparably to J48 or JRip.</p>
        <p>When it comes to decision tree-based classifiers, J48
scored low, similarly to [Dinakar et al., 2012]. However,
Random Forest usually scored better even than SPEC.
Unfortunately, RF is highly time-inefficient, especially compared
to SVM, and thus impractical.</p>
        <p>In much previous research on CB detection, SVM was
most commonly used, with varying success. As we can
observe, the choice of an appropriate function with good
preprocessing makes SVM comparable even to the proposed CNN. The
best setting was linear-SVM trained on the lemmatized dataset
(F1=.825). Moreover, although not scoring the highest, when
the ratio of time performance to results is considered,
SVM can be considered the most efficient classifier6.</p>
        <p>As for the method of preprocessing, most often TOK+NER
and LEM+NER scored highest. This can be explained by the
fact that the data, which was annotated by expert annotators
following official governmental definition of cyberbullying,
often contained revealing of private information. As named
entity recognition covered most of these cases, it is reasonable
that it helped extracting meaningful features. Only for SPEC
the results for the two above settings were not available since
[Ptaszynski et al., 2015b] did not apply them in their research.</p>
        <p>The best method so far, SPEC, did in fact score high,
second best after the CNN proposed here. SPEC was also
better than SVM on every dataset except one (LEM). Although
SPEC is highly time-inefficient in the training phase
(generation of all combinatorial patterns), it is easy to implement,
and even in its freshly-trained form it can be deployed without any
additional packages to any external media. This could be an
advantage when including it in CB detection software, such
as a smartphone application, etc.</p>
        <p>When it comes to the proposed method, the initial
baseline CNN with only one hidden layer did not perform well,
although it was still better than the baselines and comparable to
most of the classifiers.</p>
        <p>However, the final proposed method, namely, the CNN
with two hidden layers, 5x5 patch size, max-pooling and
Stochastic Gradient Descent, outperformed all of the
classifiers in almost all settings. The only situation where SPEC
scored higher (only POS features) reveals a well-known
characteristic of neural networks, which perform poorly on a small
number of unique features. As for the second situation, using
only dependency features (DEP), which on the other hand
contained the largest number of features, generation of the
model was not feasible on the applied computer, and thus the
results were not calculated. In the near future we plan to
repeat the experiment in a more efficient environment, such as a
cloud computing service (Google Cloud Platform (https://cloud.google.com/),
Microsoft Azure (https://azure.microsoft.com), or
Amazon EC2 (https://aws.amazon.com/ec2/)).</p>
        <p>6 Training SVM and the simple classifiers (NB, kNN) was blazingly
fast (several seconds). Simple CNN, Random Forest and JRip were
slower (several minutes to 1-2 hours). SPEC and CNN-2L took the
longest (about one week).</p>
        <p>Examples from Table 3: “2-nen no tsutsuji no onna meccha busu suki na hito barashimashoka? 1-nen no anoko desuyo ne? kimogatterunde yamete agete kudasai” (“Wanna know who likes that awfully ugly 2nd-grade Azalea girl? It’s that 1st-grader, isn’t it? He’s disgusting, so let’s leave him mercifully in peace.”); “Aitsu wa busakute se ga takai dake no onna, busakute se takai dake ya noni yatara otoko-zuki meccha tarashi de panko anna onna owatteru” (“She’s just tall and apart from that she’s so freakin’ ugly, and despite that she’s such a cock-loving slut, she’s finished already.”); “Shinde kureeee, daibu kiraware-mono de yuumei, subete ga itaitashii...” (“Please, dieeee, you’re so famous for being disliked by everyone, everything in you is so pathetic.”)</p>
        <p>Correlation of F-score with Feature Density (Table 5): CNN-2L 0.685 (*p=0.035); SVM-pol -0.431 (p=0.185); SVM-sig -0.534 (p=0.091); SPEC-BEP -0.550 (p=0.133); RandForest -0.560 (p=0.073); SVM-lin -0.564 (p=0.076); SPEC-F1 -0.636 (p=0.066); SVM-rad -0.639 (*p=0.034); CNN-1L -0.709 (*p=0.019); JRip -0.729 (*p=0.011); NB -0.736 (*p=0.013); J48 -0.791 (**p=0.006); kNN -0.809 (**p=0.004); * p&lt;0.05, ** p&lt;0.01.</p>
        <p>As for the top three best-performing settings, the 2-layer CNN
trained on chunks alone scored high, close to 90%
F-score. Dependency features with NER were second best, with
F1=92.7%. However, the most optimal setting was the 2-layer
CNN trained on chunks with named entities, which reached an
F-score of 93.5%, a result far more satisfying than
expected, exceeding the second-best non-NN classifier (SVM on
lemmas) by over 10 percentage points.</p>
        <p>Next, we analyzed the correlation of data preprocessing
with Feature Density (FD). The results are represented in
Table 5.</p>
        <p>The results clearly divided the classifiers into three groups.</p>
        <p>The first group, of the lowest performing classifiers (kNN, NB, J48, JRip, and CNN-1L), was strongly negatively correlated with FD, meaning that these classifiers lose overall performance the more feature-dense the model becomes. This suggests that such classifiers should be fed a feature set of limited density.</p>
        <sec id="sec-4-3-1">
          <title>7https://cloud.google.com/</title>
        </sec>
        <sec id="sec-4-3-2">
          <title>8https://azure.microsoft.com</title>
        </sec>
        <sec id="sec-4-3-3">
          <title>9https://aws.amazon.com/ec2/</title>
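          <p>By analogy to lexical density [Ure, 1971], Feature Density can be understood as the ratio of unique features to all feature occurrences in a dataset. The following is our own minimal sketch under that assumed definition, not the authors’ code:</p>

```python
# Hedged sketch: Feature Density (FD) as unique features / all feature
# occurrences, by analogy to lexical density (Ure, 1971). This definition
# is an assumption made for illustration only.
def feature_density(documents):
    """documents: list of lists of feature strings (tokens, POS tags, etc.)."""
    all_feats = [f for doc in documents for f in doc]
    return len(set(all_feats)) / len(all_feats)

# Token features tend to yield a higher FD than POS features, whose
# inventory of unique labels is small.
docs_tokens = [["kimi", "wa", "baka", "da"], ["baka", "da", "ne"]]
docs_pos = [["N", "PRT", "ADJ", "AUX"], ["ADJ", "AUX", "PRT"]]
print(feature_density(docs_tokens))  # 5 unique / 7 total
print(feature_density(docs_pos))     # 4 unique / 7 total
```

          <p>Under this reading, a POS-only setting yields few unique feature types and hence a low FD, consistent with the observation that neural networks underperform on such settings.</p>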
          <p>The second group contained the classifiers that performed moderately well (all SVMs, Random Forest, and both SPEC variants). Their correlation with FD was negative, ranging from weak (-0.431) to moderate (-0.639). This, together with the lack of statistical significance, means that FD does not correlate well with these classifiers, and other characteristics should be used to optimize the dataset preprocessing applied to them.</p>
          <p>Finally, we made an interesting discovery about the correlation between FD and our proposed method (the 2-layer CNN). This classifier correlated positively, and nearly strongly, with FD. This suggests that its performance could be improved further by increasing the feature density of the applied dataset. We plan to follow this path in the near future.</p>
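          <p>The correlation analysis above can be reproduced with the plain Pearson product-moment coefficient. A minimal sketch follows; the input values are made up for illustration and are not the paper’s data:</p>

```python
import math

# Pearson product-moment correlation between a classifier's per-setting
# F-scores and the Feature Density (FD) of each preprocessing setting,
# as reported in Table 5.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

fd = [0.92, 0.87, 0.82, 0.77, 0.19, 0.18, 0.17]   # FD per setting (made up)
f1 = [0.93, 0.90, 0.89, 0.88, 0.60, 0.58, 0.57]   # F-scores (made up)
print(round(pearson_r(fd, f1), 3))  # positive, near 1 for this toy data
```

          <p>The p-values in Table 5 would additionally require a significance test on r (e.g., a t-test with n-2 degrees of freedom), which is omitted here.</p>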
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>In this paper we presented our research on cyberbullying (CB) detection. Cyberbullying has become a serious problem in a modern society that is always connected to the Internet. Manual measures, such as Internet Patrol, have been undertaken to deal with CB; unfortunately, reading through the whole Internet to find CB entries is like looking for a needle in a haystack, while leaving CB victims exposed to harmful messages leads to serious consequences.</p>
      <p>To help respond quickly to the ever-growing CB problem, research on automatic cyberbullying detection has begun to emerge; unfortunately, the results have been only partially satisfying. We proposed a Deep Learning approach to the problem, based on Convolutional Neural Networks (CNN).</p>
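      <p>The core operations of such a CNN text classifier (in the style of [Kim, 2014]) are 1-D convolution over word embeddings, ReLU activation [Nair and Hinton, 2010], and max-over-time pooling [Scherer et al., 2010]. Below is a minimal NumPy sketch with illustrative shapes; it is not the paper’s implementation:</p>

```python
import numpy as np

# Hedged sketch of the building blocks of a CNN sentence classifier:
# 1-D convolution over word embeddings, ReLU, max-over-time pooling.
# All sizes and weights are illustrative.
rng = np.random.default_rng(0)

sent_len, emb_dim, n_filters, width = 10, 8, 4, 3
X = rng.normal(size=(sent_len, emb_dim))          # embedded sentence
W = rng.normal(size=(n_filters, width, emb_dim))  # convolution filters
b = np.zeros(n_filters)

# Slide each filter over every window of `width` consecutive words.
conv = np.array([
    [np.sum(X[i:i + width] * W[f]) + b[f] for i in range(sent_len - width + 1)]
    for f in range(n_filters)
])
relu = np.maximum(conv, 0.0)   # ReLU activation (Nair and Hinton, 2010)
pooled = relu.max(axis=1)      # max-over-time pooling -> one value per filter
print(pooled.shape)            # (4,)
```

      <p>A full classifier would stack a second convolutional layer (as in the proposed 2-layer CNN), dropout [Hinton et al., 2012], and a softmax output on top of the pooled vector.</p>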
      <p>The proposed optimized CNN model not only outperformed the other classifiers by over 11 percentage points, scoring a near-ideal F-score (93.5%), but also revealed an unusual characteristic by correlating positively, and nearly strongly, with Feature Density. This provides an informative hint on how to further improve not only the proposed method (by increasing the FD of the dataset), but also the other classifiers (by decreasing FD, etc.).</p>
      <p>In the near future we plan to test the limits of potential optimization, also by applying different dataset preprocessing methods (sentiment features, etc.) and different language models (n-grams, skip-grams, language combinatorics, etc.). We also plan to implement the developed model in a smartphone application for “in-the-field” testing and further practical research on cyberbullying and ways of mitigating it.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Breiman</source>
          , 2001]
          <string-name>
            <given-names>Leo</given-names>
            <surname>Breiman</surname>
          </string-name>
          .
          <article-title>Random forests</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ):
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Bull</source>
          ,
          <year>2010</year>
          ]
          <string-name>
            <given-names>Glen</given-names>
            <surname>Bull</surname>
          </string-name>
          .
          <article-title>The always-connected generation</article-title>
          .
          <source>Learning &amp; Leading with Technology</source>
          ,
          <volume>38</volume>
          (
          <issue>3</issue>
          ):
          <fpage>28</fpage>
          -
          <lpage>29</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Cano Basave et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Amparo Elizabeth</given-names>
            <surname>Cano Basave</surname>
          </string-name>
          , Yulan He, Kang Liu, and
          <string-name>
            <given-names>Jun</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>A weakly supervised bayesian model for violence detection in social media</article-title>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3a">
        <mixed-citation>
          [Cohen,
          <year>1995</year>
          ]
          <string-name>
            <given-names>William W</given-names>
            <surname>Cohen</surname>
          </string-name>
          .
          <article-title>Fast effective rule induction</article-title>
          .
          <source>In Proceedings of the twelfth international conference on machine learning</source>
          , pages
          <fpage>115</fpage>
          -
          <lpage>123</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>[Collobert and Weston</source>
          , 2008]
          <string-name>
            <given-names>Ronan</given-names>
            <surname>Collobert</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jason</given-names>
            <surname>Weston</surname>
          </string-name>
          .
          <article-title>A unified architecture for natural language processing: Deep neural networks with multitask learning</article-title>
          .
          <source>In Proceedings of the 25th international conference on Machine learning</source>
          , pages
          <fpage>160</fpage>
          -
          <lpage>167</lpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>[Cortes and Vapnik</source>
          , 1995]
          <string-name>
            <given-names>Corinna</given-names>
            <surname>Cortes</surname>
          </string-name>
          and
          <string-name>
            <given-names>Vladimir</given-names>
            <surname>Vapnik</surname>
          </string-name>
          .
          <article-title>Support-vector networks</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Cross et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>Donna</given-names>
            <surname>Cross</surname>
          </string-name>
          , Therese Shaw, Lydia Hearn, Melanie Epstein,
          <string-name>
            <given-names>Helen</given-names>
            <surname>Monks</surname>
          </string-name>
          , Leanne Lester, and Laura Thomas.
          <article-title>Australian covert bullying prevalence study</article-title>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Dinakar et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Karthik</given-names>
            <surname>Dinakar</surname>
          </string-name>
          , Birago Jones, Catherine Havasi,
          <string-name>
            <given-names>Henry</given-names>
            <surname>Lieberman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Rosalind</given-names>
            <surname>Picard</surname>
          </string-name>
          .
          <article-title>Common sense reasoning for detection, prevention, and mitigation of cyberbullying</article-title>
          .
          <source>ACM Transactions on Interactive Intelligent Systems (TiiS)</source>
          ,
          <volume>2</volume>
          (
          <issue>3</issue>
          ):
          <fpage>18</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Hasebrink et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>Uwe</given-names>
            <surname>Hasebrink</surname>
          </string-name>
          , Sonia Livingstone, and
          <string-name>
            <given-names>Leslie</given-names>
            <surname>Haddon</surname>
          </string-name>
          .
          <article-title>Comparing children's online opportunities and risks across europe: Cross-national comparisons for eu kids online</article-title>
          .
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Hastie et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Friedman</surname>
          </string-name>
          .
          <source>The Elements of Statistical Learning: Data Mining, Inference, and Prediction</source>
          . Springer Series in Statistics.
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>[Hinduja and Patchin</source>
          , 2010]
          <string-name>
            <given-names>Sameer</given-names>
            <surname>Hinduja</surname>
          </string-name>
          and Justin W Patchin.
          <article-title>Bullying, cyberbullying, and suicide</article-title>
          .
          <source>Archives of suicide research</source>
          ,
          <volume>14</volume>
          (
          <issue>3</issue>
          ):
          <fpage>206</fpage>
          -
          <lpage>221</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>[Hinduja and Patchin</source>
          , 2014]
          <string-name>
            <given-names>Sameer</given-names>
            <surname>Hinduja</surname>
          </string-name>
          and Justin W Patchin.
          <article-title>Bullying beyond the schoolyard: Preventing and responding to cyberbullying</article-title>
          . Corwin Press,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Hinton et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Geoffrey E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          , Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and
          <string-name>
            <given-names>Ruslan</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          .
          <article-title>Improving neural networks by preventing coadaptation of feature detectors</article-title>
          .
          <source>CoRR, abs/1207.0580</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>[Kim</source>
          , 2014]
          <string-name>
            <given-names>Yoon</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>1746</fpage>
          -
          <lpage>1751</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Kontostathis et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>April</given-names>
            <surname>Kontostathis</surname>
          </string-name>
          , Kelly Reynolds, Andy Garron, and
          <string-name>
            <given-names>Lynne</given-names>
            <surname>Edwards</surname>
          </string-name>
          .
          <article-title>Detecting cyber bullying: query terms and techniques</article-title>
          .
          <source>In Proceedings of the 5th annual acm web science conference</source>
          , pages
          <fpage>195</fpage>
          -
          <lpage>204</lpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <source>[Kowalski and Limber</source>
          , 2007]
          <string-name>
            <given-names>Robin M</given-names>
            <surname>Kowalski</surname>
          </string-name>
          and
          <string-name>
            <given-names>Susan P</given-names>
            <surname>Limber</surname>
          </string-name>
          .
          <article-title>Electronic bullying among middle school students</article-title>
          .
          <source>Journal of adolescent health</source>
          ,
          <volume>41</volume>
          (
          <issue>6</issue>
          ):
          <fpage>S22</fpage>
          -
          <lpage>S30</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Lazuras et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Lambros</given-names>
            <surname>Lazuras</surname>
          </string-name>
          , Jacek Pyżalski, Vassilis Barkoukis, and
          <string-name>
            <given-names>Haralambos</given-names>
            <surname>Tsorbatzoudis</surname>
          </string-name>
          .
          <article-title>Empathy and moral disengagement in adolescent cyberbullying: Implications for educational intervention and pedagogical practice</article-title>
          .
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [LeCun et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Yann A</given-names>
            <surname>LeCun</surname>
          </string-name>
          , Léon Bottou, Genevieve B Orr, and
          <string-name>
            <given-names>Klaus-Robert</given-names>
            <surname>Müller</surname>
          </string-name>
          .
          <article-title>Efficient backprop</article-title>
          .
          <source>In Neural networks: Tricks of the trade</source>
          , pages
          <fpage>9</fpage>
          -
          <lpage>48</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>[Lin and Lin</source>
          , 2003]
          <string-name>
            <given-names>Hsuan-Tien</given-names>
            <surname>Lin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chih-Jen</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>A study on sigmoid kernels for svm and the training of nonpsd kernels by smo-type methods</article-title>
          . submitted to Neural Computation, pages
          <fpage>1</fpage>
          -
          <lpage>32</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <source>[MEXT</source>
          ,
          <year>2008</year>
          ] MEXT.
          <article-title>'Netto-jō no ijime' ni kansuru taiō manyuaru jirei shū (gakkō, kyōin muke) [“Bullying on the net” manual for handling and collection of cases (for schools and teachers)]</article-title>
          <source>(in Japanese)</source>
          .
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <source>[Mori and Neubig</source>
          , 2014]
          <string-name>
            <given-names>Shinsuke</given-names>
            <surname>Mori</surname>
          </string-name>
          and
          <string-name>
            <given-names>Graham</given-names>
            <surname>Neubig</surname>
          </string-name>
          .
          <article-title>Language resource addition: Dictionary or corpus?</article-title>
          <source>In LREC</source>
          , pages
          <fpage>1631</fpage>
          -
          <lpage>1636</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>[Nair and Hinton</source>
          , 2010]
          <string-name>
            <given-names>Vinod</given-names>
            <surname>Nair</surname>
          </string-name>
          and
          <string-name>
            <given-names>Geoffrey E</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>Rectified linear units improve restricted boltzmann machines</article-title>
          .
          <source>In Proceedings of the 27th international conference on machine learning (ICML-10)</source>
          , pages
          <fpage>807</fpage>
          -
          <lpage>814</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Nitta et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Taisei</given-names>
            <surname>Nitta</surname>
          </string-name>
          , Fumito Masui, Michal Ptaszynski, Yasutomo Kimura, Rafal Rzepka, and
          <string-name>
            <given-names>Kenji</given-names>
            <surname>Araki</surname>
          </string-name>
          .
          <article-title>Detecting cyberbullying entries on informal school websites based on category relevance maximization</article-title>
          .
          <source>In IJCNLP</source>
          , pages
          <fpage>579</fpage>
          -
          <lpage>586</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <source>[Patchin and Hinduja</source>
          , 2006]
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Patchin</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Hinduja</surname>
          </string-name>
          .
          <article-title>Bullies move beyond the schoolyard a preliminary look at cyberbullying</article-title>
          .
          <source>Youth violence and juvenile justice</source>
          ,
          <volume>4</volume>
          (
          <issue>2</issue>
          ):
          <fpage>148</fpage>
          -
          <lpage>169</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Ptaszynski et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ptaszynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dybala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Matsuba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Masui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rzepka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Araki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Momouchi</surname>
          </string-name>
          .
          <article-title>In the service of online order: Tackling cyber-bullying with machine learning and affect analysis</article-title>
          .
          <source>International Journal of Computational Linguistics Research</source>
          ,
          <volume>1</volume>
          (
          <issue>3</issue>
          ):
          <fpage>135</fpage>
          -
          <lpage>154</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Ptaszynski et al., 2015a]
          <string-name>
            <given-names>Michal</given-names>
            <surname>Ptaszynski</surname>
          </string-name>
          , Fumito Masui, Yasutomo Kimura, Rafal Rzepka, and
          <string-name>
            <given-names>Kenji</given-names>
            <surname>Araki</surname>
          </string-name>
          .
          <article-title>Brute force works best against bullying</article-title>
          .
          <source>In IJCAI 2015 Workshop on Intelligent Personalization (IP</source>
          <year>2015</year>
          ), pages
          <fpage>28</fpage>
          -
          <lpage>29</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [Ptaszynski et al., 2015b]
          <string-name>
            <given-names>Michal</given-names>
            <surname>Ptaszynski</surname>
          </string-name>
          , Fumito Masui, Yasutomo Kimura, Rafal Rzepka, and
          <string-name>
            <given-names>Kenji</given-names>
            <surname>Araki</surname>
          </string-name>
          .
          <article-title>Extracting patterns of harmful expressions for cyberbullying detection</article-title>
          .
          <source>In Proceedings of 7th Language and Technology Conference(LTC'15)</source>
          , pages
          <fpage>370</fpage>
          -
          <lpage>375</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [Ptaszynski et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ptaszynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Masui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nitta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hatakeyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kimura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rzepka</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Araki</surname>
          </string-name>
          .
          <article-title>Sustainable cyberbullying detection with category-maximized relevance of harmful phrases and double-filtered automatic optimization</article-title>
          .
          <source>International Journal of Child-Computer Interaction</source>
          ,
          <volume>8</volume>
          :
          <fpage>15</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [Pyżalski, 2012]
          <string-name>
            <given-names>Jacek</given-names>
            <surname>Pyżalski</surname>
          </string-name>
          .
          <article-title>From cyberbullying to electronic aggression: Typology of the phenomenon</article-title>
          .
          <source>Emotional and behavioural difficulties</source>
          ,
          <volume>17</volume>
          (
          <issue>3-4</issue>
          ):
          <fpage>305</fpage>
          -
          <lpage>317</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <source>[Quinlan</source>
          ,
          <year>1993</year>
          ]
          <string-name>
            <given-names>J.R.</given-names>
            <surname>Quinlan</surname>
          </string-name>
          .
          <source>C4.5: Programs for Machine Learning</source>
          . Morgan Kaufmann series in machine learning. Morgan Kaufmann Publishers,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <source>[Rutter</source>
          , 1987]
          <string-name>
            <given-names>D.R.</given-names>
            <surname>Rutter</surname>
          </string-name>
          .
          <source>Communicating by Telephone</source>
          . International Series in Experimental Social Psychology, Vol.
          <volume>15</volume>
          . Elsevier Science Limited,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <source>[Sarna and Bhatia</source>
          , 2015]
          <string-name>
            <given-names>Geetika</given-names>
            <surname>Sarna</surname>
          </string-name>
          and
          <string-name>
            <given-names>MPS</given-names>
            <surname>Bhatia</surname>
          </string-name>
          .
          <article-title>Content based approach to find the credibility of user in social networks: an application of cyberbullying</article-title>
          .
          <source>International Journal Of Machine Learning and Cybernetics</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <source>[Sasaki and Kita</source>
          , 1998]
          <string-name>
            <given-names>Minoru</given-names>
            <surname>Sasaki</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kenji</given-names>
            <surname>Kita</surname>
          </string-name>
          .
          <article-title>Rulebased text categorization using hierarchical categories</article-title>
          .
          <source>In Systems, Man, and Cybernetics, 1998 IEEE International Conference on</source>
          , volume
          <volume>3</volume>
          , pages
          <fpage>2827</fpage>
          -
          <lpage>2830</lpage>
          . IEEE,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [Scherer et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Scherer</surname>
          </string-name>
          , Andreas Müller, and
          <string-name>
            <given-names>Sven</given-names>
            <surname>Behnke</surname>
          </string-name>
          .
          <article-title>Evaluation of pooling operations in convolutional architectures for object recognition</article-title>
          .
          <source>In International Conference on Artificial Neural Networks</source>
          , pages
          <fpage>92</fpage>
          -
          <lpage>101</lpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [Sood et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Sara Owsley</given-names>
            <surname>Sood</surname>
          </string-name>
          , Elizabeth F Churchill, and
          <string-name>
            <given-names>Judd</given-names>
            <surname>Antin</surname>
          </string-name>
          .
          <article-title>Automatic identification of personal insults on social news sites</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <volume>63</volume>
          (
          <issue>2</issue>
          ):
          <fpage>270</fpage>
          -
          <lpage>285</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [Sourander et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sourander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.B.</given-names>
            <surname>Klomek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ikonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lindroos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Luntamo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Koskelainen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ristkari</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Helenius</surname>
          </string-name>
          .
          <article-title>Psychosocial risk factors associated with cyberbullying among adolescents: A population-based study</article-title>
          .
          <source>Archives of general psychiatry</source>
          ,
          <volume>67</volume>
          (
          <issue>7</issue>
          ):
          <fpage>720</fpage>
          -
          <lpage>728</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [Kudo and Matsumoto,
          <year>2002</year>
          ]
          <string-name>
            <given-names>Taku</given-names>
            <surname>Kudo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yuji</given-names>
            <surname>Matsumoto</surname>
          </string-name>
          .
          <article-title>Japanese dependency analysis using cascaded chunking</article-title>
          .
          <source>In CoNLL 2002: Proceedings of the 6th Conference on Natural Language Learning 2002 (COLING 2002 Post-Conference Workshops)</source>
          , pages
          <fpage>63</fpage>
          -
          <lpage>69</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [Turney,
          <year>2002</year>
          ]
          <string-name>
            <given-names>Peter D.</given-names>
            <surname>Turney</surname>
          </string-name>
          .
          <article-title>Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews</article-title>
          .
          <source>In Proceedings of ACL 2002</source>
          , pages
          <fpage>417</fpage>
          -
          <lpage>424</lpage>
          . Association for Computational Linguistics,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [Ure,
          <year>1971</year>
          ]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ure</surname>
          </string-name>
          .
          <article-title>Lexical density and register differentiation</article-title>
          .
          <source>Applications of Linguistics</source>
          , pages
          <fpage>443</fpage>
          -
          <lpage>452</lpage>
          ,
          <year>1971</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [Zheng,
          <year>2012</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zheng</surname>
          </string-name>
          .
          <source>Evolving Psychological and Educational Perspectives on Cyber Behavior</source>
          . Premier Reference Source.
          <publisher-name>Information Science Reference</publisher-name>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>