<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Capturing Political Polarization of Reddit Submissions in the Trump Era</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>ISTI-CNR</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Pisa</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>The American political situation of recent years, combined with the incredible growth of social networks, has led to the online diffusion of the phenomenon of political polarization. Our work presents a model that attempts to measure the political polarization of Reddit submissions during the first half of Donald Trump's presidency. To do so, we design a text classification task: the political polarization of submissions is assessed by quantifying how strongly they align with pro-Trump ideologies and vice versa. We build our ground truth by picking submissions from subreddits known to be strongly polarized. Then, for model selection, we use a neural network with word embeddings and a Long Short-Term Memory layer and, finally, we analyze how model performance changes with different hyper-parameters and types of embeddings.</p>
      </abstract>
      <kwd-group>
        <kwd>Political Polarization</kwd>
        <kwd>Classification</kwd>
        <kwd>Text Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>During the last decade, the rise of social networks has drastically changed how
people interact and communicate. More than 45% of the world's population uses social
networks, for an average of 3 hours per day. As these platforms grow, the number
of opinions shared publicly among users increases accordingly.</p>
      <p>
        In this multifaceted panorama, particular attention has turned to the diffusion
of political discourse online and its implications. With the significant increase
of user-generated content, research in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] underlines that sharing political
beliefs and opinions online makes people feel more active, engaged, and interested.
This attitude, combined with the contemporary political situation, leads to the
spread of political polarization, a phenomenon that refers to the increasing gap
between two different political ideologies.
      </p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0). This volume is published
and copyrighted by its editors. SEBD 2020, June 21-24, 2020, Villasimius, Italy.</p>
      <p>
        During Donald Trump's presidency, online polarization has found fertile
ground: the debate between Trump supporters and anti-Trump citizens is
becoming ever more complex and uncivil [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>In this paper, we propose a model that aims at measuring the political
polarization of Reddit submissions in the Trump Era. We model the problem as a
text classification task: given new submissions, we assess their political
polarization by quantifying how strongly they align with pro-Trump
ideologies and vice versa.</p>
      <p>This task is a key step of a wider, still ongoing study that attempts to identify
and analyze Echo Chambers on social networks. Since an Echo Chamber is a
strongly polarized environment, assessing its existence requires a tool able to
measure the polarization of its possible components.</p>
      <p>
        For our purposes, we choose Reddit as the subject of study. Reddit, as stated by
its slogan `The front page of the internet', is not a traditional social network but
rather a website dedicated to social forums and discussion threads. Founded
in 2005, Reddit is now the nineteenth most visited website on the internet.1
Reddit is composed of thousands of communities called subreddits, each dedicated to a
specific topic. Its internal structure makes it easy to find
politically polarized communities to analyze. Additionally, since users can write
anonymously and posts aren't limited in length, this platform is particularly
active in political discussions [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>The rest of the paper is organized as follows. In Section 2 we discuss the
literature involving text classification techniques, both in general and applied
to the political domain. Section 3 describes the phases of data extraction and
data preparation necessary to build our final dataset. In Section 4 we explore
the different stages of model selection. Finally, Section 5 concludes the paper
and sets future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>With the growth of Social Networks, researchers have focused on the study of
methods to extract information from a large quantity of unstructured text data.</p>
      <p>
        The first key component of this process is text preprocessing, a combination
of text mining techniques that cleans the data for further steps. The works
of Camacho-Collados et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and Uysal et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] both show how a weighted usage
of these methods can remarkably improve the final performance of a text classifier.
      </p>
      <p>
        Another aspect to take into account when performing classification is the
type of word representation used. Social network textual data, such as posts, are
sequences of words. To capture their semantic valence, it is important to represent
not only single words but also the context in which they appear. As stated
in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], traditional word representations, such as the bag-of-words model, encode
words as discrete symbols that cannot be directly compared to one another.
Consequently, this approach is not suitable for representing sentences.
      </p>
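As a toy illustration of this limitation (not from the paper), one-hot bag-of-words vectors treat words as discrete symbols: any two distinct words are orthogonal, so their similarity is always zero, regardless of meaning.

```python
# Illustration: one-hot encodings cannot express that two words are related.
vocab = ["president", "trump", "obama"]

def one_hot(word):
    # Encode a word as a discrete symbol: a vector with a single 1.
    return [1.0 if w == word else 0.0 for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: sum(a * a for a in x) ** 0.5
    return dot / (norm(u) * norm(v))

# "trump" and "obama" are semantically related, yet one-hot similarity is 0.
print(cosine(one_hot("trump"), one_hot("obama")))  # 0.0
```

Dense embeddings, discussed next, avoid this by placing related words near each other in a continuous space.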
      <sec id="sec-2-1">
        <title>1 https://www.alexa.com/topsites</title>
        <p>
          Word embeddings (e.g., Word2vec [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and Glove [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]), instead, map words to
a continuous low-dimensional space, capturing their semantic and
syntactic features.
        </p>
        <p>
          It has recently been shown that deep learning models can be fruitfully used
to address classification tasks related to Natural Language Processing. The survey
in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] synthesizes powerful models for sequence learning. The authors underline
that Feed Forward Neural Networks do not fit well with data sequences because,
after each word is processed, the entire state of the network is reset. Recurrent
Neural Networks (RNNs), instead, assign more weight to previous words of
a sequence and are therefore suitable for this kind of data. In particular, the Long
Short-Term Memory network (LSTM), a type of RNN, can maintain long-term
dependencies through a complex gate mechanism, overcoming the vanishing gradient
problem of RNNs [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>
          Social network textual data are widely used for various text classification
applications such as information filtering [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], personality trait recognition [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]
or hate speech detection [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>
          Concerning political leaning classification on such data, the literature is
not exhaustive. In [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], Chang et al. propose a model to predict the political
affiliation of Facebook posts in Taiwan. They build two models, one using the
K-Nearest Neighbors algorithm and the other AdaBoost combined with a Naive
Bayes classifier. Rao et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] use word embeddings and LSTM to predict whether
a Twitter post reflects Republican or Democratic beliefs. To the best of
our knowledge, research on political classification on Reddit is sparse or
nonexistent.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Data Description and Preparation</title>
      <p>In this section, we describe the phases of Data Extraction and Data Preparation
necessary to build our final dataset.</p>
      <p>
        Data was obtained through the Pushshift API [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which offers aggregation
endpoints to analyze Reddit activity from June 2005 to the present day. We use the
submission endpoint to pick submissions posted from January 2017 to May 2019,
a period covering the first two and a half years of Donald Trump's presidency.
      </p>
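As a hedged sketch of how such an extraction could be set up, the query below targets the public Pushshift submission endpoint; the parameter names (`subreddit`, `after`, `before`, `size`, `fields`) follow the Pushshift API, while the epoch timestamps and the helper `build_query` are illustrative choices of ours, not the paper's code.

```python
from urllib.parse import urlencode

# Public Pushshift submission search endpoint.
BASE = "https://api.pushshift.io/reddit/search/submission/"

def build_query(subreddit, after, before, size=500):
    # after/before are epoch timestamps bounding the extraction window;
    # size caps the number of results returned per request.
    params = {"subreddit": subreddit, "after": after, "before": before,
              "size": size, "fields": "id,title,selftext"}
    return BASE + "?" + urlencode(params)

# Roughly January 2017 to May 2019, the window used in the paper.
url = build_query("The_Donald", after=1483228800, before=1559347200)
print(url)
```

One such request per time slice, per subreddit, is enough to walk the whole period.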
      <p>For this text classification problem, we select submissions belonging to
subreddits known to be either pro-Trump or anti-Trump. Based on subreddit
descriptions and the considerations explained at the end of this section, we choose
r/The Donald for the first group and r/Fuckthealtright and
r/EnoughTrumpSpam for the second. To have a balanced dataset, for the
anti-Trump data we merge two subreddits that are strictly related both in users
and in keywords.2 Table 1 reports some dataset statistics.</p>
      <p>For each selected submission, we collect the fields id, selftext, and title,
respectively the identifier, the content, and the title of the submission. We merge
the latter two fields into a single one to give as input to the LSTM, because the
selftext of a post may be empty or just a reference to the title itself. By doing
so, we make sure to have a text capturing what the user is actually trying to
convey. Then, we assign a label to each submission: 1 if it belongs to the pro-Trump
subreddit, 0 otherwise.</p>
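The merging and labeling step above could be sketched as follows; the field names follow Pushshift's `id`, `title`, `selftext`, while the `to_example` helper and the subreddit sets are our illustrative stand-ins, not the paper's code.

```python
# Subreddits used as ground truth in the paper, grouped by polarity.
PRO_TRUMP = {"The_Donald"}
ANTI_TRUMP = {"Fuckthealtright", "EnoughTrumpSpam"}

def to_example(submission):
    # selftext may be empty or just point back to the title, so both fields
    # are concatenated to capture what the user is actually conveying.
    text = (submission["title"] + " " + submission.get("selftext", "")).strip()
    label = 1 if submission["subreddit"] in PRO_TRUMP else 0
    return {"id": submission["id"], "text": text, "label": label}

ex = to_example({"id": "abc", "title": "MAGA rally tonight", "selftext": "",
                 "subreddit": "The_Donald"})
print(ex)  # {'id': 'abc', 'text': 'MAGA rally tonight', 'label': 1}
```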
      <p>Due to the very nature of the social network, which promotes free speech and
expression, textual data gathered from the Reddit platform is dirty and noisy. For
this reason, we apply a text preprocessing pipeline to give clean data to the LSTM:
in particular, we convert text to lowercase, then remove punctuation and
numbers, as well as stop words.</p>
      <p>Also, by looking at the submissions extracted, we observe that several of
them are composed of only a few words. This is probably because, during data
extraction, we only select textual data, removing all multimedia content related
to each submission. To avoid affecting LSTM performance, we remove from our
original dataset all submissions shorter than six words.</p>
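A minimal sketch of this cleaning pipeline, assuming a tiny stand-in stop-word list in place of a full one:

```python
import re
import string

# Toy stop-word list; a real pipeline would use a full English list.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in"}

def clean(text):
    # Lowercase, strip punctuation and numbers, then drop stop words.
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", "", text)
    return [w for w in text.split() if w not in STOP_WORDS]

def keep(tokens, min_len=6):
    # Very short submissions (often leftovers of removed multimedia
    # content) are discarded to avoid hurting the LSTM.
    return len(tokens) >= min_len

tokens = clean("The President SAYS tariffs are working, 100%!")
print(tokens)  # ['president', 'says', 'tariffs', 'working']
```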
      <p>Lastly, to check the validity of our initial choice of polarized subreddits, we
identify some of the most frequent bigrams of each subreddit and analyze their
frequencies in the opposite one. As we can see in Fig. 1, these words are
discriminant and semantically related to the subreddit they belong to.</p>
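This sanity check could be implemented along these lines; the `top_bigrams` helper is our illustrative stand-in, not the paper's code.

```python
from collections import Counter

def top_bigrams(token_lists, n=5):
    # Count adjacent word pairs across all tokenized submissions and
    # return the n most frequent ones.
    counts = Counter()
    for tokens in token_lists:
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)

docs = [["build", "the", "wall"], ["build", "the", "wall", "now"]]
print(top_bigrams(docs, n=2))
```

Running the same counter on the opposite subreddit's submissions then shows how rarely each top bigram crosses over.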
      <sec id="sec-3-1">
        <title>2 https://subredditstats.com/r/EnoughTrumpSpam</title>
        <p>https://subredditstats.com/r/Fuckthealtright</p>
        <p>In this section, we describe the model selected to measure the political
polarization of Reddit submissions.</p>
        <p>To assess the best model, we follow these subsequent steps: preprocessing of
the input text sequences, model selection, and testing on new instances.</p>
        <p>Neural Network Preprocessing. To give our submissions as input to our
model, we have to vectorize them, turning each one into a sequence of integers.</p>
        <p>
          To do so, we create a lexicon index based on word frequency, where 0
represents padding: indexes are assigned in descending order of frequency. Each text
is then transformed into a sequence of integers, replacing each word with the
corresponding index in the lexicon (e.g., the sentence "president trump says"
becomes [5, 1, 39]). We use the whole lexicon for training the model. Lastly, to
optimize the matrix operations in each batch, we pad every sequence to the same
length, based on the mean length of all submissions. Sequences longer than this
length are truncated.
        </p>
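A pure-Python sketch of this vectorization scheme; the paper does not specify its tooling, so `build_lexicon` and `vectorize` are our stand-ins implementing the described frequency-ordered indexing and right-padding with 0.

```python
from collections import Counter

def build_lexicon(token_lists):
    # Indexes in descending order of frequency; 0 is reserved for padding.
    freq = Counter(w for tokens in token_lists for w in tokens)
    return {w: i for i, (w, _) in enumerate(freq.most_common(), start=1)}

def vectorize(tokens, lexicon, max_len):
    # Replace each word with its lexicon index, truncate to max_len,
    # then right-pad with the 0 padding index.
    seq = [lexicon[w] for w in tokens if w in lexicon][:max_len]
    return seq + [0] * (max_len - len(seq))

corpus = [["trump", "says"], ["president", "trump"]]
lex = build_lexicon(corpus)  # "trump" is the most frequent word -> index 1
print(vectorize(["president", "trump", "says"], lex, max_len=4))
```

In the paper, `max_len` would be the mean submission length computed over the whole dataset.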
        <p>
          Neural Network Architecture. As stated in Section 2, since our training
data consists of sequences, standard neural networks are not suitable for our
task. For this reason, we adopt a Long Short-Term Memory network [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ],
a type of recurrent neural network able to model the meaning of a sentence
by taking into account its paradigmatic structure. Fig. 2 shows the high-level
architecture of our model, composed of three layers:
1. Embedding Layer: this layer takes as input the sequences of integers w1,...,
wt previously created and turns them into 100-dimensional dense vectors xt,
where t is the length of the sequence. During model selection, we compare the
performance obtained using Glove pre-trained word embeddings with that of
embeddings learned directly from the text.
2. LSTM Layer: this layer consists of multiple LSTM units. Each unit
maintains a memory cell. Cells encode the information of the inputs observed up
to that step through the gate mechanism. The first gate, called the input gate,
controls whether the memory cell is updated. The second gate, the forget gate,
controls whether the cell is set to zero; lastly, the output gate controls
whether the information of the cell state is visible. To avoid overfitting, we
add a dropout regularization of 0.3.
3. Output Layer: this layer is a fully connected layer with a single output
neuron performing binary predictions. We use the Sigmoid activation function
to obtain a probability output between 0 and 1. Submissions with a probability
score greater than or equal to 0.5 are labeled as pro-Trump, the others as
anti-Trump. As loss function we use binary cross-entropy, and as optimizer
Adam.
        </p>
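The gate mechanism described in point 2 can be made concrete with a toy, scalar LSTM step (an illustration of ours, not the paper's actual layer, which operates on 100-dimensional vectors):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    # W holds one (w_x, w_h, b) triple per gate and for the candidate cell.
    gate = lambda k: sigmoid(W[k][0] * x + W[k][1] * h_prev + W[k][2])
    i, f, o = gate("input"), gate("forget"), gate("output")
    c_tilde = math.tanh(W["cell"][0] * x + W["cell"][1] * h_prev + W["cell"][2])
    c = f * c_prev + i * c_tilde  # forget gate scales old memory, input gate admits new
    h = o * math.tanh(c)          # output gate controls visibility of the cell state
    return h, c

# Arbitrary toy weights, just to run one step.
W = {k: (0.5, 0.5, 0.0) for k in ("input", "forget", "output", "cell")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, W=W)
```

Because the cell state `c` is carried forward additively, gradients can flow across many steps, which is what lets the LSTM keep long-term dependencies.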
        <p>Model Selection. We use a training set of 302,763 instances, perfectly
balanced between the two target classes. To find the model with the highest
performance, we perform a 3-fold cross-validation trying different numbers of LSTM
units [32, 64, 128] with a fixed embedding dimension of 100. We test such
configurations both with learned embeddings and with pre-trained ones. Results are
shown in Table 2. The best-performing model is the one obtained using
100-dimensional Glove pre-trained embeddings and 128 LSTM units, which reaches
an accuracy of 84.63% on the training set and 82.96% on validation.
Model Evaluation. Model performance is first assessed on a polarized test
set built by picking unseen submissions belonging to the previously selected
subreddits.3 In detail, we extract 38,906 submissions posted from 2nd May to 1st
December 2019. As shown in the first line of Table 3, the model reaches an accuracy
of 74%.</p>
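The unit-count search could be sketched as follows; `train_and_score` is a hypothetical stand-in for fitting the network on one fold and returning its validation accuracy.

```python
def kfold_indices(n, k=3):
    # Yield (train, validation) index lists for k contiguous folds.
    fold = n // k
    for i in range(k):
        val = list(range(i * fold, (i + 1) * fold if i < k - 1 else n))
        val_set = set(val)
        train = [j for j in range(n) if j not in val_set]
        yield train, val

def select_units(n_samples, unit_grid=(32, 64, 128), train_and_score=None):
    # 3-fold cross-validation over the LSTM unit counts tried in the paper,
    # with the embedding dimension held fixed; returns (units, mean accuracy).
    best = None
    for units in unit_grid:
        accs = [train_and_score(units, tr, va)
                for tr, va in kfold_indices(n_samples, k=3)]
        mean_acc = sum(accs) / len(accs)
        if best is None or mean_acc > best[1]:
            best = (units, mean_acc)
    return best

# Dummy scorer just to exercise the loop: pretends more units score higher.
best = select_units(9, train_and_score=lambda u, tr, va: u / 128)
```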
        <p>We further assess model performance on topics that are less polarizing with
respect to Donald Trump's persona. To do so, we choose four main topics addressing
sociopolitical issues and, for each of them, we select several related subreddits
via Reddit lists.4 Three out of four are fine-grained (i.e., gun control and
legalization, minority discrimination, immigration policies) while the last is more
general (i.e., political discussion). The submissions extracted were posted from
January 2017</p>
      </sec>
      <sec id="sec-3-2">
        <title>3 r/The Donald, r/EnoughTrumpSpam, r/Fuckthealtright</title>
      </sec>
      <sec id="sec-3-3">
        <title>4 http://redditlist.com/</title>
        <p>https://www.reddit.com/r/ListOfSubreddits/wiki/listofsubreddits
to December 2019.</p>
        <p>In this scenario, we do not have a ground truth to evaluate model results across
the different topics. Thus, we try a different approach: validating our model
through polarized users. In detail, we first compute, for each user in the training
set, a polarization score by averaging the prediction scores of their posts. Then
we select the most polarized ones (i.e., score ≤ 0.2 for anti-Trump users and score
≥ 0.8 for pro-Trump ones), obtaining 1,150 final users. We then look for their posts
among the aforementioned four topics, labeling them according to the user
polarization score previously obtained. Finally, we assess model performance on the
four new test sets.</p>
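This user-level scoring step could be sketched as follows (the thresholds match the paper's; the helpers are our illustrative stand-ins):

```python
def user_scores(predictions):
    # predictions: {user: [per-post model prediction scores in [0, 1]]}.
    # A user's polarization score is the mean over their posts.
    return {u: sum(s) / len(s) for u, s in predictions.items()}

def polarized_users(scores, low=0.2, high=0.8):
    # Keep only strongly polarized users: anti-Trump (<= low) or
    # pro-Trump (>= high); moderates are discarded.
    return {u: s for u, s in scores.items() if s <= low or s >= high}

scores = user_scores({"u1": [0.9, 0.95], "u2": [0.5, 0.6], "u3": [0.1, 0.2]})
print(polarized_users(scores))  # keeps u1 (0.925) and u3 (~0.15), drops u2
```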
        <p>By comparing model performance across the five differently polarized topics,
we can assess the model's ability to generalize. As shown in Table 3, even though
the test sets are not directly comparable, due to their difference in size, our model
reaches quite good results, with accuracy always greater than 70%. Lastly, from
the sample of predictions in Table 4 we note similar trends across different
topics: submissions with a higher prediction score convey a strongly Republican
view, while the lower ones a decidedly Democratic leaning.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Works</title>
      <p>In this study, we propose a model that aims at measuring the political
polarization of Reddit submissions in the Trump Era. This task is a key step of a
wider study focused on the identification of Echo Chambers on Reddit. Since an
Echo Chamber is a strongly polarized environment, to assess its existence it is
necessary to find a way to measure the polarization of its possible components.</p>
      <p>To do so, we use a neural network with word embeddings and a Long Short-Term
Memory layer to quantify how submissions align themselves with Trump's
ideologies and vice versa. The best model is the one built with Glove embeddings,
which reaches 83% accuracy on validation and 74% on the test set. Furthermore,
we assess model performance also on less polarizing topics, to evaluate the
model's ability to generalize. We achieve quite good results across four
differently polarized topics, with accuracy ranging from 72% to 82%.</p>
      <p>Table 4. Sample of predictions (Submission | Topic | Score):
"Tropical storm Barry: Obama has transformed his hatred for America into a new type of treason" | Polarized topic | 0.93
"Trump's re-election crisis: His internal polls show it, national polls show it, and even a poll in reliably conservative Texas shows it" | Polarized topic | 0.02
"Never forget: Hillary worked hard to keep wages of Haitian garment workers under 31 cents per hour" | Political discussion | 0.93
"American soldiers aren't dying for our freedom in Syria, Iraq and Afghanistan. They're dying for nothing" | Gun control | 0.15
"Poor Immigrants Are The Least Likely Group To Use Welfare, Despite Trump's Claims" | Immigration | 0.03
"Feminist deliberately acts like a condescending asshole to men. When they react like she's being an asshole, ..." | Discrimination | 0.90</p>
      <p>As a future research direction, we will leverage the proposed model to discover
political Echo Chambers on Reddit, more specifically analyzing their internal
structure, their size, and their persistence over time.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Bartels</surname>
          </string-name>
          .
          <article-title>Partisanship in the trump era</article-title>
          .
          <source>The Journal of Politics</source>
          ,
          <volume>80</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1483</fpage>
          –
          <lpage>1494</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>J.</given-names>
            <surname>Baumgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zannettou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Keegan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Squire</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Blackburn</surname>
          </string-name>
          .
          <article-title>The pushshift reddit dataset</article-title>
          . preprint arXiv:2001.08435,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Boulianne</surname>
          </string-name>
          .
          <article-title>Social media use and participation: A meta-analysis of current research</article-title>
          . <source>Information, Communication &amp; Society</source>,
          <volume>18</volume>
          (
          <issue>5</issue>
          ):
          <fpage>524</fpage>
          –
          <lpage>538</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Camacho-Collados</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Pilehvar</surname>
          </string-name>
          .
          <article-title>On the role of text preprocessing in neural network architectures: An evaluation study on text categorization and sentiment analysis</article-title>
          .
          <source>preprint arXiv:1707.01780</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>C.-C.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Chiu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.-W.</given-names>
            <surname>Hsu</surname>
          </string-name>
          .
          <article-title>Predicting political affiliation of posts on facebook</article-title>
          .
          <source>In International Conference on Ubiquitous Information Management and Communication</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>8</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          –
          <lpage>1780</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Jones</surname>
          </string-name>
          .
          <article-title>The reddit self-post classification task (RSPCT): a highly multiclass dataset for text classification</article-title>
          . Preprint,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>K.</given-names>
            <surname>Kowsari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jafari Meimandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heidarysafa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mendu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Barnes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Brown</surname>
          </string-name>
          .
          <article-title>Text classification algorithms: A survey</article-title>
          .
          <source>Information</source>
          ,
          <volume>10</volume>
          (
          <issue>4</issue>
          ):
          <fpage>150</fpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Z. C.</given-names>
            <surname>Lipton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Berkowitz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Elkan</surname>
          </string-name>
          .
          <article-title>A critical review of recurrent neural networks for sequence learning</article-title>
          .
          <source>preprint arXiv:1506.00019</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>preprint arXiv:1301.3781</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>R.</given-names>
            <surname>Nithyanand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schaffner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Gill</surname>
          </string-name>
          .
          <article-title>Online political discourse in the trump era</article-title>
          .
          <source>preprint arXiv:1711.05303</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          . Glove:
          <article-title>Global vectors for word representation</article-title>
          .
          <source>In Empirical methods in natural language processing (EMNLP)</source>
          , pages
          <fpage>1532</fpage>
          –
          <lpage>1543</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>B. Y.</given-names>
            <surname>Pratama</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sarno</surname>
          </string-name>
          .
          <article-title>Personality classification based on twitter text using naive bayes, knn and svm</article-title>
          .
          <source>In 2015 International Conference on Data and Software Engineering (ICoDSE)</source>
          , pages
          <fpage>170</fpage>
          –
          <lpage>174</lpage>
          . IEEE,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>A.</given-names>
            <surname>Rao</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Spasojevic</surname>
          </string-name>
          .
          <article-title>Actionable and political text classification using word embeddings and LSTM</article-title>
          .
          <source>preprint arXiv:1607.02501</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundermeyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schlüter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Ney</surname>
          </string-name>
          .
          <article-title>LSTM neural networks for language modeling</article-title>
          .
          <source>In International speech communication association</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Uysal</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Gunal</surname>
          </string-name>
          .
          <article-title>The impact of preprocessing on text classification</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>50</volume>
          (
          <issue>1</issue>
          ):
          <fpage>104</fpage>
          –
          <lpage>112</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Robinson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Tepper</surname>
          </string-name>
          .
          <article-title>Detecting hate speech on twitter using a convolution-gru based deep neural network</article-title>
          .
          <source>In European semantic web conference</source>
          , pages
          <fpage>745</fpage>
          –
          <lpage>760</lpage>
          . Springer,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>