TexTrolls: Identifying Trolls on Twitter with Textual and Affective Features

Bilal Ghanem^a, Davide Buscaldi^b and Paolo Rosso^a
^a Universitat Politècnica de València, Valencia, Spain
^b LIPN - Université Sorbonne Paris Nord, France

Abstract
The growing number of suspicious online users, usually called trolls, is one of the main sources of hateful, fake, and deceptive online messages. Some agendas exploit these harmful accounts to spread inciting tweets and, as a consequence, deceive online users. The challenge in detecting such accounts is that they conceal their real identities, which makes it difficult to identify them from their social network information alone. In this paper we propose models based on affective and lexical information to detect online trolls such as those discovered during the 2016 US presidential elections. Our approach relies mainly on features that take topic information into account, together with profiling features that identify accounts from the way they write their tweets. We infer the topic information in an unsupervised way, and we show that coupling it with the affective and lexical features enhances the performance of the proposed models. We find that the proposed profiling features perform best compared to the other features. Our approach shows superior results in comparison to several strong baselines.

Keywords
Profiling trolls, Twitter, topic modeling

1. Introduction
Recent years have seen a large increase in the amount of disinformation and fake news spread on social media. False information has been used to spread fear and anger among people, which in turn has provoked crimes in some countries. In recent years, the US experienced several such cases during the presidential elections, for instance the one commonly known as "Pizzagate"^1.
Later on, Twitter declared that they had detected a suspicious campaign that originated in Russia, run by an organization named the Internet Research Agency (IRA), and that targeted the US to affect the results of the 2016 presidential elections^2. The goal behind these accounts was to spread fake and hateful news to further polarize public opinion. Such attempts are not limited to Twitter: Facebook announced in mid-2019 that it had detected a similar attempt originating from the UAE, Egypt and Saudi Arabia, targeting other countries such as Qatar, Palestine, Lebanon and Jordan^3. This attempt used Facebook pages, groups, and user accounts with fake identities to spread fake news supporting their ideological agendas. The automatic detection of such attempts is very challenging, since the true identity of these accounts is hidden by imitating profiles of real persons from the targeted audience; in addition, these accounts sometimes express their suspicious ideas only vaguely in their tweets. A previous work [1] showed that such suspicious accounts are not bots in a strict sense, arguing that they could rather be considered "software-assisted human workers". According to Clark et al. [2], suspicious online accounts can be categorized into three main types: Robots, Cyborgs, and Human Spammers.

OHARS'20: Workshop on Online Misinformation- and Harm-Aware Recommender Systems, September 25, 2020, Virtual Event
email: bigha@doctor.upv.es (B. Ghanem); buscaldi@lipn.univ-paris13.fr (D. Buscaldi); prosso@dsic.upv.es (P. Rosso)

^1 https://www.rollingstone.com/politics/politics-news/anatomy-of-a-fake-news-scandal-125877/
^2 https://blog.twitter.com/official/en_us/topics/company/2018/2016-election-update.html
We consider the IRA accounts as a newly emerging type called trolls, which is similar to Cyborgs except that trolls focus on targeting communities instead of individuals^4. In this work, we identify online trolls on Twitter, namely the IRA trolls, from a textual perspective. We study the effect of a set of text-based features, including affective ones, and we propose machine learning models that take topic information into account. These models can be applied to go beyond the superficial textual features used in related work in order to detect advanced online manipulation efforts. We also conduct an in-depth analysis of the trolls' language to provide the reader with evidence about their online manipulation campaigns. In this research work, we aim to answer three research questions:

RQ1 Can we detect IRA trolls from a textual perspective only?
RQ2 Does topic information improve the detection performance?
RQ3 How did the IRA campaign use emotions to affect public opinion?

The rest of the paper is structured as follows. In the following section, we present an overview of the literature on IRA trolls. In Section 3, we describe how the dataset was compiled. Section 4 describes the features proposed for our approaches, and Section 5 the models that use them. The experiments and results are presented in Section 6. In Section 7 we present an analysis of the trolls' campaign. Finally, we discuss the limitations of the proposed models and draw some conclusions and possible future work on the identification of trolls.

2. Related Work
2.1. IRA Trolls
After the 2016 US elections, Twitter detected a suspicious attempt by a large set of accounts to influence the results of the elections. Following this event, various research works on the Russian troll accounts started to appear [3, 4, 1, 5, 6]. These works studied the IRA trolls from several perspectives, but most of them focused on analyzing the trolls and their strategies rather than on building a detection model.
The work in [5] studied the link domains mentioned by IRA trolls and how much they overlap with links used in tweets related to "Brexit". In addition, they compared "Left" and "Right" ideological trolls in terms of the number of re-tweets they received, the number of followers, etc., and the online propaganda strategies they used. The authors in [3] analyzed the IRA campaign on both Twitter and Facebook, focusing on the evolution of IRA paid advertisements on Facebook before and after the US presidential elections from a topic perspective, i.e., which topics the IRA trolls targeted to seed discord among the public. Analysis work on IRA trolls was not limited to tweet content; it also considered profile description, screen name, application client, geo-location, timezone, and the number of links used per media domain [4]. There is a possibility that Twitter missed some IRA accounts that may have been less active than the others. Based on this hypothesis, the work in [1] (Still Out There) built a machine learning model based on profile, language distribution, and stop-word usage features to detect IRA trolls in newly sampled Twitter data. Other works tried to model the IRA campaign not only by focusing on the troll accounts, but also by examining who interacted with the trolls by sharing their content [7]. Similarly, the work in [6] proposed a model that made use of users' political ideologies, bot likelihood, and activity-related account metadata to predict users who spread the trolls' content.

2.2. Online Bots
Online social bots have been a source of nuisance for social media users due to their suspicious behaviour of retweeting duplicated tweets or boosting advertisement tweets.

^3 https://newsroom.fb.com/news/2019/08/cib-uae-egypt-saudi-arabia/
^4 https://itstillworks.com/difference-between-troll-cyberbully-5054.html
The work in [8] studied a large set of Twitter bots collected over a seven-month study. The authors studied the behaviour of these bots and grouped them into a set of categories, e.g. duplicate spammers, malicious promoters, and friend infiltrators. Bot detection has gained the attention of the research community. Recently, a shared task on profiling bots on Twitter [9] was organized at the PAN-2019 Lab, targeting both Spanish and English. The best performing system [10] for English obtained an accuracy of ∼96%. The system is based on stylistic features such as term occurrences, tweet length, number of capitalized words, etc., and employed a Random Forest classifier. Another work [11] proposed a system called SentiBot for detecting Indian bots on Twitter. The approach uses a large combination of features but mainly focuses on sentiment features. The features used in previous works were not limited to stylistic and sentiment features: the authors of [12] proposed a SOTA system called Botometer^5, which uses sentiment, friend, content, user, temporal, and network features.

^5 https://github.com/IUNetSci/botometer-python

3. Data
To model the detection of the IRA trolls, we considered a large dataset of both regular users (legitimate accounts) and IRA troll accounts. We describe the dataset below; Table 1 summarizes its statistics.

Table 1
Statistics of the dataset.

                        IRA Trolls    Regular Accounts
Total # of Accounts     2,023         94,643
Total # of Tweets       ∼1.8 M        ∼1.9 M
Avg. # of Tweets        357           19
Avg. # of Followers     1,834         9,867
Avg. # of Followees     1,025         2,277

3.1. The Internet Research Agency Dataset
We used the IRA dataset^6 that was released by Twitter after identifying the Russian trolls. The original dataset contains 3,841 accounts, but we use a lower number of accounts and tweets after filtering them: we focus on accounts that use English as their main language.
In fact, our goal is to detect Russian accounts that mimic regular US users. We then removed the non-English tweets of these accounts, keeping only the tweets that were originally tweeted by them. Our final IRA list contains 2,023 accounts.

3.2. Regular Accounts
To contrast the IRA behaviour, we sampled a large set of accounts representing the ordinary behaviour of US accounts. We collected a random sample of users that posted at least 5 tweets between the 1st of August and the 31st of December, 2016 (covering the first, second, third and vice-presidential debates and the election day of the 2016 US elections) by querying the Twitter API for hashtags related to the elections and its parties (e.g. #trump, #clinton, #election, #debate, #vote, etc.). In addition, we selected accounts that are located within the US and use English as the language of the Twitter interface. We focus on users during the presidential debates and election dates because we assume that the peak of the trolls' efforts was concentrated in this period. The final dataset is heavily imbalanced (2% IRA trolls and 98% regular users). This class imbalance reflects a real-world scenario. From Table 1, we can notice that the total number of tweets of the IRA trolls is similar to that of the regular users. This is due to the fact that the IRA trolls were posting many tweets before and during the elections, trying to make their messages reach the largest possible audience.

4. Textual Features
In order to identify IRA trolls, we use a rich set of textual features. With this set of features, we aim to model the tweets of the accounts from several perspectives.

4.1. Topic Information
Previous works [13] have investigated the IRA campaign's efforts on Facebook, and found that IRA pages posted more than ∼80K posts focused on divisive issues in the US.
^6 https://about.twitter.com/en_us/values/elections-integrity.html

Figure 1: (a) Trump and (b) Hillary topic word clouds.

Later on, the work in [3] analyzed the posts advertised on Facebook by the IRA and specified the main topics that these advertisements discussed. Given the results of these previous works, we applied a topic modeling technique, namely Latent Dirichlet Allocation (LDA) [14], to our dataset to extract its main topics. We aim to detect IRA trolls by identifying their suspicious ideological changes across a set of topics. We applied LDA to the tweets after a preprocessing step in which we kept only nouns and proper nouns, using the off-the-shelf spaCy part-of-speech (POS) tagger^7. In addition, we removed special characters (except the hash "#" sign of the hashtags) and lowercased the final tweet. To ensure the quality of the topics, we removed the hashtags used in the collection process, since they may bias the modeling algorithm. We tested multiple numbers of topics and finally used seven. We manually inspected the content of these topics to label them. The extracted topics (T) are: Police Shootings, Islam and War, Trump, Black People, Civil Rights, Hillary, and Crimes. In some topics, like Trump and Hillary, we found contradictory opinions, both in favor of and against the main topic; generally, however, the Trump topic has a supportive stance towards Trump, whereas the Hillary topic has a stance against Hillary (see Figure 1 for the frequency-based word clouds). Also, the topics Police Shootings and Crimes are similar, but we found that words such as police, officers, cops, shooting, gun, shot, etc. are the most discriminative between these two topics. In addition, we found that the Crimes topic focuses more on rape crimes against children and women.
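A minimal sketch of this preprocessing step (pure Python; `COLLECTION_TAGS` and the `nouns` argument are hypothetical stand-ins for the full collection-hashtag list and the spaCy POS tagger):

```python
import re

# Hashtags used during data collection; a hypothetical subset for illustration.
COLLECTION_TAGS = {"#trump", "#clinton", "#election", "#debate", "#vote"}

def preprocess_for_lda(tweet, nouns):
    """Clean a tweet as described in Section 4.1: keep only nouns/proper
    nouns (`nouns` stands in for a spaCy POS tagger), strip special
    characters except '#', lowercase, and drop the collection hashtags."""
    tokens = []
    for tok in tweet.split():
        clean = re.sub(r"[^\w#]", "", tok)   # remove special chars, keep '#'
        low = clean.lower()
        if not low or low in COLLECTION_TAGS:
            continue                          # avoid biasing the topic model
        if low.startswith("#") or clean in nouns:
            tokens.append(low)
    return tokens

# toy usage; a real run would feed these token lists to an LDA
# implementation with T = 7 topics
nouns = {"Police", "shooting", "officers"}
print(preprocess_for_lda("Another Police shooting by officers!! #Trump #justice", nouns))
```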
Our resulting topics are generally consistent with those obtained from the Facebook advertised posts in [3], which emphasizes that the IRA efforts were organized in a similar manner on both social media platforms. Based on this topic information, we model the users' textual features w.r.t. each of these topics.

^7 https://spacy.io/models

In other words, we model a set of textual features that may change in the users' tweets across the topics. We aim to model the trolls' manipulation effort, in which they interact differently with each topic; e.g. a troll account may trigger positive emotions in the topics it favors and negative emotions in those it opposes, or show an intensively supporting stance in some topics and a denial stance in others. Thus, we used LDA to annotate each tweet of a user with one of the T topics, in order to capture the changes of the following proposed features across the topics. We chose the following affective and lexical features under the assumption that they may characterize the trolls' language changes across the topics. We use a term frequency representation to extract the following features from the tweets:

• Emotions: Since the results of previous works [3, 13] showed that the IRA efforts were engineered to seed discord among individuals in the US, we use emotion features to detect their emotional attempts to manipulate public opinion (e.g. fear-spreading behaviour). For that, we use the NRC emotion lexicon [15], which contains ∼14K words labeled with the eight Plutchik emotions (8 Features).
• Sentiment: We extract the sentiment of the tweets from NRC [15]: positive and negative (2 Features).
• Bad & Sexual Cues: During the manual analysis of a sample of IRA tweets, we found that some users use bad words to mimic the language of a US citizen. Thus, we model the presence of such words using a list of bad and sexual words from [16] (2 Features).
• Stance Cues: Stance detection has been studied in different contexts to detect the stance of a tweet reply with respect to a main tweet/thread [17]. With this feature, we aim to detect the stance of the users regarding the different topics we extracted. To model the stance, we use a set of stance lexicons employed in previous works [18, 19]. Concretely, we focus on the following categories: belief, denial, doubt, fake, knowledge, negation, question, and report (8 Features).
• Bias Cues: We rely on a set of lexicons to capture bias in text. We model the presence of words in one of the following cue categories: assertive verbs [20], bias [21], factive verbs [22], implicative verbs [23], hedges [Hyland, 2018], and report verbs. A previous work used these bias cues to identify bias in suspicious news posts on Twitter [24] (6 Features).
• LIWC: We use a set of linguistic categories from the LIWC linguistic dictionary [25]. The used categories are: pronoun, anx, cogmech, insight, cause, discrep, tentat, certain, inhib, and incl^8 (10 Features).
• Morality: Cues based on moral foundations theory [26], where words are labeled with one of the categories: care, harm, fairness, unfairness, loyalty, betrayal, authority, subversion, sanctity, and degradation (10 Features).

^8 Total pronouns, Anxiety, Cognitive processes, Insight, Causation, Discrepancy, Tentative, Certainty, Inhibition, and Inclusive, respectively.

4.2. Profiling IRA Accounts
As Twitter declared, although the IRA campaign originated in Russia, it was found that the IRA trolls concealed their identity by tweeting in English. Furthermore, to reduce any possibility of unmasking their identity, the majority of IRA trolls changed their location to other countries, as well as the language of the Twitter interface they use.
Thus, we propose the following features to identify these users using only the text of their tweets:

• Native Language Identification (NLI): This feature is inspired by earlier works on identifying the native language of essay writers [27]. We aim to detect IRA trolls by identifying their way of writing English tweets. As shown in [24], English tweets generated by non-English speakers have a different syntactic pattern. Thus, we use SOTA NLI features to detect this unique pattern [28, 29, 30]. The feature set consists of bags of: stopwords (179 Features), POS tags (46 Features), and syntactic dependency relations (DEPREL) (45 Features). We extract the POS and DEPREL information using spaCy. To normalize the tweets, we remove the special characters, keeping dots, commas, and the first-letter capitalization of words. We use regular expressions to convert a sequence of dots to a single dot, and similarly for sequences of characters (270 Features in total).
• Stylistic: We extract a set of stylistic features following previous works in the authorship attribution domain [31, 32, 33], such as: the count of special characters, consecutive characters and letters^9, URLs, hashtags, and user mentions. In addition, we extract the uppercase ratio and the tweet length (8 Features).

5. Models
Given the two sets of features presented in Section 4, we use them in two different approaches to build troll detectors. The proposed approaches utilize a classical machine learning classifier and a Convolutional Neural Network (CNN):

All Features + LG. In this approach, we model the extracted textual features as follows: given V_n, the concatenation of the previous 46 topic information features of a tweet n, we represent each user by the average and standard deviation of her tweets' V_{1,2,..,N} in each topic t independently. We then concatenate the final vectors; there are seven final vectors, since the number of topics (T) equals seven in our case.
Mathematically, the final feature vector of a user x is defined as follows:

$$ user_x = \bigodot_{t=1}^{T} \left[ \frac{\sum_{n=1}^{N_t} V_n^t}{N_t} \odot \sqrt{\frac{\sum_{n=1}^{N_t} \left(V_n^t - \bar{V}^t\right)^2}{N_t}} \right] \quad (1) $$

where, given the t-th topic, N_t is the total number of tweets of the user (annotated with the t-th topic), V_n^t is the n-th tweet feature vector, and \bar{V}^t is the mean of the tweets' feature vectors; ⊙ represents vector concatenation. With this representation we aim to capture the "Flip-Flop" behaviour of the IRA trolls among the topics (see Section 7).

^9 We considered 2 or more consecutive characters, and 3 or more consecutive letters.

Figure 2: CNN structure.

Regarding the profiling features, we represent each user by the average and standard deviation of her tweets' feature vectors, similar to the representation of the previous features but without considering the topic information. In short, we apply the average and the standard deviation to all the tweets of a user at once:

$$ user_x = \frac{\sum_{n=1}^{N} V_n}{N} \odot \sqrt{\frac{\sum_{n=1}^{N} \left(V_n - \bar{V}\right)^2}{N}} \quad (2) $$

where N is her total number of tweets, V_n is the n-th tweet feature vector, and \bar{V} is the mean of the tweet feature vectors of user x. After preparing the two feature vectors, we concatenate them and feed them to a Logistic Regression (LG) classifier.

CNN. We use a CNN to model the proposed features. The CNN has two branches: one models the topic information (A) and the other models the profiling features (B). Figure 2 shows the proposed network. In branch A, we first divide a user's tweets into seven groups based on their topics and then feed each group to a different CNN. The tweets of a specific group are treated as one long document. Each CNN applies convolution and max-pooling layers. The input document D of length n is represented as [D_1, D_2, .., D_n], where D_i ∈ ℝ^d is a d-dimensional one-hot vector of the i-th word in the input document.
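As a minimal sketch, the user representations of Equations (1) and (2) can be computed as follows; the helper names are hypothetical, and the population standard deviation (division by N) is assumed:

```python
from statistics import fmean

def mean_std_profile(vectors):
    """Equation (2): per-dimension mean and (population) standard deviation
    of a user's tweet feature vectors, concatenated."""
    n, dim = len(vectors), len(vectors[0])
    means = [fmean(v[d] for v in vectors) for d in range(dim)]
    stds = [(sum((v[d] - means[d]) ** 2 for v in vectors) / n) ** 0.5
            for d in range(dim)]
    return means + stds                      # list '+' = concatenation (⊙)

def user_vector(tweets_by_topic):
    """Equation (1): concatenate the mean/std profile of every topic.
    `tweets_by_topic` maps each of the T topics to that user's tweet
    feature vectors (46 topic-information features each in the paper)."""
    out = []
    for topic in sorted(tweets_by_topic):    # fixed topic order
        out += mean_std_profile(tweets_by_topic[topic])
    return out

# toy example: 2-dimensional feature vectors, two topics
u = user_vector({"trump": [[1, 0], [3, 0]], "hillary": [[2, 2]]})
```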
The words' d-dimensional vectors have a length of 46, i.e., the total number of topic information features. After processing the input group of tweets, we apply another max-pooling layer to extract the important global features from the seven topic CNNs. The structure of this branch is inspired by the Hierarchical Attention model [34] that was proposed for document classification. For branch B, on the other hand, we concatenate all the tweets of a user into one document, use Equation 2 to extract a vector of the profiling features (of length 278), and feed it to a dense layer f(W_a v + b_a), where W_a and b_a are the corresponding weight matrix and bias terms, and f is an activation function such as ReLU, tanh, etc. After processing the input tweets in both branches, we concatenate the output vectors (⊕) and feed them to another dense layer to learn their joint interaction. Finally, to obtain the class probabilities of a document, we add a Softmax layer.

6. Experiments and Results
6.1. Experimental Setup
Given the substantial class imbalance in the dataset, we report the precision, recall and F1 metrics on the IRA trolls (the positive class). In the following section, we test several classifiers^10 with some of our baselines, and we highlight the ones that obtain the best F1 value. We keep the default parameter values. We report results for 5-fold cross-validation. Regarding the CNN model, after dividing the tweets into 7 groups in branch A, we set their maximum length to 500 words and pad the shorter ones with zeros. We noticed that some users have no tweets labeled with some topics; thus, we substitute the missing topic groups with zero vectors. For hyper-parameter selection, we slice off 0.2 of the data for validation and apply cross-validation to the rest.
We tune various parameters with the corresponding search spaces: the sizes of the dense layers (32, 64, 128, 256), activation functions (tanh, ReLU), CNN filter sizes (different combinations of the sizes 3, 4, 5, 6) and their numbers (32, 64, 128, 256, 512), and the optimization function (Adam, RMSprop, SGD). It is also worth mentioning that we tried to oversample the minority class by randomly replicating the troll users to improve the performance; however, we did not notice a clear improvement in the F1 metric.

6.2. Baselines
In order to evaluate our approach, we use the following baselines:

BOW + LR: We use a bag-of-words (BOW) representation (weighted using the TF-IDF scheme) with a LR classifier, where we aggregate all the tweets of a user into one long document. We aim to assess how well a simple word-based model can perform.

LSTM: Word embedding-based models have previously shown significant improvements in many tasks. We use a Long Short-Term Memory (LSTM) network [35] with GloVe (840b.300d) word embeddings [36]. As in the BOW baseline, we aggregate all the tweets of a user into one long document.

Number of Tweets + NB: Based on the dataset statistics (see Table 1), we can notice that the IRA accounts have a large number of tweets. Thus, as a baseline, we use the number of tweets of each account and feed it to a NB classifier. We use this baseline to investigate whether it is possible to detect the troll accounts using the number of tweets alone.

Tweet2vec + LR: A previous work [37] showed that IRA trolls were playing the "hashtag game", a popular word game on Twitter where users add a hashtag to their tweets and then answer an implied question [38].
IRA trolls used this game in a similar way, but focusing more on offending or attacking the targeted section of the audience; an example from the IRA tweets:

#OffendEveryoneIn4Words undocumented immigrants are ILLEGALS

Thus, we use as a baseline Tweet2vec [39], a character-based Bidirectional Gated Recurrent neural network that reads tweets and predicts their hashtags. We aim to assess whether the tweet hashtags can help identify the IRA tweets. The model reads the tweets as character one-hot encodings and trains on them with their hashtags as labels. To train the model, we use our collected dataset, which consists of ∼3.7M tweets^11. To represent the tweets in this baseline, we use the decoded embedding produced by the model and feed it to a LR classifier.

Network Features + LR: The IRA dataset provided by Twitter contains little information about the account details, limited to: profile description, account creation date, number of followers and followees, location, and account language. Therefore, as a baseline, we use the number of followers and followees to assess their identification performance. We feed these features to a LR classifier.

Botometer + RF: Botometer is the SOTA bot detection system, which uses content, sentiment, friend, network, temporal, and user features. We extract these features and feed them to a Random Forest (RF) classifier with 100 estimators, following the authors' setup.

Still Out There + ABDT: As a further baseline, we use the model proposed in the related work [1], which uses profile, language distribution, and stop-word usage features with an Adaptive Boosted Decision Trees (ABDT) classifier.

^10 We tested Logistic Regression (LR), Random Forest (RF) with 100 as the number of estimators, Naive Bayes (NB), Support Vector Machine (SVM) with both its kernels, and a Neural Network (NN) with a single hidden layer of size 50 and tanh as activation function.

6.3. Results
Table 2 presents the classification results of the baselines and of our approaches. We report the results of our classical classifier-based approach with the top 3 performing classifiers (RF, NN, and LR). The best results in terms of F1 score were obtained with the LR classifier. The results show that both proposed models perform better than all the baselines. The results also show that the All Features + LG model performs better than the CNN, with a noticeable difference in terms of the F1 measure. Generally, we can notice that we are able to detect the IRA trolls effectively using the affective and lexical features (RQ1). The topic features perform well compared to most of the baselines. The result obtained with the profiling features is interesting: we are able to detect the IRA trolls from the users' writing style with an F1 value of 0.88 using the All Features + LG model. To assess whether the topic information improves the performance of each of the lexical features, we ran the All Features + LG model with each feature independently, with and without the topic information (i.e., without considering the topics in Eq. 1). The results obtained with each feature are: Emotions (+0.74 | -0.02)^12; Sentiment (+0.28 | -0.0); Bad & Sexual (+0.58 | -0.0); Stance Cues (+0.72 | -0.12); Bias Cues (+0.73 | -0.03); LIWC (+0.71 | -0.04); and Morality (+0.72 | -0.36). We conclude from these results that the model only weakly detects a user's changes in stance, variations in emotions, etc., when we discard the topic information. Clearly, the model becomes aware of the flipping behaviour across the topics. These results emphasize the importance of the topic information (RQ2), especially for the emotions.

^11 We used the default parameters that were provided with the system code.
^12 (+) stands for the F1 result with the topic information and (-) without it.

Table 2
Classification results.
Method                          Precision   Recall   F1
Network Features + LR           0.0         0.0      0.0
Random Selection                0.02        0.5      0.04
Tweet2vec + LR                  0.18        0.64     0.28
Number of Tweets + NB           0.47        0.53     0.5
BOW + LR                        0.86        0.51     0.64
LSTM                            0.86        0.69     0.76
Still Out There + ABDT [1]      0.97        0.75     0.84
Botometer + RF                  0.99        0.76     0.86
Topic Information Features
Topic-based Features + LR       0.89        0.7      0.78
CNN (branch A)                  0.79        0.81     0.80
Profiling Features
Profiling Features + LR         0.92        0.85     0.88
CNN (branch B)                  0.81        0.88     0.84
All Features
All Features + RF               0.99        0.78     0.88
All Features + NN               0.90        0.89     0.90
All Features + LR               0.93        0.88     0.91
CNN                             0.86        0.90     0.88

This motivates us to analyze the emotions in the IRA tweets further (see the following section). Finally, the baseline results show that the Network features are not able to detect the IRA trolls. A previous work [4] showed that the IRA trolls tend to follow many users and to nudge other users into following them (e.g. by writing "follow me" in their profile description), in order to hide their identity (account information) among the regular users. Similarly to the Network features, the Tweet2vec baseline performs poorly. This indicates that, although the IRA trolls used the hashtag game extensively in their tweets, the Tweet2vec baseline is not able to identify them. The results of both Botometer and Still Out There [1] are superior to the other baselines, but still lower than those of our proposed approaches.

7. Analysis
Given that the Emotions feature boosted the F1 by the highest value compared to the other topic-based features, in Figure 3 we analyze the IRA trolls from an emotional perspective to answer RQ3. This analysis can make us more aware of the manipulation efforts across the topics. The figure shows that the topics that were used to attack immigrants (Black People and Islam and War) have the fear emotion among their top two emotions. On the other hand, a topic like Trump has the lowest amount of the fear emotion, while the joy emotion is among its top emotions.
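The per-topic emotion profiles behind this analysis can be sketched as a simple lexicon count; the tiny `NRC` dictionary below is a hypothetical stand-in for the ∼14K-word NRC emotion lexicon:

```python
from collections import Counter

# Hypothetical stand-in for the NRC emotion lexicon: words mapped to
# Plutchik emotions -- illustrative entries only.
NRC = {"terror": "fear", "attack": "fear", "great": "joy", "exciting": "joy"}

def emotion_profile(tweets):
    """Average per-tweet emotion counts for one topic's tweets,
    as used to compare the topics from an emotional perspective."""
    totals = Counter()
    for tweet in tweets:
        for word in tweet.lower().split():
            word = word.strip(".,!?")        # light punctuation cleanup
            if word in NRC:
                totals[NRC[word]] += 1
    return {emo: count / len(tweets) for emo, count in totals.items()}

# toy usage on two tweets assigned to the Trump topic
trump_topic = ["That was really exciting.", "MAKE AMERICA GREAT AGAIN!"]
print(emotion_profile(trump_topic))
```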
Why does the topic information help? The Flip-Flop behaviour. The troll accounts supported their ideologies by tweeting positively on some topics while simultaneously posting tweets with a negative stance on topics that go against their ideologies. As an example, let us consider the fear and joy emotions in Figure 3.

Figure 3: Emotional analysis of the IRA trolls from a thematic perspective.

We can notice that all the topics used to nudge the divisive issues have a decreasing dashed line, whereas others, such as the Trump topic, have a sharply increasing dashed line. We therefore manually analyzed the tweets of a sample of the IRA accounts and found this observation confirmed; as an example from user x:

Islam and War topic: (A) @RickMad: Questions are a joke, a Muslim asks how SHE will be protected from Islamaphobia! Gmaffb! How will WE be protected from terrori…

Trump topic: (B) @realDonaldTrump: That was really exciting. Made all of my points. MAKE AMERICA GREAT AGAIN!

Figure 4 shows the flipping behaviour of user x, obtained by extracting the mean value of the fear and joy emotions. The small difference between the fear and joy emotions in the Islam and War topic for this user is due to the user's ironic way of tweeting (e.g. the beginning of tweet A: "Questions are a joke"); even so, the fear emotion is still stronger than joy. We noticed a similar pattern for some of the regular users, although it was much more evident among the IRA trolls. Thus, the way we combine our feature set with the topic information makes our classification models aware of this flipping behaviour. To better understand the performance of the NLI features, given their high performance compared to the other feature sets, we extracted the most important tokens in each of the NLI feature subsets (see Figure 5). Some of the obtained results confirm previous findings.
For instance, the authors in [24] found that Russians write English tweets with more prepositions than native speakers of other languages do (e.g. as, about, and because in (c) Stop-words, and RP13 in (a) POS in Figure 5). Further research must be conducted to investigate the rest of the results in depth.

13 RP stands for "adverb, particle" in the POS tag set.

Figure 4: Flipping emotions between topics by user 𝑥 (an IRA troll account).

Table 3
Linguistic analysis of Morality, LIWC, Bias and Subjectivity, Stance, and Bad and Sexual cues, shown as the percentage of the averaged value of tweets with one or more cues across IRA trolls (X) and regular users (Y), in the form X(arrows)Y. The tweet average is the mean value across the topics. We report only significant differences: p-value ≤ 0.001 ↑↑↑, ≤ 0.01 ↑↑, ≤ 0.05 ↑, estimated using the Mann-Whitney U test. NSD stands for No Statistically Significant Difference.

Morality                  LIWC                   Bias language            Stance                  Bad and Sexual
care 1.3↑↑↑.74            pronoun 53.34↑47.59    assertive 6.53↓↓↓7.05    belief 2.9↑↑↑.49        bad 5.4↑↑↑.66
harm 2.3↑↑↑.61            anx 1.9↑↑↑.98          bias NSD                 denial 0.6↑↑↑.57        sexual 3.5↑↑↑.16
fairness 0.64↓↓↓0.84      cogmech NSD            factive 5.5↑↑↑.95        doubt 1.3↑↑↑.25         -
unfairness 0.06↓↓0.31     insight 12.1↑↑↑0.08    hedge 10.0↑↑↑.69         fake 0.49↓↓↓1.22        -
loyalty 0.84↓↓↓1.26       cause 10.7↑↑↑0.27      implicative 9.0↑↑↑.37    knowledge 0.75↓↓↓1.48   -
betrayal 0.13↓↓↓0.35      discrep 12.7↑↑↑1.07    report 14.37↓↓↓18.89     negation 11.4↑↑↑.10     -
authority 1.59↓↓↓1.88     tentat 13.9↑↑↑2.29     strong subj 54.1↑↑↑9.9   question 3.1↑↑↑.44      -
subversion 0.3↓↓↓.33      certain 13.5↑↑↑0.69    weak subj 50.33↑41.96    report 2.86↓↓↓3.46      -
sanctity 0.4↑↑↑.27        inhib 4.1↑↑↑.87        -                        -                       -
degradation 0.5↑↑↑.49     incl 20.69↓↓21.24      -                        -                       -

Linguistic Analysis.
We measure statistically significant differences in the cue markers of Morality, LIWC, Bias and Subjectivity, Stance, and Bad and Sexual words across IRA trolls and regular users. The findings presented in Table 3 allow for a deeper understanding of the IRA trolls' language usage. In general, the table shows that most of the topic-based features exhibit a significant difference between the trolls and the regular users. The analysis also shows that trolls use Subjective language, Discrepancy, Bad, and Sexual terms at a higher rate than regular users. On the other hand, trolls appear to be less Fair (Fairness) and Loyal, and in addition use fewer Assertive and Report terms. Other categories, like Anxiety and Bias, do not show any significant difference. The fact that the Bias category shows no significant difference underlines the sophistication of the IRA campaign, which was able to conceal its bias in text. Our approach uses the topic information to overcome the limitations of the text alone.

Figure 5: The top 10 important tokens in each of the NLI features: (a) POS, (b) DEPREL, (c) Stop-words.

Figure 6: Topics' importance.

Topic Importance. The topics that were targeted by the Russian trolls are not equally important. They received tweets from both troll and regular users, but some of them received more tweets from the trolls than from the regular users. In this experiment, we extract the topics' importance to understand the trolls' campaign more clearly. To do so, we extract the feature importance values from our classifier and then average these values for each topic independently, given that each topic has the same feature set (see Eq. 1). Using these averaged values, we can rank the topics from the most to the least important in the classification process (see Figure 6). On the one hand, we can notice that topics like Attacking Hillary and Black People are more important than the others.
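The averaging step just described (Eq. 1) can be sketched as follows, assuming per-feature importances are exported from the trained classifier under "topic::feature" names; the naming scheme and all values below are ours, for illustration only.

```python
from collections import defaultdict

def topic_importance(importances):
    """Average per-feature importances within each topic, then rank topics.

    `importances` maps "topic::feature" names to the classifier's
    feature-importance values; every topic shares the same feature set,
    so a plain mean per topic is well defined.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for name, value in importances.items():
        topic, _, _feature = name.partition("::")
        sums[topic] += value
        counts[topic] += 1
    means = {topic: sums[topic] / counts[topic] for topic in sums}
    # most important topic first
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

ranked = topic_importance({
    "Attacking Hillary::fear": 0.30, "Attacking Hillary::bad_words": 0.20,
    "Islam and War::fear": 0.10, "Islam and War::bad_words": 0.04,
})
# Attacking Hillary ranks above Islam and War
```

The same per-(topic, feature) mapping also supports the top-3-features-per-topic view discussed next: instead of averaging within a topic, keep the largest values.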
The trolls targeted them more to reach their desired goal. On the other hand, topics like Islam and War and Police Shootings have the lowest importance; this could be explained by the fact that regular users are likely to weigh heavily on these topics compared to the rest. Thus, these two topics became less useful for the classifier to discriminate between the two types of accounts. To understand the language usage in each topic, we extract the top 3 important features w.r.t. each topic in Figure 7. The results obtained are consistent with our preliminary hypothesis about the trolls' language in each topic. They show, for example, that the Police Shootings topic has tweets that mainly trigger Anxiety and Harm; the aim of these tweets is to increase the amount of panic among the public. Similarly, the Black People topic exhibits Unfairness, Anger, and Negative emotions. This result is interesting and reveals the negative stance of the racist tweets against black people.

Figure 7: Top 3 features in the extracted topics.

False Negative Cases. The proposed features proved effective in the classification process. We are interested in understanding the causes of misclassifying some of the IRA trolls. Therefore, we manually investigated the tweets of some of the false negative users and found three main reasons: 1) Some trolls tweeted in a questioning way, asking about general issues; we examined their tweets but did not find a clear ideological orientation or suspicious behaviour in them. 2) Some accounts shared conventional social media posts (e.g. "http://t.co/GGpZMvnEAj cat vs trashcan"); the majority of the misclassified IRA trolls fall under this reason. In addition, these posts were assigned a wrong topic; the tweet in the previous example, for instance, was assigned to the Attacking Hillary topic. 3) Lack of content.
Some of the misclassified trolls posted only external links, without clear textual content. This kind of troll requires a second step that investigates the content of the external links. We tried to read the content of these links, but found that the majority of them referred to deleted tweets. Probably this kind of account was used to "raise the voice" of other trolls; we also argue that the three kinds of IRA trolls were used for "likes boosting".

8. Limitations

In this work, we focused on the detection of online trolls, namely the IRA Russian trolls. We proposed topic- and profiling-based approaches and compared them to several solid baselines. For the topic-based features, we used a set of lexicons as features and combined them with topic information. This combination allowed our approaches to detect the change in the affective and lexical information among the extracted topics and, consequently, to detect the suspicious behaviour of the trolls in spreading negative tweets on some topics and positive ones on others. Despite the good performance of our approaches, a couple of limitations remain. Our approaches assume a general knowledge of the issues that trolls address. As we showed in Section 4.1, we extracted seven topics that the trolls used to sow discord. The number of topics is not set automatically, and supervision by human knowledge is needed; for instance, a domain other than the US 2016 elections would need a different number of topics. Another aspect is that our feature set is language-dependent. Recently, during the Italian 2019 elections for the European Parliament, some journalists claimed that they noticed a rise in the number of fake Twitter accounts that tweeted to affect public decisions14. To apply our approaches successfully to, for instance, an Italian corpus, we would need Italian-language lexicons (emotions, stance cues, etc.) and an Italian POS tagger.

9.
Conclusion

In this paper, we presented two text-based approaches to detect social media trolls, namely the IRA trolls. Due to the anonymity that social media grant their users, these kinds of suspicious accounts have started to appear. We built machine learning models based on topic and profiling features that, in a cross-validation evaluation, achieved F1 values of 0.88 and 0.91. We applied a topic modeling algorithm to go beyond the superficial textual information of the tweets. Our experiments showed that the extracted topics boosted the performance of the proposed models when coupled with other affective and lexical features. In addition, we proposed NLI features to identify IRA trolls from their writing style, which proved very effective. Finally, for a better understanding, we analyzed the IRA accounts from emotional, linguistic, and thematic perspectives. Through manually checking IRA accounts, we noticed that irony was frequently employed. As future work, it would be interesting to identify these accounts by integrating an irony detection module, although irony detection is still an open research topic and results may be far from accurate.

Acknowledgments

This work is partially supported by a public grant overseen by the French National Research Agency (ANR) as part of the program "Investissements d'Avenir" (reference: ANR-10-LABX-0083). It contributes to the IdEx Université de Paris - ANR-18-IDEX-0001. The work of Paolo Rosso was partially funded by the Spanish MICINN under the research project MISMIS-FAKEnHATE on Misinformation and Miscommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31).

14 https://www.thelocal.it/20180802/russian-troll-factory-tweets-attempted-influence-italian-elections

References

[1] J. Im, E. Chandrasekharan, J. Sargent, P. Lighthammer, T. Denby, A. Bhargava, L. Hemphill, D. Jurgens, E.
Gilbert, Still Out There: Modeling and Identifying Russian Troll Accounts on Twitter, arXiv preprint arXiv:1901.11162 (2019).
[2] E. M. Clark, J. R. Williams, C. A. Jones, R. A. Galbraith, C. M. Danforth, P. S. Dodds, Sifting Robotic from Organic Text: A Natural Language Approach for Detecting Automation on Twitter, Journal of Computational Science 16 (2016) 1–7.
[3] R. L. Boyd, A. Spangher, A. Fourney, B. Nushi, G. Ranade, J. Pennebaker, E. Horvitz, Characterizing the Internet Research Agency's Social Media Operations During the 2016 US Presidential Election using Linguistic Analyses, PsyArXiv (2018).
[4] S. Zannettou, T. Caulfield, E. De Cristofaro, M. Sirivianos, G. Stringhini, J. Blackburn, Disinformation Warfare: Understanding State-Sponsored Trolls on Twitter and their Influence on the Web, in: Companion Proceedings of The 2019 World Wide Web Conference, ACM, 2019, pp. 218–226.
[5] G. Gorrell, M. E. Bakir, I. Roberts, M. A. Greenwood, B. Iavarone, K. Bontcheva, Partisanship, Propaganda and Post-Truth Politics: Quantifying Impact in Online, arXiv preprint arXiv:1902.01752 (2019).
[6] A. Badawy, K. Lerman, E. Ferrara, Who Falls for Online Political Manipulation?, in: Companion Proceedings of The 2019 World Wide Web Conference, ACM, 2019, pp. 162–168.
[7] A. Badawy, E. Ferrara, K. Lerman, Analyzing the Digital Traces of Political Manipulation: the 2016 Russian Interference Twitter Campaign, in: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, 2018, pp. 258–265.
[8] K. Lee, B. D. Eoff, J. Caverlee, Seven Months with the Devils: A Long-Term Study of Content Polluters on Twitter, in: Fifth International AAAI Conference on Weblogs and Social Media, 2011.
[9] F. Rangel, P. Rosso, Overview of the 7th Author Profiling Task at PAN 2019: Bots and Gender Profiling, in: L. Cappellato, N. Ferro, D. Losada, H. Müller (Eds.), CLEF 2019 Labs and Workshops, Notebook Papers, CEUR-WS.org, 2019.
[10] F.
Johansson, Supervised Classification of Twitter Accounts Based on Textual Content of Tweets, in: CLEF 2019 Labs and Workshops, Notebook Papers, volume 2019, CEUR-WS.org, 2019.
[11] J. P. Dickerson, V. Kagan, V. Subrahmanian, Using Sentiment to Detect Bots on Twitter: Are Humans more Opinionated than Bots?, in: 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014), IEEE, 2014, pp. 620–627.
[12] C. A. Davis, O. Varol, E. Ferrara, A. Flammini, F. Menczer, BotOrNot: A System to Evaluate Social Bots, in: Proceedings of the 25th International Conference Companion on World Wide Web, International World Wide Web Conferences Steering Committee, 2016, pp. 273–274.
[13] A. Ng, This was the Most Viewed Facebook ad Bought by Russian Trolls, 2018. URL: https://www.cnet.com/news/this-was-the-most-viewed-facebook-ad-bought-by-russian-trolls/.
[14] D. M. Blei, A. Y. Ng, M. I. Jordan, Latent Dirichlet Allocation, Journal of Machine Learning Research 3 (2003) 993–1022.
[15] S. M. Mohammad, P. D. Turney, Emotions Evoked by Common Words and Phrases: Using Mechanical Turk to Create an Emotion Lexicon, in: Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, Association for Computational Linguistics, 2010, pp. 26–34.
[16] S. Frenda, B. Ghanem, M. Montes-y Gómez, Exploration of Misogyny in Spanish and English Tweets, in: 3rd Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), volume 2150, CEUR-WS, 2018, pp. 260–267.
[17] S. Mohammad, S. Kiritchenko, P. Sobhani, X. Zhu, C. Cherry, SemEval-2016 Task 6: Detecting Stance in Tweets, in: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), 2016, pp. 31–41.
[18] H. Bahuleyan, O.
Vechtomova, UWaterloo at SemEval-2017 Task 8: Detecting Stance Towards Rumours with Topic Independent Features, in: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 2017, pp. 461–464.
[19] B. Ghanem, A. T. Cignarella, C. Bosco, P. Rosso, F. M. R. Pardo, UPV-28-UNITO at SemEval-2019 Task 7: Exploiting Post's Nesting and Syntax Information for Rumor Stance Classification, in: Proceedings of the 13th International Workshop on Semantic Evaluation, 2019, pp. 1125–1131.
[20] J. B. Hooper, On Assertive Predicates, in: J. Kimball (Ed.), Syntax and Semantics, volume 4, 1974.
[21] M. Recasens, C. Danescu-Niculescu-Mizil, D. Jurafsky, Linguistic Models for Analyzing and Detecting Biased Language, in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, volume 1, 2013, pp. 1650–1659.
[22] P. Kiparsky, C. Kiparsky, Fact, Linguistics Club, Indiana University, 1968.
[23] L. Karttunen, Implicative Verbs, Language (1971) 340–358.
[24] S. Volkova, S. Ranshous, L. Phillips, Predicting Foreign Language Usage from English-Only Social Media Posts, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, volume 2, 2018, pp. 608–614.
[25] Y. R. Tausczik, J. W. Pennebaker, The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods, Journal of Language and Social Psychology 29 (2010) 24–54.
[26] J. Graham, J. Haidt, B. A. Nosek, Liberals and Conservatives Rely on Different Sets of Moral Foundations, Journal of Personality and Social Psychology 96 (2009) 1029.
[27] S. Malmasi, K. Evanini, A. Cahill, J. Tetreault, R. Pugh, C. Hamill, D. Napolitano, Y. Qian, A Report on the 2017 Native Language Identification Shared Task, in: Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 2017, pp. 62–75.
[28] A. Cimino, F.
Dell'Orletta, Stacked Sentence-Document Classifier Approach for Improving Native Language Identification, in: Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 2017, pp. 430–437.
[29] I. Markov, L. Chen, C. Strapparava, G. Sidorov, CIC-FBK Approach to Native Language Identification, in: Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 2017, pp. 374–381.
[30] C. Goutte, S. Léger, Exploring Optimal Voting in Native Language Identification, in: Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 2017, pp. 367–373.
[31] R. Zheng, J. Li, H. Chen, Z. Huang, A Framework for Authorship Identification of Online Messages: Writing-Style Features and Classification Techniques, Journal of the American Society for Information Science and Technology 57 (2006) 378–393.
[32] M. Bhargava, P. Mehndiratta, K. Asawa, Stylometric Analysis for Authorship Attribution on Twitter, in: International Conference on Big Data Analytics, Springer, 2013, pp. 37–47.
[33] M. Sultana, P. Polash, M. Gavrilova, Authorship Recognition of Tweets: A Comparison Between Social Behavior and Linguistic Profiles, in: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), IEEE, 2017, pp. 471–476.
[34] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, E. Hovy, Hierarchical Attention Networks for Document Classification, in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 1480–1489.
[35] S. Hochreiter, J. Schmidhuber, Long Short-Term Memory, Neural Computation 9 (1997) 1735–1780.
[36] J. Pennington, R. Socher, C. D. Manning, GloVe: Global Vectors for Word Representation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532–1543.
[37] B. C. Boatwright, D. L. Linvill, P. L.
Warren, Troll Factories: The Internet Research Agency and State-Sponsored Agenda Building, Resource Centre on Media Freedom in Europe (2018).
[38] W. Haskell, People Explaining their 'Personal Paradise' is the Latest Hashtag to Explode on Twitter, 2015. URL: https://www.businessinsider.com.au/hashtag-games-on-twitter-2015-6.
[39] B. Dhingra, Z. Zhou, D. Fitzpatrick, M. Muehl, W. W. Cohen, Tweet2Vec: Character-Based Distributed Representations for Social Media, in: The 54th Annual Meeting of the Association for Computational Linguistics, 2016, p. 269.