         Bots and Gender Profiling using a Multi-layer
                        Architecture
                         Notebook for PAN at CLEF 2019

    Régis Goubin*, Dorian Lefeuvre*, Alaa Alhamzeh*,**, Jelena Mitrović**, Előd
                Egyed-Zsigmond*, and Leopold Ghemmogne Fossi*

                      * Université de Lyon - INSA Lyon - LIRIS UMR5205,
                                       **Universität Passau
      regis.goubin@insa-lyon.fr, dorian.lefeuvre@insa-lyon.fr, alaa.alhamzeh@insa-lyon.fr,
                                 jelena.mitrovic@uni-passau.de,
            elod.egyed-zsigmond@insa-lyon.fr,leopold.ghemmogne-fossi@insa-lyon.fr



        Abstract In this paper, we introduce the architecture used for our participation
        in the PAN@CLEF 2019 author profiling task. In this task, we had to predict
        whether the author of a set of 100 tweets was a bot, a female human, or a male
        human. The task is proposed from a multilingual perspective, for English and
        Spanish. We handled it in two steps, using different feature extraction techniques
        and machine learning algorithms. In the first step, we used a random forest
        classifier with different features in order to predict whether a user was a bot or
        a human. In the second step, we recovered all the users predicted as humans and
        used a two-layer architecture to predict their gender.




1     Introduction

Nowadays, the need for author profiling is growing as people share more and more
content on the internet, especially on social networks. The profiling task is useful for
several domains, such as security and forensics, marketing, target advertising, politics,
etc. A lot of information can be recovered via the content shared by the users such as
their gender, age, affiliation, etc.
    Author profiling from tweets has been introduced by the PAN annual challenge
since 2013 [20,18,16,22,21,19]. However, until now, the prediction was based only on
human tweets, with an aim to predict some of their characteristics (age, gender, lan-
guage variety, etc.), while this year, bots appear on the scene. Another difference to the
previous challenges is the absence of the images attached to the tweets. The goal of the
2019 PAN author profiling shared task is to investigate whether the author of a Twitter
feed is a bot or a human. Furthermore, in case of a human user, the goal is to predict the

    Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons Li-
    cense Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 September 2019, Lugano,
    Switzerland.
gender of the author, in two different languages: English and Spanish. In this paper, we
will describe our approach to achieve that goal.
    The remainder of the paper is structured as follows. Section 2 introduces the state
of the art in author profiling. In Section 3, we present the overall architecture of our
approach, followed by our methods for bot detection and gender prediction.1 In
Section 4, we present the results. Finally, in Section 5, we come to a conclusion and
discuss future steps.


2      Related Work
2.1     Bot Detection
A social media bot is a piece of software used to automatically generate messages,
reply to messages, share particular hashtags, follow other users, or run a fake account
that gains followers itself. Varol et al. [26] have estimated that 9-15%
of Twitter accounts may be bots. Twitter bots are a well-known example of how fake
social media accounts can create convincing online personas capable of influencing
real people on cultural and political topics. The first signs of this appeared during the
2016 U.S. election where the Russian government apparently leveraged bots to spread
divisive messaging. Other governments, enterprises and groups use this technique as
well. Although social bots are made to appear as if they were human accounts, it is
still possible to identify them as bots based on their profile, i.e., the user name, profile
photo, time of posting and other meta-data. This information has been used effectively
to identify bots during the 2017 French elections [8]. However, the challenge this year
is to detect bots only from textual data and not from the entirety of meta-data.
     A.H. Wang [27] proposed to detect bots based on the number of friends, the number
of followers, and the follower ratio of a user alongside the number of tweets containing
HTTP links and the number of replies/mentions from the last 20 tweets of a user.
     Ferrara et al. [9] used numerous features grouped into five different classes to detect
bots. Varol et al. [26] expanded the set of available features and grouped them into six
different classes in the BotOrNot (now called Botometer) framework. The number of
available features increased further in 2018 [29]. Although not all the features identified
by Varol et al. are relevant to the PAN challenge, sentiment- and content-related features
can be used to detect bots with good accuracy. To the best of our knowledge, there is no
previous work studying bot detection based on textual information only. In another paper,
Ferrara also showed that similar accuracy can be achieved when only user meta-data are
used [8].



2.2     Gender Prediction
The relationship between personal traits and the use of language has been widely stud-
ied by Pennebaker [14] in the psycholinguistic research field. He showed how language
usage varies depending on personal traits. For example, he found out that, at least
 1
     Our code is available here: https://github.com/rgoubin/participation_author_profiling
in English, women use negations and talk in the first person more than men, because
they are more "self-conscious", whereas men use more prepositions in order to describe
their environment, i.e. they speak more about "concrete things". These findings are the
basis of LIWC (Linguistic Inquiry and Word Count) [25], one of the most widely used
tools in author profiling. Pioneering research in gender profiling, such as Argamon
et al. [1], Burger et al. [4] and Schler et al. [24], focused mainly on formal texts and
blogs, reporting accuracies in the range of 75%-80% in most cases, as mentioned in the last
PAN overview [19]. However, most recent investigations focus on social media such as
Facebook and Twitter, where we have to think about handling short phrases, with many
potential typos, less formal and more spontaneous language.
    Most recent studies use features such as sequences of words and characters (unigrams,
bigrams and, more generally, n-grams), and feed them to an SVM classifier, reporting an
average accuracy of 75%-82% according to the last PAN edition [19].
    In the 2017 edition of this challenge, the winners, Basile et al. [2], reached an accu-
racy of 0.8253 on gender, all languages (Arabic, English and Spanish) combined. They
used a single SVM classifier with character 3- to 5-grams and word 1- to 2-grams as
features, weighted with Tf-idf where tf is replaced by (1 + log(tf)).
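The (1 + log(tf)) substitution mentioned above corresponds to sublinear tf scaling, which sklearn exposes directly; a minimal sketch of such a vectorizer setup (the toy corpus is illustrative, and the exact vectorizer parameters of Basile et al. are not spelled out here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Character 3- to 5-grams and word 1- to 2-grams; sublinear_tf=True
# replaces tf by 1 + log(tf), as described in Basile et al. [2].
char_vec = TfidfVectorizer(analyzer="char", ngram_range=(3, 5), sublinear_tf=True)
word_vec = TfidfVectorizer(analyzer="word", ngram_range=(1, 2), sublinear_tf=True)

corpus = ["a toy tweet about cats", "another toy tweet about dogs"]
X_char = char_vec.fit_transform(corpus)  # one row per document
X_word = word_vec.fit_transform(corpus)
```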
    In 2017, another team participating in the PAN author profiling task, Kheng et al. [12],
took a different approach. After several experiments, they removed stop-words in English
and Arabic, used a Tf-idf model based on 1- and 2-grams, and trained a Naive Bayes
classifier on these features.

    Daneshvar et al. [6] used only text during the 2018 edition of the PAN author profil-
ing task. They reached first place on text classification and second place overall, with a
general accuracy of 0.8170.
    In 2018, Ciccone et al. [5] built on the previous work of Kheng et al. [12] in order
to improve it and construct an efficient text classifier. They performed approximately
the same preprocessing steps, and also used the n-gram and Tf-idf model. Furthermore,
they took into consideration the experiments done by Kheng in his master's thesis [11]
(where he was able to improve his results, finally reaching an f-score of approximately
0.8 in 10-fold cross-validation) and obtained an overall accuracy of 0.7981 on texts,
while Kheng et al. had obtained an accuracy of 0.7002.
    We based our work on the methods of Ciccone et al. [5]. We reproduced and im-
proved their text model by integrating their set of features into a more complex architec-
ture.

3     Our Approach
The classification part of our architecture consists of two steps. The first step classifies
each user as a human or as a bot. After that, the users classified as humans are further
classified according to their gender. Figure 1 shows the overall architecture.

3.1   Bot Detection
For bot detection, our approach followed a classical pipeline. First of all, in the pre-
processing phase, we chose, for both languages, to collapse characters repeated more
than three times in order to recover the real word. The text is also tokenized with
NLTK's TweetTokenizer [3].

                        Figure 1: Schema of the overall architecture

    After pre-processing the tweets, we chose to use features extracted in the work of
Varol et al. [26], in addition to other features based on our own observations of the
dataset.
    As mentioned in the previous section, the work of Varol et al. extracts numerous
groups of features. We used only the ones based on the tweet text: word entropy, the
ratio of tweets that contain emojis, and the Part-of-Speech (POS) distribution. We used
these features for the early bird submission.
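Word entropy, for instance, can be computed as the Shannon entropy of the word-frequency distribution over a user's tweets; a minimal sketch (the helper name and the whitespace tokenization are our simplifications):

```python
import math
from collections import Counter

def word_entropy(tweets):
    """Shannon entropy (in bits) of the word distribution over a user's tweets."""
    words = [w for t in tweets for w in t.lower().split()]
    counts = Counter(words)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A user who always repeats the same word has zero entropy; bots that post
# templated text tend to score lower than humans.
```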
    Emojis are an important part of communication on Twitter, and they are used by bots
and humans alike. However, they are not always used in the same way; for example,
some bots use only emojis to communicate. Based on this observation, we chose to
consider the ratio of tweets that contain emojis as a feature for a user, but we also
considered the number of emojis used in each tweet.
    We were able to test our system during the early bird evaluation phase of the chal-
lenge. The results were quite good for English, but for Spanish we observed a gap
between the model's performance on the training dataset and on the test dataset used
in the early bird evaluation. This motivated us to add new features to the Spanish bot
detection subtask.
    To improve on the early bird result, we analyzed a sample of the dataset and iden-
tified some potential features to extract:
 – Average number of words by tweet
 – Average of the tweets’ lengths for one user
 – The standard deviation of the number of words
 – Number of URLs
 – Number of Hashtags
 – Number of user mentions
 – Percentage of uppercase letters
 – Number of emojis
 – Number of first-person pronouns used
 – Number of pronouns used
 – Number of negations used

After running a feature selector (a Shapley-value-based one) on these features, we chose
to retain the following for bot classification in both languages:

 – Average number of words per tweet
 – Number of URLs
 – Number of Hashtags
 – Number of emojis
 – The standard deviation of the number of words

    In order to check the usefulness of these features, we tried to classify the dataset
using these features alone. We achieved an accuracy of around 80% on both languages,
which supports our choice to use them afterward.
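A sketch of how such counting features can be extracted from a user's tweets (the function name and the regular expressions are our simplified approximations, not the exact patterns we used):

```python
import re
import statistics

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")

def generic_features(tweets):
    """Per-user counts: average words per tweet, word-count std-dev,
    and totals of URLs, hashtags and user mentions."""
    word_counts = [len(t.split()) for t in tweets]
    return {
        "avg_words": sum(word_counts) / len(word_counts),
        "std_words": statistics.pstdev(word_counts),
        "n_urls": sum(len(URL_RE.findall(t)) for t in tweets),
        "n_hashtags": sum(len(HASHTAG_RE.findall(t)) for t in tweets),
        "n_mentions": sum(len(MENTION_RE.findall(t)) for t in tweets),
    }
```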
    We also chose to perform sentiment analysis on the tweets. For this purpose, we
added more pre-processing steps for Spanish texts and used the DAL (Dictionary of
Affect in Language), an instrument designed to measure the emotional meaning of
words and texts. It does this by comparing individual words to a list of 8742 words
that have been rated by people for their activation, evaluation, and imagery. This
concept was introduced by Whissell et al. [28] in 1986 for the English language. Later
on, in 2013, a Spanish Dictionary of Affect in Language was produced by Dell' Amerlina
Ríos and Gravano [7]; it includes the pleasantness, activation and imagery of around
2500 Spanish words [10]. Since we use the DAL for Spanish tweets, we needed to make
sure that most of the words appearing in tweets could be found in it; as written text
tends to follow inflectional rules, we applied stemming to Spanish words using NLTK's
Snowball Stemmer [3]. For each tweet of an author, we rated its pleasantness, activation
and imagery according to this DAL. The use of these features did not seem to improve
the accuracy on the training data by much (around 0.2%), so we did not extract them
for the English set.
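The DAL scoring can be sketched as a lexicon lookup after stemming. The tiny lexicon below is illustrative, not the actual dictionary of Dell' Amerlina Ríos and Gravano [7], and a crude suffix stripper stands in for NLTK's Spanish SnowballStemmer to keep the sketch dependency-free:

```python
# Illustrative DAL entries: stem -> (pleasantness, activation, imagery).
DAL = {"feliz": (2.8, 2.1, 1.9), "trist": (1.2, 1.5, 1.8)}

def crude_stem(word):
    """Toy stand-in for NLTK's Spanish SnowballStemmer."""
    for suffix in ("es", "e", "a", "o", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def dal_scores(tweet):
    """Average pleasantness, activation and imagery over the DAL hits in a tweet."""
    stems = [crude_stem(w) for w in tweet.lower().split()]
    hits = [DAL[s] for s in stems if s in DAL]
    if not hits:
        return (0.0, 0.0, 0.0)
    return tuple(sum(dim) / len(hits) for dim in zip(*hits))
```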
    In order to train a classifier to identify bots, we performed 10-fold cross-validation.
After testing several classifiers, we chose a random forest, which showed the best
overall performance. We implemented it with Sklearn's random forest with 100
estimators.
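The training setup above can be sketched as follows (the feature matrix and labels below are random placeholders for the extracted per-user features and bot/human labels):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((40, 5))          # placeholder for the extracted features
y = np.array([0, 1] * 20)        # placeholder bot/human labels

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
```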
    For the final submission, we used the following features for bot classification:

 – Average number of words by tweet
 – Number of URLs
 – Number of Hashtags
 – Number of Emojis
 – The standard deviation of the number of words
 – Word entropy
 – Emojis ratio in the tweets
 – Part-of-Speech (POS) tagging distribution
 – Sentiment analysis (for Spanish only)

3.2   Gender Prediction
Gender prediction is based on a two-layer architecture. We distinguish between two
types of classifiers:
 – Low classifiers, which transform features into predictions
 – A meta classifier, which takes the predictions of the low classifiers as input and
   makes its own prediction
    We use different classifiers provided by the sklearn library [13]. Figure 2 shows
the architecture of this part. Each component of this architecture is described in the
subsection below.

Low Classifiers The Tf-idf classifier is based on the work of Ciccone et al. [5]. The
first task is pre-processing. Twitter data has to be cleaned, as users often make typos,
use a lot of emojis, repeat characters to express happiness or anger, and use upper
case differently than in classic texts. Thus, we proceed with several cleaning steps to
remove the noise caused by the particularities of tweets and thus create more useful
features.
     First of all, we collapse characters repeated more than three times. For instance,
we transform "I'm really happyyyyyyyyyy" into "I'm really happy". The purpose is to
recover the real word, since we do not want to create a vocabulary with the same word
written in different ways.
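This collapsing step can be implemented with a single backreference regular expression; a minimal sketch (the helper name is ours, and collapsing to a single occurrence while keeping runs of three or fewer is our reading of the rule):

```python
import re

def collapse_repeats(text):
    """Collapse any character repeated more than three times to one occurrence."""
    return re.sub(r"(.)\1{3,}", r"\1", text)
```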
     We remove punctuation, URLs and user mentions. We also remove stop words, but
keep some of them in a transformed form: every pronoun other than the first person,
every first-person pronoun, and every negation are each replaced by a dedicated place-
holder token. This last pre-processing step, which builds on the fact that the two genders
do not write in the same manner, is based on the work of Pennebaker and associates [14].
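A sketch of this token-replacement step (the placeholder token names and the word lists below are illustrative; the paper does not spell out the exact tokens or vocabularies used):

```python
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
OTHER_PRONOUNS = {"you", "he", "she", "it", "they", "them", "his", "her", "their"}
NEGATIONS = {"not", "no", "never", "don't", "doesn't", "won't", "can't"}

def mark_tokens(tweet):
    """Replace pronouns and negations with dedicated placeholder tokens."""
    out = []
    for tok in tweet.lower().split():
        if tok in FIRST_PERSON:
            out.append("_first_person_")   # illustrative placeholder token
        elif tok in OTHER_PRONOUNS:
            out.append("_pronoun_")
        elif tok in NEGATIONS:
            out.append("_negation_")
        else:
            out.append(tok)
    return " ".join(out)
```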
     Finally, for English texts, we remove plural endings, such as the final -s from
nouns, taking into account the other possibilities and exceptions of plural formation in
the English language [23]. We then tokenize the text using the TweetTokenizer of NLTK
[3]. As features, we kept the Tf-idf model, based on word 1- and 2-grams. To set up the
final classifier, we trained an SVM and a bagging classifier on both the 2019 and 2018
training datasets.
     We realized that the two datasets are very different. The 2019 training dataset seems
to be quite specific: models trained on it perform well only on it. On the other hand, the
2018 dataset seemed to be more comprehensive, and the classifier trained on it obtained
good results on both datasets. To perform our experiments, we ran our classifier on the
2018 test dataset and the 2019 dev split provided by the PAN organizers. As we did
not know whether the final test dataset would be closer to the 2018 or the 2019 training
dataset, we trained an SVM on both datasets. We use a LinearSVC (from the sklearn
library) with a maximum of 500 iterations.
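The Tf-idf classifier can then be assembled as a single pipeline; a minimal sketch with a toy corpus (the training texts and labels are placeholders for the preprocessed PAN tweets):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Word 1- and 2-grams fed to a LinearSVC capped at 500 iterations,
# matching the configuration described above.
model = make_pipeline(
    TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),
    LinearSVC(max_iter=500),
)

texts = ["toy tweet one", "toy tweet two", "another example", "more text here"]
labels = ["female", "male", "female", "male"]
model.fit(texts, labels)
pred = model.predict(["toy tweet"])
```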
                            Potential Features                      Used Features
                                                                   English Spanish

                         Average number of words by tweet             X          X
                            Average of the tweet lengths              X          X
                                 Number of URLs                       X          X
                                Number of Hashtags                    X          X
          Generic
                             Number of user mentions                  X          X
                                 Number of emojis                     X          X
                              Emoji ratio in the tweets
                                   Word entropy                       X          X
                           Percentage of uppercase letters
                   The standard deviation of the number of words                 X
                            Number of first person used               X
          Specific           Number of pronouns used                  X
                             Number of negations used                 X          X

                Table 1: List of potential and used features in both languages




As a consequence, our Tf-idf model combined the completeness of the 2018 training
dataset and the particularities of the 2019 one.
    A bagging classifier is, in some cases, useful to avoid over-fitting or to improve
accuracy. We implemented an ensemble of 10 SVMs with the same parameters as
before. As bagging did not provide any significant improvement, and in several
experiments even decreased the results, we chose to keep only the SVM in the
architecture.
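The bagging variant we tested can be sketched as follows (toy data stand in for the Tf-idf features):

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import LinearSVC

# An ensemble of 10 LinearSVCs, each trained on a bootstrap sample,
# with the same SVM parameters as the single-classifier setup.
bagging = BaggingClassifier(LinearSVC(max_iter=500), n_estimators=10)

X = np.random.rand(20, 4)   # placeholder feature matrix
y = np.array([0, 1] * 10)   # placeholder gender labels
bagging.fit(X, y)
```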
    We extend the architecture defined last year by Ciccone et al. by considering addi-
tional features to predict the user's gender.
    The specific (language-dependent) and generic (language-independent) feature clas-
sifiers recover different features from the text in order to improve the prediction pro-
vided by the Tf-idf classifier. We did not apply any preprocessing before extracting the
generic features, as it would have altered them. However, we applied one preprocessing
step before extracting the specific features: we removed punctuation marks. This handles
typos like "It is our.s", where "our.s" would otherwise be an undetected pronoun.
Naturally, we did not remove apostrophes, as we wanted to detect negations such
as "don't" or "wouldn't". We ran a feature selector (a Shapley-value-based one) on these
features to keep only the most useful ones. Table 1 sums up the features we extracted
and those selected per language.
    We trained this classifier on 70% of the dataset. On the remaining 30%, it reached an
accuracy of 0.7032 in Spanish and 0.7081 in English. We tried different classifiers on
these features; the best performance was reached by a random forest. We use the Sklearn
implementation with 100 estimators and kept the default values for the other parameters.
                                 English       Spanish
                                Bot Gender Bot Gender
                              0.9356 0.8295 0.7444 0.6667

                  Table 2: Accuracy obtained on the early bird submission


                                 English       Spanish
                                Bot Gender Bot Gender
                              0.9034 0.8333 0.8678 0.7917

                     Table 3: Accuracy obtained on the final submission




Meta classifier The meta classifier is a simple SVM which takes a few features as
input. The Spanish meta classifier uses the prediction of the Tf-idf classifier and the
predictions of the generic and specific feature classifiers, while the English meta classifier
uses the prediction of the Tf-idf classifier together with the generic and specific features
themselves.
    The meta classifiers are trained on 30% of the dataset. To assess their relevance, we
trained them with 10-fold cross-validation. As the training sets were small (620 human
users for English, 460 for Spanish), we focused mainly on the average accuracy over
the 10 folds. We then trained both classifiers on all the humans contained in the dev
splits.
    We used a meta classifier for two reasons. First, we needed an "arbitrator" to deter-
mine a final prediction from the low classifiers' predictions. Second, it improves the
accuracy obtained by the low classifiers: the meta classifier weighs each low classifier
according to its relevance, taking advantage of both.
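This stacking scheme can be sketched as follows (the toy prediction matrix stands in for real low-classifier outputs; 0/1 encode the two gender classes):

```python
import numpy as np
from sklearn.svm import SVC

# Each row: [tfidf_prediction, feature_classifier_prediction] for one user
# (0 = male, 1 = female); toy values stand in for real low-classifier output.
low_preds = np.array([[0, 0], [0, 1], [1, 1], [1, 0], [0, 0], [1, 1]])
gold = np.array([0, 0, 1, 1, 0, 1])

meta = SVC()               # a simple SVM acts as the "arbitrator"
meta.fit(low_preds, gold)  # learns how much to trust each low classifier
final = meta.predict(low_preds)
```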


4   Results

In this section, we compare the results we obtained against our early bird results.


Bot Improvement We added features to the English bot classifier used in our early
bird submission. These features slightly improved our results on the training data, but
do not seem to have improved the final submission and might even have worsened it.
On the other hand, the modifications for Spanish greatly improved our results. Since
the early bird and final submission datasets are not the same, we cannot draw conclu-
sions without further testing.


Gender Improvement For gender, we cannot consider the absolute accuracy, as it is
entirely dependent on the bot detection accuracy. As a consequence, we chose to mea-
sure our progress with a relative accuracy, using the following formula:
                                                 English Spanish
                           Early bird submission 0.8866 0.8960
                             Final submission 0.9224 0.9123

                      Table 4: Relative gender accuracy per submission




                   relative_gender_accuracy = gender_accuracy / type_accuracy
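For instance, plugging the final-submission English scores from Table 3 into this formula reproduces the corresponding Table 4 entry:

```python
def relative_gender_accuracy(gender_accuracy, type_accuracy):
    """Gender accuracy normalized by the accuracy of the bot/human step."""
    return gender_accuracy / type_accuracy

# Final submission, English (Table 3): bot 0.9034, gender 0.8333.
score = relative_gender_accuracy(0.8333, 0.9034)  # ~0.9224, as in Table 4
```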

    For both submissions, the relative gender accuracy is shown in Table 4, where it can
be seen that we improved the relative accuracy. The difference between bot detection
accuracy and gender accuracy is caused by misclassification of human gender; note,
however, that a large proportion of bots among the predicted humans can also influence
the measured gender accuracy. Nevertheless, as the improvement is over 3.5% in English
and over 1.5% in Spanish, we can say with confidence that our modifications had a
positive impact on gender prediction.
    Consequently, we suppose the feature selector enabled the better results, which would
mean that some of the extracted features confused our classifier and can be considered
noise. Furthermore, it seems that training on both the 2018 and 2019 datasets was a
good choice.
    It is difficult to give an overall conclusion on the gender results: the accuracy ob-
tained on human detection is entangled with them, so a direct comparison with the
state of the art [6] is not possible.


5   Conclusion and Future Work

In this paper, we introduced our architecture for the author profiling shared task pro-
posed by the PAN@CLEF challenge.
    For bot detection, we based our approach on the work of Varol et al. [26]. As we
only had tweet text as data, we were only able to extract some of the proposed features;
we therefore identified other potential features and chose to add some of them to our
classifier.
    For gender prediction, we implemented a two-layer architecture. We based our work
on the solution of Ciccone et al. [5], which we improved by integrating it into a more
complex architecture and by taking advantage of features not used by the Tf-idf-based
classifier.
    We also presented our results and the improvements we obtained between the early
bird and final submissions.
    As future work, we would like to improve our results further and come close to
(or surpass) the state of the art of the challenge. Notably, we plan to integrate other
features into the gender part, such as features based on pre-trained word embedding
models.
References
 1. Argamon, S., Koppel, M., Fine, J., Shimoni, A.R.: Gender, genre, and writing style in
    formal written texts. TEXT 23, 321–346 (2003)
 2. Basile, A., Dwyer, G., Medvedeva, M., Rawee, J., Haagsma, H., Nissim, M.: N-GrAM:
    New Groningen Author-profiling Model. arXiv:1707.03764 [cs] (Jul 2017),
    http://arxiv.org/abs/1707.03764, arXiv: 1707.03764
 3. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O’Reilly Media,
    Inc., 1st edn. (2009)
 4. Burger, J.D., Henderson, J., Kim, G., Zarrella, G.: Discriminating gender on twitter. In:
    Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp.
    1301–1309. EMNLP ’11, Association for Computational Linguistics, Stroudsburg, PA,
    USA (2011), http://dl.acm.org/citation.cfm?id=2145432.2145568
 5. Ciccone, G., Sultan, A., Laporte, L., Granitzer, M.: Stacked Gender Prediction from Tweet
    Texts and Images p. 11 (2018)
 6. Daneshvar, S., Inkpen, D.: Gender Identification in Twitter using N-grams and LSA:
    Notebook for PAN at CLEF 2018. In: CLEF (2018)
 7. Dell’ Amerlina Ríos, M., Gravano, A.: Spanish DAL: A Spanish Dictionary of Affect in
    Language. In: Proceedings of the 4th Workshop on Computational Approaches to
    Subjectivity, Sentiment and Social Media Analysis. pp. 21–28. Association for
    Computational Linguistics, Atlanta, Georgia (Jun 2013),
    https://www.aclweb.org/anthology/W13-1604
 8. Ferrara, E.: Disinformation and Social Bot Operations in the Run Up to the 2017 French
    Presidential Election. First Monday 22(8) (Jul 2017), http://arxiv.org/abs/1707.00086,
    arXiv: 1707.00086
 9. Ferrara, E., Varol, O., Menczer, F., Flammini, A.: Detection of Promoted Social Media
    Campaigns p. 4 (2016)
10. Gravano, A., Ríos, M.G.D.: Spanish DAL: A Spanish Dictionary of Affect in Language
    p. 24
11. Kheng, G.: Author Profiling : author ”gender” and ”variety of language” retrieval in tweets.
    Master’s thesis, University of Passau and INSA de LYON (Sep 2017)
12. Kheng, G., Laporte, L., Granitzer, M.: INSA LYON and UNI PASSAU’s participation at
    PAN@CLEF’17: Author Profiling task p. 11 (2017)
13. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M.,
    Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D.,
    Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal
    of Machine Learning Research 12, 2825–2830 (2011)
14. Pennebaker, J.W., Mehl, M.R., Niederhoffer, K.G.: Psychological aspects of natural
    language use: Our words, our selves. Annual Review of Psychology 54(1), 547–577 (2003),
    https://doi.org/10.1146/annurev.psych.54.101601.145041, pMID: 12185209
15. Potthast, M., Gollub, T., Wiegmann, M., Stein, B.: TIRA Integrated Research Architecture.
    In: Ferro, N., Peters, C. (eds.) Information Retrieval Evaluation in a Changing World -
    Lessons Learned from 20 Years of CLEF. Springer (2019)
16. Rangel, F., Celli, F., Rosso, P., Potthast, M., Stein, B., Daelemans, W.: Overview of the 3rd
    Author Profiling Task at PAN 2015 p. 40
17. Rangel, F., Rosso, P.: Overview of the 7th Author Profiling Task at PAN 2019: Bots and
    Gender Profiling. In: Cappellato, L., Ferro, N., Losada, D., Müller, H. (eds.) CLEF 2019
    Labs and Workshops, Notebook Papers. CEUR-WS.org (Sep 2019)
18. Rangel, F., Rosso, P., Chugur, I., Potthast, M., Trenkmann, M., Stein, B., Verhoeven, B.,
    Daelemans, W.: Overview of the 2nd Author Profiling Task at PAN 2014 p. 30
19. Rangel, F., Rosso, P., Montes-y Gómez, M., Potthast, M., Stein, B.: Overview of the 6th
    Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter p. 38 (2018)
20. Rangel, F., Rosso, P., Koppel, M., Stamatatos, E., Inches, G.: Overview of the Author
    Profiling Task at PAN 2013 p. 13
21. Rangel, F., Rosso, P., Potthast, M., Stein, B.: Overview of the 5th Author Profiling Task at
    PAN 2017: Gender and Language Variety Identification in Twitter p. 26 (2017)
22. Rangel, F., Rosso, P., Verhoeven, B., Daelemans, W., Potthast, M., Stein, B.: Overview of
    the 4th Author Profiling Task at PAN 2016: Cross-Genre Evaluations p. 35
23. Savoy, J.: Analysis of the style and the rhetoric of the 2016 US presidential primaries. DSH
    33(1), 143–159 (2018), https://doi.org/10.1093/llc/fqx007
24. Schler, J., Koppel, M., Argamon, S., Pennebaker, J.: Effects of age and gender on blogging
    SS-06-03, 191–197 (8 2006)
25. Tausczik, Y.R., Pennebaker, J.W.: The psychological meaning of words: Liwc and
    computerized text analysis methods. Journal of Language and Social Psychology 29(1),
    24–54 (2010), https://doi.org/10.1177/0261927X09351676
26. Varol, O., Ferrara, E., Davis, C.A., Menczer, F., Flammini, A.: Online Human-Bot
    Interactions: Detection, Estimation, and Characterization. arXiv:1703.03107 [cs] (Mar
    2017), http://arxiv.org/abs/1703.03107, arXiv: 1703.03107
27. Wang, A.H.: Detecting Spam Bots in Online Social Networking Sites: A Machine Learning
    Approach. In: Foresti, S., Jajodia, S. (eds.) Data and Applications Security and Privacy
    XXIV. pp. 335–342. Lecture Notes in Computer Science, Springer Berlin Heidelberg
    (2010)
28. Whissell, C.M.: Chapter 5 - THE DICTIONARY OF AFFECT IN LANGUAGE. In:
    Plutchik, R., Kellerman, H. (eds.) The Measurement of Emotions, pp. 113–131. Academic
    Press (Jan 1989), http://www.sciencedirect.com/science/article/pii/B9780125587044500116
29. Yang, K.C., Varol, O., Davis, C.A., Ferrara, E., Flammini, A., Menczer, F.: Arming the
    public with AI to counter social bots. arXiv:1901.00912 [cs] (Jan 2019),
    http://arxiv.org/abs/1901.00912, arXiv: 1901.00912