An Unsophisticated Neural
                  Bots and Gender Profiling System
                        Notebook for PAN at CLEF 2019

                           Oren Halvani? and Philipp Marquardt

                 Fraunhofer Institute for Secure Information Technology SIT
                       Rheinstrasse 75, 64295 Darmstadt, Germany
                        {FirstName.LastName}@SIT.Fraunhofer.de


       Abstract In recent years a sharp increase of bot-aided campaigns can be ob-
       served across social media networks. As a consequence, an own research disci-
       pline known as social bot detection has been established, to counteract these. In
       the context of the shared task "Bots and Gender Profiling" at the PAN workshop,
       we propose a simple neural network-based approach that determines for a given
       Twitter feed whether its author is a bot or a human, where in the latter case it dis-
       tinguishes between male and female authors. On the official English test set, our
       approach achieves an accuracy of 92% and 83% for type and gender detection,
       respectively. For the Spanish test set, however, the results are lower (82% for type
       and 74% for gender detection).


1   Introduction
Bots and gender profiling can be seen as research tasks in the field of digital text foren-
sics where, from the perspective of machine learning, both represent classification prob-
lems. In general, bots detection deals with the problem to judge if a piece of text (for
instance, a Facebook post or a Twitter tweet) stems from a human or a bot, while gen-
der profiling focuses on the question whether the text was written by a male or a female
author. With the rise and growth of social networks, social bots became more and more
present. As an attempt to counteract these, the organizers of the PAN workshop1 invited
researchers and practitioners to participate in the shared-task bots and gender profiling.
In the context of this, we present a very simple approach based on a feed-forward neural
network that was ranked 18th out of 55 participants.


2   Related Work
Over the years, many approaches have been proposed for both bot detection and gender
profiling. In 2014, for example, Dickerson et al. [3] proposed their SentiBot system,
?
   Corresponding author.
   Copyright c 2019 for this paper by its authors. Use permitted under Creative Commons Li-
   cense Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 September 2019, Lugano,
   Switzerland.
 1
   https://pan.webis.de/clef19
which uses sentiment to distinguish humans from bots on Twitter. More precisely, they
considered four classes of features related to tweet syntax, tweet semantics, user be-
havior as well as network-centric user properties. SentiBot relies on an ensemble of six
classifiers (Naive Bayes, SVMs, AdaBoost, Gradient Boosting, Random Forests and
Extremely Randomized Trees) and achieved a score of 0.73 in terms of AUC on the
India Election Dataset, which consists of 7.7 million tweets stemming from 550,000
Twitter accounts. One of the findings of Dickerson et al. was that sentiment related
factors play a significant role in regard to the detection of bots and that considering
the topics of interest to an application into account is highly important to identify bots
associated with a specific application.
    In 2017, Varol et al. [6] presented a similar framework for bot detection on Twitter.
Based on a large number of tweets, their framework extracted 1,150 features, which they
categorized into six different classes (user meta-data, friends/connected users, tweet
content, sentiment, network patterns and activity time series. As an underlying model,
the authors tried out a variety of classification algorithms (Random Forests, AdaBoost,
Logistic Regression and Decision Tree classifiers), where the best performance was ob-
tained using the Random Forest classifier. In contrast to the study of Dickerson et al. [3],
here, Varol et al. state that both user meta-data and content features are the most promis-
ing classes to detect simple bots. To evaluate their approach, the authors used a dataset
consisting of 14 millions twitter accounts of English-speaking active users. Their initial
system yielded an AUC score of 0.95 on this dataset. Afterwards, the authors applied
their approach on a more challenging dataset, where it also achieved a high score (0.94
AUC). In regard to their analysis, the authors made several interesting findings. They
estimate, for example, that between 9% and 15% of the active Twitter accounts are
bots. Also, they observed that simple bots tend to interact with bots that exhibit more
human-like behaviors. Furthermore, the authors performed clustering analysis, where
the resulting clusters point mainly to three subclasses of accounts (spammers, self pro-
moters, and accounts that post content from connected applications).


3     Proposed Approach

In the following, we propose our bots and gender profiling method, which is essentially
a simple feedforward-based neural network. However, before introducing the approach
in more detail, we first mention the preprocessing steps that were performed on the
respective documents.


3.1    Preprocessing

During the inspection of the provided corpora (more precisely, the inception of the
underlying documents) we observed a large variety of noise such as citations, HTML
encoded string such as \&amp;, inconsistent apostrophe usage, etc. Initially, we at-
tempted to clean the noise using a fine-grained preprocessing procedure based on true-
casing [4], lexical normalization [7], accents / diacritics normalization2 , etc. However,
 2
     https://github.com/motss/normalize-diacritics
                                                                                                                                          global max pooling                                       fully connected softmax output


                                                                                                                                        ...


                          input text length


                                              Global Max-Pooling


                                                                   Global Max-Pooling


                                                                                        Global Max-Pooling


                                                                                                             Global Max-Pooling


                                                                                                                                              Global Max-Pooling


                                                                                                                                                                   Global Max-Pooling
                                                                                                                                  ...


                                                                                                                                                                                             ...
     Wherefore she went
        after their ...                                                                                                                 ...

                                                                                                                                        ...


                                                                                                                                                                                                                 ...
                                                           200 dimensional embedding

                                                                                                                                                                                        200 dimensions         64 units


                                                                   Figure 1. Architecture of our approach.


after using these in our preliminary analysis, we noticed a strong decrease in terms of
accuracy. Therefore, we only performed "low-level" preprocessing steps including:
 – Concatenation of all tweets in each XML-file into a one long document
 – Lowercasing of the entire text
 – Substitution of noisy elements with a dummy token as, for example, twitter handles
   (@ → §AT§), URLs (http... → §URL§), hashtags (# → §HASHTAG§), numbers
   ([0-9]+ → §NUMBER§), Emojis (... → §EMJOI§), punctuation marks ([.,?¿]+ →
   §PUNCTATION§), retweets (RT → §RT§).

3.2      Network Architecture
Our approach represents a simple feedforward neural network3 , which involves a single
hidden layer. The architecture is illustrated in Figure 1). As can be seen, we first tok-
enize a given document and map each token into an embedding4 vector. Next we apply
global max pooling on the embedding dimensions over the sequence of tokens and con-
catenate the resulting pooled values to a compact representation vector, which is then
fed into a simple fully connected hidden layer. The output layer performs the binary
classification using the Softmax function. We used the same architecture for both clas-
sification scenarios human vs. bot and male vs. female. Furthermore, the architecture
was used for both languages English and Spanish.

3.3      Hyperparameter Optimization
To optimize the hyperparameters of the network, we applied Random Search [1]. From
the pool of all constructed configurations, we picked the one that led to the most stable
 3
     We use the open-source neural-network framework Keras (https://keras.io)
 4
     Note that we learn embeddings from scratch rather than using pretrained models.
results at the expense of accuracy. The hyperparameters of this configuration are listed
in Table 1. Due to the varying lengths of the documents, we performed the following


            Hyperparameter           Value
            Vocabulary size        10,000
            Input text length      2,500 characters
            Embedding dimension 200
            Dropout                0.5
            Epochs                 35
            Batch size             64 (= number of units in the hidden layer)
            Loss function          Categorical cross entropy
            Optimizer              Adam (learning rate = 0.001)
            Activation function    ReLu (hidden layer), Softmax (output layer)
                         Table 1. Hyperparameters of our approach.


strategy: Short documents with < 2, 500 tokens were padded with zero values, while
longer texts were truncated after the 2,500-th token.
    In addition to dropout, we made use of Early Stopping [2] to counteract overfitting.
Here, we observed that in many cases only few epochs (≤ 10) were needed, until the
network reached a state, where the accuracy stopped to improve. Here, we also used the
Keras callback function ReduceLROnPlateau to reduce the learning rate by 1e-1, where
1e-8 was the minimum value.


4     Evaluation

In order to reduce overfitting, we trained our approach on the provided training set
(truth-train.txt) and evaluated the learned model on the development set (truth-dev.txt),
as suggested5 by the PAN organizers. On the validation set our approach achieved an
accuracy of 97.69%. Afterwards, we applied the learned model on the official test set
hosted on the TIRA6 [5] platform. The results are listed in Table 2.


                 Language Type (bot vs. human) Gender (male vs. female)
                English                    91.59%                       82.73%
                Spanish                    82.39%                       73.78%
             Table 2. Results for the official test set (test-dataset2-2019-04-29).


 5
     https://pan.webis.de/clef19/pan19-web/author-profiling.html
 6
     https://www.tira.io/
5   Conclusion and Future Work

We proposed a simple feedforward-based neural network that aimed to distinguish for
a given Twitter feed whether its author is a bot or a human, where in the latter case the
gender (male/female) is also classified. Although, the proposed method is quite simple,
we observed in preliminary experiments that it was able to outperform more advanced
approaches based on CNN and LSTM building blocks. In the near future, we plan to
experiment with more sophisticated techniques such as Transformer-based networks
that are able to capture fine-grained patterns in the embedding space.


References
1. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn.
   Res. 13, 281–305 (Feb 2012),
   http://dl.acm.org/citation.cfm?id=2188385.2188395 3
2. Caruana, R., Lawrence, S., Giles, L.: Overfitting in neural nets: Backpropagation, conjugate
   gradient, and early stopping. In: Proceedings of the 13th International Conference on Neural
   Information Processing Systems. pp. 381–387. NIPS’00, MIT Press, Cambridge, MA, USA
   (2000), http://dl.acm.org/citation.cfm?id=3008751.3008807 4
3. Dickerson, J.P., Kagan, V., Subrahmanian, V.S.: Using Sentiment to Detect Bots on Twitter:
   Are Humans More Opinionated Than Bots? In: Proceedings of the 2014 IEEE/ACM
   International Conference on Advances in Social Networks Analysis and Mining. pp.
   620–627. ASONAM ’14, IEEE Press, Piscataway, NJ, USA (2014),
   http://dl.acm.org/citation.cfm?id=3191835.3191957 1, 2
4. Lita, L.V., Ittycheriah, A., Roukos, S., Kambhatla, N.: tRuEcasIng. In: Proceedings of the
   41st Annual Meeting of the Association for Computational Linguistics. pp. 152–159.
   Association for Computational Linguistics, Sapporo, Japan (Jul 2003),
   https://www.aclweb.org/anthology/P03-1020 2
5. Potthast, M., Gollub, T., Wiegmann, M., Stein, B.: TIRA Integrated Research Architecture.
   In: Ferro, N., Peters, C. (eds.) Information Retrieval Evaluation in a Changing World -
   Lessons Learned from 20 Years of CLEF. Springer (2019) 4
6. Varol, O., Ferrara, E., Davis, C.A., Menczer, F., Flammini, A.: Online Human-Bot
   Interactions: Detection, Estimation, and Characterization. In: Proceedings of the Eleventh
   International Conference on Web and Social Media, ICWSM 2017, Montréal, Québec,
   Canada, May 15-18, 2017. pp. 280–289. AAAI Press (2017),
   https://aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/view/15587
   2
7. Xu, K., Xia, Y., Lee, C.: Tweet normalization with syllables. In: ACL (1). pp. 920–928. The
   Association for Computer Linguistics (2015) 2