=Paper=
{{Paper
|id=Vol-2482/paper39
|storemode=property
|title=Make Social Networks Clean Again: Graph Embedding and Stacking Classifiers for Bot Detection
|pdfUrl=https://ceur-ws.org/Vol-2482/paper39.pdf
|volume=Vol-2482
|authors=Kirill Skorniakov,Denis Turdakov,Andrey Zhabotinsky
|dblpUrl=https://dblp.org/rec/conf/cikm/SkorniakovTZ18
}}
==Make Social Networks Clean Again: Graph Embedding and Stacking Classifiers for Bot Detection==
<pdf width="1500px">https://ceur-ws.org/Vol-2482/paper39.pdf</pdf>
<pre>
    Make Social Networks Clean Again: Graph Embedding
         and Stacking Classifiers for Bot Detection

           Kirill Skorniakov                              Denis Turdakov                   Andrey Zhabotinsky
    Ivannikov Institute for System                 Ivannikov Institute for System      Ivannikov Institute for System
     Programming of the Russian                     Programming of the Russian          Programming of the Russian
         Academy of Sciences                            Academy of Sciences                Academy of Sciences
    Moscow Institute of Physics and                 National Research University                Lomonosov
     Technology (State University)                  Higher School of Economics            Moscow State University
            Moscow, Russia                                Moscow, Russia                    Moscow, Russia
     kirill.skorniakov@ispras.ru                        turdakov@ispras.ru               zhabotinsky@ispras.ru


                                                                      Effective automatic methods are required to search
                                                                   for bots on the scale of the entire social network.
                         Abstract                                  Many papers addressed this problem in recent years
                                                                   [10, 13, 18, 1, 8]. Most of them are based on su-
     The paper introduces a novel approach to the                  pervised machine learning and produce valuable re-
     detection of social bots using ensembling of                  sults. Authors use features extracted from informa-
     classifiers. We also studied the impact of                    tion about particular profile available in most OSNs
     different feature sets and demonstrated the                   such as the publication of posts and comments, the
     power of graph embedding which is underused                   formation of non-directional (friendship) and directed
     by the existing methods. The main contri-                     (subscription) links. As far as we know, there are no
     bution of this work is a creating of a stack-                 works that use the global network structure for this
     ing based ensemble, which effectively exploits                purpose. In this paper, we correct this omission with
     text and graph features. Empirical evalua-                    the aid of graph embedding.
     tion proved the effectiveness of the proposed                    Any development of a supervised classifier for bot
     method for bots detection and showed im-                      detection meets several difficulties. First of all, it’s
     provement in comparison to existing solutions                 hard to determine the definition of a “bot” and receive
     by 4-9 points of AUC.                                         a labeled dataset. Another problem is the creation
                                                                   of a qualitative classifier that can effectively use and
                                                                   combine text, graph and other types of data. In this
1    Introduction                                                  work, we explore and utilize the existing solutions for
Online social networks (OSNs) are an important part                the first problem and focus on improving solutions for
of life for many people. More than 2 billion people                the second problem.
use social network services. Unfortunately, online so-                Many good algorithms that use different feature ex-
cial networks like many other technologies provide op-             traction methods were developed recently. It seems
portunities for illegal and undesirable activities. Thus           reasonable to combine them in order to achieve state-
OSNs can be used to spread spam, phishing links, fake              of-the-art result and stacking of algorithms can help
news. In addition, there are many malefactors and                  achieve a quality gain in this task.
fraudsters who extort money from users or engage in                   To summarize, we make the following contributions:
illegal advertising. They disturb and defraud upstand-
ing users. That is why the administration of social                  • We applied graph embedding techniques to ex-
networks tries to find and block their profiles.                       tract additional features from the entire network;

Copyright © CIKM 2018 for the individual papers by the papers'       • We also created an efficient stacking based bot
authors. Copyright © CIKM 2018 for the volume as a collection          classifier, which combines graph and text infor-
by its editors. This volume and its papers are published under
                                                                       mation.
the Creative Commons License Attribution 4.0 International (CC
BY 4.0).
   The rest of the paper is organized as follows. In         statistics of a user’s followers and tweets. Onur Varol
Section 2 we describe existing solutions. Section 3          et.al. [15] proposed framework with many different fea-
is devoted to our stacking-based classifier. Section 4       tures. They are statistics of emotions, pos tags; post-
presents our experiments. At the Section 5 we sum-           ing time features, simple statistics of retweets, men-
marize and conclude.                                         tions, and hashtag co-occurrence networks. We will
                                                             use works [8, 10] as baselines. Unfortunately, we can’t
2     Related Work                                           directly compare with the [15] because they used many
                                                             tools and features specific in English, but in our sam-
2.1    Bot Detection Labels                                  ples, we have many different languages.
An important issue is to collect a labeled dataset to de-       To the best of our knowledge, the researchers did
tect bots using machine learning algorithms The main         not apply graph embedding techniques to the entire
difficulties arise from the definition of the concept of a   social graph to improve their solution to bots detec-
bot. However, despite the ambiguity of the definition,       tion.
there are ways to obtain a rather precise labeling of
bot dataset.                                                 2.3   Graph Embedding
   According to [10] all labeling techniques can be cat-     In the classical approach to building a feature vector
egorized into three groups. They are:                        for objects, features are usually invented by experts
    • Manual annotation;                                     in the field. This approach has several limitations.
                                                             First of all, for each new area, an involvement of ex-
    • Lists of suspended users – users blocked by the        perts leads to additional expenses for development. In
      social network administration;                         addition, features created by them could be compu-
                                                             tationally complex. Therefore, methods of learning
    • Honeypots [8] – bots created by researches to lure     representations, which automatically obtain qualita-
      other bots.                                            tive representations of small dimensions became pop-
                                                             ular in recent years. Such representations are usually
   We use the second approach because it requires the
                                                             called “embedding” (word embedding, graph embed-
least human resources for the markup and utilizes an
                                                             ding, etc.). There are a large number of such meth-
underlying information about the rules used by social
                                                             ods for graphs. DeepWalk [12], Node2vec [4], and
network administration which is unobtainable to ex-
                                                             LINE [14] are the most popular of them due to the
ternal researchers (for example, the number of com-
                                                             linear complexity of the number of edges or vertices.
plaints about the user).
                                                                Since in our experience the quality of the above al-
                                                             gorithms does not differ much for unweighted large
2.2    Bot Detection Features
                                                             graphs, we used in our research LINE because of its
A wide range of various features can be extracted from       good computational speed.
a social network and used for bot detection. Based              In the LINE two probability models of the appear-
on [10, 13, 18, 1] we can group these features in the        ance of edges in the graph were proposed. The first
following way:                                               model, which preserve first-order proximity maximizes
                                                             the joint probability of the observed edges {(vi , vj )}i,j
    • Text features which include the number of hash-        in accordance with the following model:
      tags, links, geo-tags, words from spam list, statis-
      tics of posts sentiment and topics;                                                          1
                                                                          p1 (vi , vj ) =                       ,
                                                                                            1 + exp(−~uTi ~uj )
    • User-profile features usually imply the username,
      the number of friends, subscribes, photos, audios,     where ~ui – vector representations of nodes.
      retweets;                                                 In the second model, vertices have two different rep-
                                                             resentations (as in word2vec [9] model): ~ui when ver-
    • Time features include statistics of user online        tex treated as vertex itself and ~u0i – when vertex treated
      time, user publications time;                          as the context of other vertices. Then joint link prob-
                                                             ability would be p(vi , vj ) = p(vi )p2 (vj |vi ), where
    • Graph features involve information extracted
      from friendship and subscribes graphs such as                                                    T
                                                                                              exp(u~0j ~ui )
      PageRank and centrality.                                           p2 (vj |vi ) = P                        .
                                                                                             |V |    ~0 T ui )
                                                                                             k=1 exp(uk ~
   Fred Morstatter et. al. [10] used topics distribution,
obtained by the Latent Dirichlet Allocation (LDA) al-           The hidden parameters of these models are vector
gorithm on tweets. Kyumin Lee et. al. [8] utilized           representations of the vertices. These parameters are
          Bag of
                          LogReg
        Subscribes
                                                                         Table 1: Bot Datasets
                                                                  Social Network Active Banned
        word2vec           GRU
                                                                    VKontakte       158806     21263
                                                                      Twitter        69602      7886
         TF-IDF           LogReg

                                            GB
                                                           word dictionary, |D3char |, |D4char | — sizes of 3- and 4-
                          LogReg
                                                           grams dictionaries. Then TF-IDF transformation was
                           KNN
                                                           applied to this feature vector and logistic regression
          LINE                                             was used for classification.
                           GB                                 Another classifier was GRU recurrent neural net-
                                                           work [2] that receives word embeddings created by
                           MLP
                                                           well-known word2vec model [9] as input. We used
                                                           word vectors with 300 dimensions preliminary trained
                                                           on the our large corpus (3.3GB) of informal text col-
        Figure 1: Proposed scheme of stacking              lected from online social networks.

obtained by minimizing the Kullback-Leibler diver-         3.2   Subscribes classifier
gence between the model and observed distributions
(which is equivalent to maximizing the likelihood). To     Information about subscribes we handle the following
accelerate calculations, the method of negative sam-       way:
pling [9] is used.
                                                            1. Select largest N groups by number of subscribers;

3     Stacking for bot detection                            2. Each user is represented as “bag of subscribes” –
                                                               by the selected at previous step groups.
In this section, we turn to the construction of a clas-
sifier for prediction of bots. We use three types of       “Bag of subscribes” is a binary vector with information
attributes for each user: friendship graph, subscrip-      about subscribes. The i-th element of this vector is
tion information, and user’s texts. We produce various     1 if the user is subscribed to the i-th group and 0
transformations of these attributes to obtain a great      otherwise. Length of final vector is N .
set of vector representations of a user.                      Then we train logistic regression on this feature vec-
    Then there are two main options for exploring dif-     tor with N = 10000.
ferent feature spaces: combine these views with one
large classifier or train a separate classifier on each
                                                           3.3   Graph classifiers
of them and combine these classifiers with ensemble
techniques. The first option considers relations be-       For friendship analysis we use graph embedding fea-
tween features in various spaces which could improve       tures with size d = 100 obtained from LINE as input
the predictive power of a model. But at the same time,     to different classifiers:
such classifiers have less flexibility and a tendency to
overfitting. By flexibility, we mean the ability to use      • Multilayer perceptron (MLP)
the classifier or its parts in the case when for some
features don’t exist for certain users or reusability of     • K nearest neighbours with cosine distance (KNN)
parts of the classifiers in the new domain. In addition,
                                                             • Gradient Boositng Classifier (GB)
the effectiveness of the method of ensemble algorithms
was repeatedly proved in the data analysis competition       • Logistic Regression (LogReg)
and papers. Therefore, we chose the second option.
    For each of the three types of attributes, we build       Machine learning algorithms were implemented
their transformations into vector representations and      with scikit-learn [11], lightGBM [6] and keras [3].
train on them different classifiers.
                                                           3.4   Stacking
3.1    Text classifiers
                                                           There are several common ways to combine classifiers
As one text representation we use the concatena-           predictions. The most popular are the weighted aver-
tion of bags of words and char n-grams representa-         age and stacking. In case of weighted average result-
tions, where n = (3, 4). Resulting feature vector has      ing prediction is a convex combination of all classifiers
|D| + |D3char | + |D4char | dimensions, where |D| — size   scores.
    The main idea of stacking [16] is to use predic-                 For obtaining labels for VKontakte dataset we use
tions of existing classifiers (which are called “first-           the following process:
layer” classifiers) as new features. Then a new clas-
sifier, called a meta-classifier, is trained on them.              1. All users profiles from the social network are col-
    We append vector representations of the vertices of               lected;
the friendship graph as the most “powerful” features to            2. Profiles with the “deleted” and “banned” statuses
the predictions of classifiers of the first level. Then we            are excluded from the sample;
use gradient boosting as a meta-classifier. The scheme
of the proposed method is shown in Figure 1.                       3. Statuses of all remaining accounts are recollected;

                                                                   4. Users with the “banned” status after recollection
4     Evaluation                                                      receive a label “1”;
In this section, we empirically evaluate the proposed
                                                                   5. Users with the “active” status receive a label “0”.
method on two real-life datasets and compare our re-
sults with existing solutions.                                        We use data collected from 2 until 16 November
                                                                  2017 for our experiments 5 .
4.1   Twitter Dataset                                                 Of all users, we select users with more than 10
                                                                  friends and texts in the open groups for the last month.
There are several papers, which share their Twitter-              The restriction on the number of friends is introduced
based datasets ([15, 8]) 1 2 . Unfortunately, these               to accelerate the computation of vector representa-
datasets don’t contain followers graphs, because this             tions. The second restriction is necessary in order
data can’t be easily collected due to Twitter’s API               to use text attributes in the final algorithm. Note
limit. Also, we can’t obtain any information from sus-            that both these restrictions only affect the speed of
pended accounts. That’s why we construct dataset in               the experiment, because if one of the attributes (text
the following way: we use tweets from [17] 3 , graph              or graph of friendship) is absent, the user can be classi-
from [7] 4 and relations between sceen name and                   fied by another available attribute using only one clas-
user id (for suspended users) from [5]. The result-               sifier from the ensemble. The resulting graph contains
ing dataset consists of 77488 users that are present in           110 million of vertices and 7 billion of edges.
all these three datasets.                                             To reduce the required computational resources for
   We used Twitter API to determine the suspended                 machine learning algorithms we select a small sample
user’s. Users who have been suspending received a                 from the original dataset. We take all banned user
label of 1, the rest received a label of 0.                       among the selected above as positive labels and 5% of
                                                                  active users as negative labels. The parameters of the
4.2   VKontakte Dataset                                           final dataset are shown in the Table 1. Note that the
In order to show that our method is applicable on                 algorithms of the graph embedding algorithms were
different social networks, we also collect data from              still trained on the whole friendship graph for obtain-
VKontakte (Russian social network similar to Face-                ing good node representations.
book) that had friendly open API with fewer restric-
                                                                  4.3   Metric and Baselines
tions than in most OSNs. We use only public infor-
mation, such as user’s posts and comments in open                 As a measure of quality, we use the area under ROC-
groups, their friendship information, group subscribes.           curve (AUC), that is a common metric for binary clas-
   Data is downloaded using VKontakte API. We                     sifications tasks.
use information about the status of users as labels.                  As a first baseline algorithm for comparison, we
Users can have one of the following statuses: “active”,           took the results of the detection of bots from the pa-
“deleted”, ”banned”. Users with the “deleted” status              per by Zegzhda et.al. [18]. The choice of the paper
have deleted their accounts themselves. Users with                is justified by using the same task, quality measures,
the “banned” status were blocked by VKontakte ad-                 and social network. We also used Twitter bot detec-
ministration. We can’t obtain any information about               tion algorithms BoostOR [10] and Boosting [8]. Our
non-active profiles due to restrictions of VKontakte.             implementation of the Boosting algorithm didn’t use
                                                                  one feature — dynamic of followers number, because
   1 https://botometer.iuni.iu.edu/bot-
                                                                  we don’t have this information in our datasets. For
repository/datasets.html
   2 http://bit.ly/asonam-bot-data                                  5 VKontakte         dataset       is    available        at
   3 http://snap.stanford.edu/data/bigdata/twitter7/tweets2009-   http://talisman.ispras.ru/datasets/
09.txt.gz                                                            6 We did not re-implement the method and only give the re-
   4 https://snap.stanford.edu/data/twitter-2010.html             sult published by the author [18]
              Table 2: Bot detection results               achieves the best score on the tests. Experimental re-
    Transformation      Classifier    VK       Twitter     sults on real-life datasets show the effectiveness of the
         tf-idf         LogReg       0.714      0.632      proposed method and its applicability to the analysis
       Word2vec         GRU          0.687      0.592      of the different social networks. Our method improved
         LINE           GB           0.779      0.639      existing solutions by 4-9% of AUC score.
                        LogReg       0.760      0.629
                        KNN          0.756      0.623      5.1   Acknowledgements
                        MLP          0.769      0.638
                                                           This work is funded by the Minobrnauki Russia (grant
        “Bag of
                                                           number: 14.604.21.0199 (id:RFMEFI60417X0199)).
     subscribes”        LogReg       0.692        -
       first-layer      Weighted
       classifiers      Average      0.802      0.652      References
      first-layer
 classifiers + LINE     GB           0.820      0.652       [1] F. Benevenuto, G. Magno, T. Rodrigues, and
                                                                V. Almeida. Detecting spammers on twitter.
 Zegzhda et al. [18]    MLP          0.73 6       -
                        BoostOR                                 In Collaboration, electronic messaging, anti-abuse
          LDA             [10]       0.659      0.613           and spam conference (CEAS), volume 6, page 12,
                        Boosting                                2010.
      Lee et al. [8]       [8]       0.637      0.605
                                                            [2] K. Cho, B. Van Merriënboer, D. Bahdanau, and
performance estimation, we do standard 5-fold cross-            Y. Bengio. On the properties of neural machine
validation.                                                     translation: Encoder-decoder approaches. arXiv
                                                                preprint arXiv:1409.1259, 2014.
4.4    Results and Discussions
                                                            [3] F. Chollet et al. Keras, 2015.
The results of the experiments are shown in Table 2.
It’s clearly seen that proposed method based on stack-      [4] A. Grover and J. Leskovec. node2vec: Scalable
ing different classifiers outperform existing approaches        feature learning for networks. In Proceedings of
by 4-9 points of AUC. It is worth paying attention              the 22nd ACM SIGKDD international conference
to the features obtained from graph embedding tech-             on Knowledge discovery and data mining, pages
niques. LINE provides powerful user representations.            855–864. ACM, 2016.
It even allows achieving good quality with single clas-
sifiers. We believe that the main reason for this is        [5] N. O. Hodas and K. Lerman. The simple rules of
that the graph structure is the most complex charac-            social contagion. Scientific reports, 4:4343, 2014.
teristic of a user account. Thus, the creation of a bot
with a friendship graph similar to a normal user is a       [6] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen,
complex and time-consuming task and the most bot                W. Ma, Q. Ye, and T.-Y. Liu. Lightgbm: A highly
makers don’t do this.                                           efficient gradient boosting decision tree. In Ad-
    Results also show that stacking of first layer clas-        vances in Neural Information Processing Systems,
sifiers with graph embedding features allows boosting           pages 3146–3154, 2017.
the best single classifier scores by 1-4% of AUC. We
consider that the relatively poor results of all algo-      [7] H. Kwak, C. Lee, H. Park, and S. Moon. What
rithms on the Twitter dataset are caused by peculiar-           is twitter, a social network or a news media? In
ities of its construction. There was more than 8 years          Proceedings of the 19th international conference
between the collection of user’s texts and graphs and           on World wide web, pages 591–600. AcM, 2010.
the moment of the collection of labels. Also, to the
best of our knowledge, Twitter does not let you know        [8] K. Lee, B. D. Eoff, and J. Caverlee. Seven months
if the user deleted his profile or it was blocked.              with the devils: A long-term study of content
                                                                polluters on twitter. In ICWSM, pages 185–192,
5     Conclusion                                                2011.

This paper presents an ensemble approach to bots de-        [9] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado,
tection problem. We showed that graph embedding                 and J. Dean. Distributed representations of words
techniques can be used to obtain powerful features to           and phrases and their compositionality. In Ad-
this task. We proposed stacking algorithm which effec-          vances in neural information processing systems,
tively combines text- and graph-based classifiers and           pages 3111–3119, 2013.
[10] F. Morstatter, L. Wu, T. H. Nazer, K. M. Carley,
     and H. Liu. A new approach to bot detection:
     striking the balance between precision and recall.
     In Proceedings of the 2016 IEEE/ACM Interna-
     tional Conference on Advances in Social Networks
     Analysis and Mining, pages 533–540. IEEE Press,
     2016.
[11] F. Pedregosa et al. Scikit-learn: Machine learning
     in python. Journal of machine learning research,
     12(Oct):2825–2830, 2011.
[12] B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk:
     Online learning of social representations. In Pro-
     ceedings of the 20th ACM SIGKDD international
     conference on Knowledge discovery and data min-
     ing, pages 701–710. ACM, 2014.
[13] V. Subrahmanian, A. Azaria, S. Durst, V. Ka-
     gan, A. Galstyan, K. Lerman, L. Zhu, E. Ferrara,
     A. Flammini, and F. Menczer. The darpa twitter
     bot challenge. Computer, 49(6):38–46, 2016.
[14] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan,
     and Q. Mei. Line: Large-scale information net-
     work embedding. In Proceedings of the 24th Inter-
     national Conference on World Wide Web, pages
     1067–1077. International World Wide Web Con-
     ferences Steering Committee, 2015.
[15] O. Varol, E. Ferrara, C. A. Davis, F. Menczer,
     A. Flammini, V. Subrahmanian, A. Azaria,
     S. Durst, V. Kagan, A. Galstyan, et al. Online
     human-bot interactions: Detection, estimation,
     and characterization. Comm. ACM, 59:7, 2016.
[16] D. H. Wolpert. Stacked generalization. Neural
     networks, 5(2):241–259, 1992.
[17] J. Yang and J. Leskovec. Patterns of tempo-
     ral variation in online media. In Proceedings of
     the fourth ACM international conference on Web
     search and data mining, pages 177–186. ACM,
     2011.
[18] P. Zegzhda, E. Malyshev, and E. Y. Pavlenko.
     The use of an artificial neural network to detect
     automatically managed accounts in social net-
     works, 2017.

</pre>