When social bots attack: Modeling susceptibility of users in online social networks

Claudia Wagner, Institute for Information and Communication Technologies, JOANNEUM RESEARCH, Graz, Austria (claudia.wagner@joanneum.at)
Silvia Mitter, Knowledge Management Institute, Graz University of Technology, Graz, Austria (smitter@student.tugraz.at)
Christian Körner, Knowledge Management Institute, Graz University of Technology, Graz, Austria (christian.koerner@tugraz.at)
Markus Strohmaier, Knowledge Management Institute and Know-Center, Graz University of Technology, Graz, Austria (markus.strohmaier@tugraz.at)

ABSTRACT
Social bots are automatic or semi-automatic computer programs that mimic humans and/or human behavior in online social networks. Social bots can attack users (targets) in online social networks to pursue a variety of latent goals, such as to spread information or to influence targets. Without a deep understanding of the nature of such attacks or the susceptibility of users, the potential of social media as an instrument for facilitating discourse or democratic processes is in jeopardy. In this paper, we study data from the Social Bot Challenge 2011 - an experiment conducted by the WebEcologyProject during 2011 - in which three teams implemented a number of social bots that aimed to influence user behavior on Twitter. Using this data, we aim to develop models to (i) identify susceptible users among a set of targets and (ii) predict users' level of susceptibility. We explore the predictiveness of three different groups of features (network, behavioral and linguistic features) for these tasks. Our results suggest that susceptible users tend to use Twitter for a conversational purpose and tend to be more open and social, since they communicate with many different users, use more social words and show more affection than non-susceptible users.

Keywords
social bots, infection, user models

Copyright © 2012 held by author(s)/owner(s). Published as part of the #MSM2012 Workshop (2nd Workshop on Making Sense of Microposts) proceedings, available online as CEUR Vol-838, at: http://ceur-ws.org/Vol-838. #MSM2012, April 16, 2012, Lyon, France.

1. INTRODUCTION
Online social networks (OSN) like Twitter or Facebook are powerful instruments since they allow reaching millions of users online. However, in the wrong hands they can also be used to spread misinformation and propaganda, as one could for example see during the US political elections [9]. Recently a new breed of computer programs, so-called social media robots (short social bots or bots), emerged in OSN. Social bots are automatic or semi-automatic computer programs that mimic humans and/or human behavior in OSN. Social bots can be directed to attack users (targets) to pursue a variety of latent goals, such as to spread information or to influence users [7]. Recent research [1] highlights the danger of social bots and shows that Facebook can be infiltrated by social bots sending friend requests to users. The average reported acceptance rate of such friend requests was 59.1%; it also depended on how many mutual friends the social bots had with the infiltrated users, and could be up to 80%. This study clearly demonstrates that modern security defenses, such as the Facebook Immune System, are not prepared for detecting or stopping a large-scale infiltration caused by social bots.

We believe that modern social media security defenses need to advance in order to be able to detect social bot attacks. While identifying social bots is crucial, identifying users who are susceptible to such attacks - and implementing means to protect against them - is important in order to protect the effectiveness and utility of social media. In this paper, we define a target as a user who has been singled out by a social bot attack, and a susceptible user as a user who has been infected by a social bot (i.e., the user has in some way cooperated with the agenda of a social bot). This work sets out to identify factors which help detect users who are susceptible to social bot attacks. To gain insights into these factors, we use data from the Social Bot Challenge 2011 and introduce three different groups of features: network features, behavioral features and linguistic features. In total, we use 97 different features to first predict infections by training various classifiers, and second to predict users' level of susceptibility by using regression models.

Thus, unlike previous research, our work does not focus on detecting social bots in OSN, but on detecting users who are susceptible to their attacks. To the best of our knowledge, this represents a novel task that has not been proposed or tackled previously. Our work is relevant for researchers interested in social engineering, trust and reputation in the context of OSN.

2. RELATED WORK
Social bots represent a rather new phenomenon that has received only little attention so far. For example, Chu et al. [3] use machine learning to identify three types of Twitter user accounts: users, bots and cyborgs (users assisted by bots). They show that features such as entropy of posts over time, external URL ratio and Twitter devices (usage of external Twitter applications) give good indications for differentiating between distinct types of user accounts [1]. Work by [6] describes how honeypots can be used to identify spam profiles in OSN. They present a long-term study where 60 honeypots were able to harvest about 36,000 candidate content polluters over a period of 7 months. Based on the collected data they trained a classification model using features based on User Demographics, User Friendship Networks, User Content and User History. Their results show that the features which were most useful for differentiating between content polluters and legitimate users were User Friendship Network based features, like the standard deviation of followees and followers, the change rate of the number of followees and the number of followees. In the context of the goals of this paper, related work on spam detection in OSN is relevant as well. For example, Wang et al. [14] propose a general-purpose framework for spam detection across multiple social networks. Unlike previous research, our work does not focus on detecting spammers or social bots in OSN, but on detecting users who are susceptible to their attacks.

Research about users' online behavior in general represents another field that is closely related to our research on user susceptibility. Predicting users' interaction behavior (i.e., who replies to whom, who friends whom) in online media has been previously studied in the context of email communications [12] and more recently in the context of social media applications. For example, Cheng et al. [2] consider the problem of reciprocity prediction and study this problem in a communication network extracted from Twitter. The authors aim to predict whether a user A will reply to a message of user B by exploring various features which characterize user pairs, and show that features that approximate the relative status of two nodes are good indicators of reciprocity. Work described in [10] considers the task of predicting discussions on Twitter, and found that certain features were associated with increased discussion activity - i.e., the greater the broadcast spectrum of the user, characterized by in-degree and list-degree levels, the greater the discussion activity. The work of Hopcroft et al. [4] explores the follow-back behavior of Twitter users and finds strong evidence for the existence of structural balance among reciprocal relationships. In addition, their findings suggest that different types of users reveal interesting differences in their follow-back behavior: the likelihood of two elite users creating a reciprocal relationship is nearly 8 times higher than the likelihood of two ordinary users. Our work differs from the related work discussed above by focusing on modeling and predicting the behavior of users who are currently attacked by social bots.

3. THE SOCIAL BOT CHALLENGE
The Social Bot Challenge was a competition organized by Tim Hwang (and the WebEcologyProject). The competition took place between January and February 2011. The aim was to have a set of competing teams developing social bots that persuade targets to interact with them - i.e., reply to them, mention them in their tweets, retweet them or follow them. The group of targets consisted of 500 unsuspecting Twitter users which were selected semi-randomly: all users had an interest in or tweeted about cats. The majority of targets exhibited a high activity level, that means they tweeted more than once a day. We define a susceptible user as a target that interacted (i.e., replied, mentioned, retweeted or followed) at least once with a social bot.

3.1 Rules
Each team was allowed to create one lead bot (the only bot allowed to score points) and an arbitrary number of support bots. The participating teams got points for every successful interaction between their lead bot and any target. One point was awarded for any target who started following a lead bot, and three points were awarded for any target who replied to, mentioned or retweeted a lead bot.

The following rules were announced for the game:

• No humans are allowed during the game. That means bots need to act in a completely automated way.
• Teams were not allowed to report other teams as spam or bots to Twitter, but other countermeasures and strategies to harm the opponents are allowed.
• The existence of the game needs to remain a secret. That means bots are not allowed to inform others about the game.
• The code needs to be published as open source under the MIT license.
• Teams are allowed to collaborate. That means they are allowed to talk to each other and exchange their code.

There was a period of 14 days during which teams were allowed to develop their social bots. Afterwards the game started on Jan 23rd 2011 (day 1) and ended on Feb 5th 2011 (day 14). During this period, bots were autonomously active for the first 7 days. On the 30th of January (day 8) the teams were allowed to update their codebase and change strategies. After this optional update, the bots continued to be autonomously active for the remaining time of the challenge.

3.2 Participants and Challenge Outcome
The following three teams competed in the challenge:

• Team A - @sarahbalham: The lead bot sarahbalham claims to be a young woman who grew up in the countryside and just moved to the city. This team didn't construct a bot network, but only used one lead bot. This lead bot created 143 tweets, which is rather low in comparison to the other teams, and used only a few @replies and hashtags. Despite its low activity level, this team reached the highest number of mutual connections (119). Overall the team only collected 170 points, since only 17 interactions with targets were counted.

• Team B - @ninjzz: The woman impersonated by this bot - ninjzz - doesn't provide much personal information, only that she is a bit shy and looking for friends on Twitter. Ninjzz was supported by 10 other bots, which also created some tweets. This bot was rather defensive in the first round of the challenge, but changed its strategy on day 8 and acted in a much more aggressive way in the second part of the challenge.
Overall this team created 99 mutual connections and 28 interactions, and therefore collected 183 points.

• Team C - @JamesMTitus: The bot JamesMTitus claims to be a 24-year-old guy from New Zealand, who is new on Twitter, and a real cat enthusiast. Team C with their bot JamesMTitus won the game by collecting 701 points, with 107 mutual connections and 198 interactions. This team had five support bots, who only created social connections but did not tweet at all. The team picked a very aggressive strategy, tweeted a lot and also made extensive use of @replies, retweets and hashtags.

Figure 1: This figure shows for each day of the challenge the number of users who were infected - i.e., they interacted with a social bot for the first time.

Figure 2: This figure shows when users were infected and how many tweets they had published before - i.e., between the start of the challenge and the day they were infected.

4. DATASET
The authors of this paper were not involved in nor did they participate in the design, setup or execution of this challenge. The dataset used for this analysis was provided by the WebEcologyProject after the challenge took place. Table 1 provides a basic description of this dataset. Figure 1 shows infections over time - i.e., it depicts on which day of the challenge targets interacted with social bots for the first time. One can see from this figure that at the beginning of the challenge - on day 2 - already 87 users became infected. One possible explanation for this might be the usage of auto-following features which some of the targets might have used. One can see from Figure 2 that for the users who became infected at an early stage of the challenge, we do not have many tweets in our dataset.
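The infections-per-day statistic shown in Figure 1 amounts to counting, per target, the day of its first bot interaction. A minimal sketch (the interaction log format and identifiers are hypothetical, not the actual challenge data format):

```python
from collections import Counter

def first_infection_days(interactions):
    """Map each target to the challenge day of its first bot interaction.

    `interactions` is an iterable of (target_id, day) pairs covering every
    observed target-bot interaction, in any order.
    """
    first_day = {}
    for target, day in interactions:
        if target not in first_day or day < first_day[target]:
            first_day[target] = day
    # Number of newly infected targets per day (the quantity in Figure 1).
    return Counter(first_day.values())

# Toy example: three targets, one of them interacting twice.
log = [("u1", 2), ("u2", 2), ("u1", 5), ("u3", 9)]
infections_per_day = first_infection_days(log)
```

On this toy log, day 2 contributes two newly infected targets and day 9 one, since u1's second interaction on day 5 is not a first infection.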
This is a limitation of the dataset we use, which includes only tweets authored between the 23rd of January and the 5th of February and social relations which existed at this point in time or were created during this time period. Since most of our features require a certain amount of tweets a user authored in order to contain meaningful information about the user, we decided to remove all users who became susceptible before day 7. While this means we lose 133 susceptible users as samples for our experiments, we believe (i) that the remaining 76 susceptible users and 298 non-susceptible users are sufficient to train and test our classifiers and regression models, and (ii) that eliminating those users that might have used an auto-follow feature is good, since they are less interesting to study from a susceptibility viewpoint.

Table 1: Description of the Social Bot Challenge dataset
  Susceptible users                             202
  Non-susceptible users                         298
  Mean tweets per user                          146.49
  Mean nr of followers/followees per user       8.5

5. FEATURE ENGINEERING
We adopt a two-stage approach to modeling targets' susceptibility to social bot attacks: (i) we aim to identify infected users via a binary classification task, and (ii) we aim to predict the level of susceptibility per infected user. To this end we explore three distinct feature sets that can be leveraged to describe the susceptibility of users: linguistic features, behavioral features and network features.

For all targets, we computed the features by taking all tweets they authored (up to the point in time where they became infected) and a snapshot of the targets' follow network which was recorded on the 26th of January (day 4). Since we only study susceptible users who became infected on day 7 or later, this follow network snapshot does not contain any future information (such as tweets or social relations which were created after a user became infected) which could bias our prediction results. Based on this aggregation of tweets, we constructed the interaction and retweet network of each user by analyzing their reply and retweet interactions.

5.1 Linguistic Features
Previous research has established that physical and psychological functioning are associated with the content of writing [8]. In order to analyze such content in an objective and quantifiable manner, Pennebaker and colleagues developed a computer-based text-analysis program known as the Linguistic Inquiry and Word Count (short LIWC) [11]. LIWC uses a word count strategy, searching for over 2300 words or word stems within any given text. The search words have previously been categorized by independent judges into over 70 linguistic dimensions. These dimensions include standard language categories (e.g., articles, prepositions, pronouns including first person singular, first person plural, etc.), psychological processes (e.g., positive and negative emotion categories, cognitive processes such as use of causation words, self-discrepancies), relativity-related words (e.g., time, verb tense, motion, space), and traditional content dimensions (e.g., sex, death, home, occupation).

In this work we use those 70 linguistic dimensions (see http://www.liwc.net/descriptiontable1.php) as linguistic features and compute them based on the aggregation of tweets authored by each target. Due to space limits we do not describe all 70 features in detail, but explain those which seem to be relevant for modeling the susceptibility of users in the result section.

5.2 Network Features
To study the predictiveness of network-theoretic features we constructed the following three directed networks from the data. In each of the networks nodes correspond to targets, while edges are constructed differently.

• User-Follower - A network representing the target-follower structure in Twitter. There exists a directed edge from user A to user B if user A is followed by B.
• Retweet - A network representing the retweet behavior of targets. In this network there exists an edge from A to B if user A retweeted a message from B.
• Interaction - The third network captures the general interaction behavior of targets. There exists an edge from user A to user B if user A either mentioned, replied to, or retweeted user B.

For each point in time, we constructed a retweet and interaction network by analyzing all tweets users published before that timestamp. The follower network is based on a snapshot which was recorded on the 26th of January (day 4).

5.2.1 Hub and Authority Score
Using Kleinberg's HITS algorithm [5], we calculated the authority as well as the hub score for all targets in our networks. A high authority score indicates that a node (i.e., a user) has many incoming edges from nodes with a high hub score, while a high hub score indicates that a node has many outgoing edges to nodes with high authority scores. For example, in the retweet network a high authority score indicates that a user is retweeted by many other users who retweeted many users, while a high hub score indicates that the user retweets many others who are as well retweeted by many others.

5.2.2 In- and Out-Degree
A high in-degree indicates that a node (i.e., a user) has many incoming edges, while a high out-degree indicates that a node has many outgoing edges. For example, in the interaction network a high in-degree means that a user is retweeted, replied to, mentioned and/or followed by many other users, while a high out-degree indicates that the user retweets, replies to, follows and/or mentions many other users.

5.2.3 Clustering Coefficient
The clustering coefficient is defined as the number of actual links between the neighbors of a node divided by the number of possible links between the neighbors of that node. A high clustering coefficient of a node indicates that the node has a central position in the network. For example, in the follow network a high clustering coefficient indicates that the users whom a user follows or is followed by are also well connected via follow relations.

5.3 Behavioral Features
In our own previous work [13], we introduced a number of behavioral or structural measures that can be used to characterize user streams and reveal structural differences between them. In the following, we describe some of those measures and elaborate how we use them to gauge the susceptibility of targets.

5.3.1 Conversational Variety
The conversational variety per message CVpm represents the mean number of different users mentioned in one message of a stream and is defined as follows:

  CVpm = |Um| / |M|   (1)

To measure the number of users being mentioned in a stream (e.g., via @replies or slashtags), we introduce |Um| for um ∈ Um. A high conversational variety indicates that a user talks with many different users.

5.3.2 Conversational Balance
To quantify the conversational balance of a stream, we define an entropy-based measure which indicates how evenly the communication efforts of a user are distributed across his or her communication partners. We define the conversational balance of a stream as follows:

  CB = − Σ_{u ∈ Um} P(m|u) · log(P(m|u))   (2)

A high conversational balance indicates that the user talks equally much with a large set of users, i.e., the distribution of conversational messages per user is even. Therefore a high score indicates that it is hard to predict with whom a user will talk next.

5.3.3 Conversational Coverage
From the number of conversational messages |Mc| - i.e., messages which contain an @reply - and the total number of messages of a stream |M|, we can compute the conversational coverage of a user stream, which is defined as follows:

  CC = |Mc| / |M|   (3)

A high conversational coverage indicates that a user is using Twitter mainly for a conversational purpose.

5.3.4 Lexical Variety
To measure the vocabulary size of a stream, we introduce |Rk|, which captures the number of unique keywords rk ∈ Rk in a stream. For normalization purposes, we include the stream size (|M|). The lexical variety per message LVpm represents the mean vocabulary size per message and is defined as follows:

  LVpm = |Rk| / |M|   (4)

5.3.5 Lexical Balance
The lexical balance LB of a stream can be defined, in the same way as the conversational balance, via an entropy-based measure which quantifies how predictable a keyword is on a certain stream.

5.3.6 Topical Variety
To compute the topical variety of a stream, we can use arbitrary surrogate measures for topics, such as the result of automatic topic detection or manual labeling methods. In the case of Twitter we use the number of unique hashtags rh ∈ Rh as a surrogate measure for topics. The topical variety per message TVpm represents the mean number of topics per message and is defined as follows:

  TVpm = |Rh| / |M|   (5)

5.3.7 Topical Balance
The topical balance TB can, in the same way as the conversational balance, be defined as an entropy-based measure which quantifies how predictable a hashtag is on a certain stream. A high topical balance indicates that a user talks about many different topics to similar extents. That means the user has no topical focus and it is difficult to predict about which topic he/she will talk next.

5.3.8 Informational Variety
In the case of Twitter we define informational messages to contain one or more links. To measure the informational variety of a stream, we can compute the number of unique links in messages of a stream |Rl| for rl ∈ Rl. The informational variety per message IVpm is defined as follows:

  IVpm = |Rl| / |M|   (6)

5.3.9 Informational Balance
The informational balance IB can, in the same way as the conversational balance, be defined as an entropy-based measure which quantifies how predictable a link is on a certain stream. A high informational balance indicates that a user posts many different links as part of her tweeting behavior.

5.3.10 Informational Coverage
From the number of informational messages |Mi| and the total number of messages of a stream |M| we can compute the informational coverage of a stream, which is defined as follows:

  IC = |Mi| / |M|   (7)

A high informational coverage indicates that a user is using Twitter mainly to spread links.

5.3.11 Temporal Variety
The temporal variety per message TPVpm of a stream is defined via the number of unique timestamps of messages |TP| (where timestamps are defined to be unique on an hourly basis) and the number of messages |M| in a stream. The temporal variety is defined as follows:

  TPVpm = |TP| / |M|   (8)

5.3.12 Temporal Balance
The temporal balance TPB can, in the same way as the conversational balance, be defined as an entropy-based measure which quantifies how evenly messages are distributed across these message-publication timestamps. A high temporal balance indicates that a user is tweeting regularly.

5.3.13 Question Coverage
From the number of questions |Q| and the total number of messages of a stream |M| we can compute the question coverage of a stream, which is defined as follows:

  QRpm = |Q| / |M|   (9)

A high question coverage indicates that a user is using Twitter mainly for gathering information and asking questions.

6. EXPERIMENTS
In the following, we attempt to develop models that (i) identify susceptible users (whether a user becomes infected or not) and (ii) predict their level of susceptibility (the extent to which a user interacts with a social bot). We begin by explaining our experimental setup before discussing our findings.

6.1 Experimental Setup
For our experiments, we considered all targets of the Social Bot Challenge, and divided them into those who were not infected (non-susceptible users) and those who were infected, i.e., started interacting with a bot on day 7 or later (susceptible users). For each of those targets we constructed the features as described in section 5 and normalized them. Identifying the most susceptible users in a given community is often hindered by including users that are not susceptible at all. We alleviate this problem by first aiming to model the differences between susceptible and non-susceptible users in a binary classification task. Once susceptible users have been identified, we can then attempt to predict the level of susceptibility for each infected user. Therefore we performed the following two experiments.

Table 2: Comparison of classifiers' performance
              Susceptible           Non-Susceptible
  Model     F1    Rec   Prec      F1    Rec   Prec     Overall
  random    0.5   0.5   0.5       0.5   0.5   0.5      0.5
  gbm       0.71  0.70  0.74      0.70  0.74  0.68     0.71
  glmnet    0.69  0.75  0.67      0.73  0.72  0.77     0.71
  rpart     0.64  0.56  0.78      0.44  0.60  0.36     0.54
  pls       0.67  0.69  0.68      0.68  0.71  0.70     0.68
  knn       0.70  0.71  0.71      0.72  0.75  0.71     0.71
  rf        0.68  0.72  0.66      0.70  0.70  0.74     0.69

1. Predicting Infections: The first experiment sought to identify the factors that are associated with infections.
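The word-count strategy behind LIWC can be illustrated in a few lines. The snippet below computes the relative frequency of a few LIWC-style categories over an aggregated stream of tweets; the category word lists are invented toy stand-ins, not the licensed LIWC dictionary:

```python
import re

# Toy dictionaries; real LIWC matches >2300 validated words/stems
# organized into roughly 70 dimensions.
CATEGORIES = {
    "social": {"friend", "mate", "talk"},
    "posemo": {"love", "nice", "happy"},
    "negate": {"no", "not", "never"},
}

def liwc_style_scores(tweets):
    """Fraction of all words falling into each category."""
    words = [w for t in tweets for w in re.findall(r"[a-z']+", t.lower())]
    total = len(words) or 1
    return {cat: sum(w in vocab for w in words) / total
            for cat, vocab in CATEGORIES.items()}

scores = liwc_style_scores(["Love my friend", "never not happy"])
```

Each target's linguistic feature vector is then simply the per-category score computed over all tweets the target authored before infection.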
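The degree and clustering measures can be computed directly from a directed edge list; hub and authority scores would typically come from an off-the-shelf HITS implementation. A minimal sketch of the first two (our own simplified formulation - the paper's exact computation, e.g. how directedness enters the clustering coefficient, may differ):

```python
def degrees(edges):
    """In- and out-degree per node of a directed edge list."""
    indeg, outdeg = {}, {}
    for a, b in edges:
        outdeg[a] = outdeg.get(a, 0) + 1
        indeg[b] = indeg.get(b, 0) + 1
    return indeg, outdeg

def clustering_coefficient(edges, node):
    """Directed links among the node's neighbors divided by the
    number of possible directed links between those neighbors."""
    nbrs = {b for a, b in edges if a == node} | {a for a, b in edges if b == node}
    nbrs.discard(node)
    k = len(nbrs)
    if k < 2:
        return 0.0
    actual = sum(1 for a, b in edges if a in nbrs and b in nbrs and a != b)
    return actual / (k * (k - 1))

# Toy interaction network: a mentions b and c, b mentions c, c mentions a.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")]
indeg, outdeg = degrees(edges)
```

In this toy network, node "a" has out-degree 2 (it interacts with two users) and clustering coefficient 0.5, since only one of the two possible directed links between its neighbors b and c exists.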
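As an illustration, conversational variety and conversational balance can be computed from the per-message mention lists of a stream. This is a sketch under our reading of Eqs. 1 and 2, using the natural logarithm (the paper does not state the log base):

```python
import math
from collections import Counter

def conversational_variety(mentions_per_msg):
    """Unique mentioned users divided by the number of messages (Eq. 1)."""
    unique_users = {u for msg in mentions_per_msg for u in msg}
    return len(unique_users) / len(mentions_per_msg)

def conversational_balance(mentions_per_msg):
    """Entropy of the conversational-messages-per-user distribution (Eq. 2)."""
    counts = Counter(u for msg in mentions_per_msg for u in msg)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# A stream of four messages; the last one mentions nobody.
stream = [["alice"], ["bob"], ["alice", "carol"], []]
cv = conversational_variety(stream)  # 3 unique users / 4 messages = 0.75
```

A user who spread the same four mentions evenly over four distinct partners would obtain a higher balance than this stream, where half of the mentions go to alice.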
To this end, we performed a binary classification task using 6 different classifier, partial least square regres- To understand which features are most predictive, we ex- sion (pls), generalized boosted regression (gbm), k- plore the importance of different features by using our best nearest neighbor (knn), elastic-net regularized gener- performing model. Table 2 shows the importance ranking of alized linear models (glmnet), random forest (rf) and features using the area under the ROC curve as a ranking regression trees (rpart). We divided our dataset into criterion. a balanced training and test set - i.e. in each training and test split we had the same number of susceptible One can see from Table 3 that the most important fea- and non-susceptible users. We performed a 10-cross- tures for differentiating susceptible and non-susceptible is fold validation and selected the best classifier to fur- the out-degree of a user node in the interaction network. ther explore the most predictive features, and plotted Figure 3 shows that susceptible users tend to actively inter- ROC curves for each feature. The ROC curve is a act (i.e., retweet, mention, follow or reply to a user) with method to visualize the prediction accuracy of ranking more users than non-susceptible users do on average. That functions showing the number of true positives in the means, susceptible users tend to have a larger social net- results plotted against the number of results returned. work and/or communication network. One possible expla- We use the area under the ROC curve (AUC) as the nation for that is that susceptible users tend to be more measure of feature importance. active and open and therefore easily create new relations with users. Our results also show that susceptible users 2. 
Predicting Levels of Susceptibility After identifying also tend to have a high in-degree in the interaction net- susceptible users, it is interesting to rank them accord- work, which indicates that most of their interaction efforts ing to their probability of being susceptible for a bot are successful (i.e., they are followed back by users they fol- attack, because one usually wants to identify the most low and/or get replies/mentions/retweets from users they susceptible users, i.e. those who are most in need for reply/mention/retweet). security measures and protection. In this experiment we aim to predict the susceptibility level of infected Further, susceptible users tend to use more verbs (especially users and identify key features which are correlated present tense verbs, but also past tense verbs and auxiliary with users’ susceptibility levels. We define the suscep- verbs) and use more personal pronouns (especially first per- tibility level of an infected user as the number of times son singular but also third person singular in their tweets. a user followed, mentioned, retweeted or replied to a This suggest that susceptible users tend to use Twitter to bot. report about what they are currently doing. We divided our dataset (consisting of infected users only) into a 75/25% split, fit a regression model using Interestingly, our results also show that susceptible users the former split and applied it to the latter. We used have a higher conversational variety and coverage than non- regression trees to model the susceptibility level of in- susceptible users, which means that susceptible users tend fected users, since they can handle strongly nonlinear to talk to many different users on Twitter and that most of relationships with high order interactions and different their messages have a conversational purpose. This indicates variable types. 
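The per-feature AUC criterion used for ranking features admits a compact rank-sum (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. A sketch with hypothetical feature values, not the paper's data:

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve for a single feature used as a ranking
    score: the probability that a random positive (susceptible) sample
    scores higher than a random negative one, counting ties as 1/2."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical out-degree values for susceptible vs. non-susceptible users.
score = auc([5, 7, 9], [1, 2, 7])
```

A feature that perfectly separates the two groups yields an AUC of 1.0, while an uninformative feature stays near 0.5; here the one tie and one inversion pull the score down to 7.5/9.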
The resulting model can be interpreted that susceptible users tend to use Twitter mainly for a con- as a tree structure providing a compact and intuitive versational purpose rather than an informational purpose. representation. Further, susceptible users also have a higher conversational balance which indicates that they do not focus on few con- 7. RESULTS & EVALUATION versation partners (i.e., heavily communicate with a small circle of friends) but spend an equal amount of time in com- 7.1 Predicting Infections municating with a large variety of users. Its suggests again As a first step, we would like to compare the performance that susceptible users are more open to communicate with of different classifiers for this task and compare them with a others, also if they are not in their closed circle of friends. random baseline classifier. We used all features and trained six different classifiers: partial least square regression (pls), Our results further suggest that susceptible users show more generalized boosted regression (gbm), k-nearest neighbor affection - i.e. they use more affection words (e.g., happy, (knn), elastic-net regularized generalized linear models (glm- cry), especially words which expose positive emotions (e.g., net), random forests (rf) and regression trees (rpart). One love, nice) - and use more social words (e.g., mate, friend ) can see from table 2 that generalized boosted regression than non-susceptible users, which might explain why they models (gbm) perform best, since they have the highest ac- are more open to interact with social bots. Susceptible users curacy. also tend to use more motion words (e.g., go, car ), adverbs 46 · #MSM2012 · 2nd Workshop on Making Sense of Microposts · (e.g., really, very), exclusive words (e.g., but, without) and our dataset is too small for fitting the model (we only have negation words (e.g., no, not, never ) in their tweets than 76 samples and 97 features). 
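Regression trees of this kind are grown by greedily choosing feature/threshold splits that minimize the squared error of predicting each branch by its mean. The single-split toy sketch below illustrates that principle; it is not the model fitted in the paper:

```python
def best_stump(xs, ys):
    """Return the threshold on one feature that minimizes the summed
    squared error when each side is predicted by its mean target value."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best_err, best_t = float("inf"), None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        err = sse(left) + sse(right)
        if err < best_err:
            best_err, best_t = err, t
    return best_t

# Toy data: a hypothetical "negation words" feature vs. susceptibility level.
threshold = best_stump([0.1, 0.2, 0.8, 0.9], [1, 1, 5, 6])
```

A full regression tree applies this search recursively to each branch; on the toy data the best single split separates the low-susceptibility pair from the high-susceptibility pair at 0.8.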
Another potential reason is non-susceptible users. It indicates again that susceptible that our features do not correlate with susceptibility scores users tend to use Twitter to talk about their activities and of users. We leave the task of elaborating on this problem emotionally communicate. to future work. To summarize, our results suggest that susceptible users tend to use Twitter mainly for a conversational purpose 1 negemo (high conversational coverage) and tend to be more open and social since they communicate with many different users (high out-degree and in-degree in the interaction network < 0.40068 >= 0.40068 and high conversational balance and variety), use more so- 2 cial words and show more affection (especially positive emo- temp_bal tions) than non-susceptible users. < 0.37025 >= 0.37025 Table 3: Importance ranking of the top features us- 3 death ing the area under the ROC curve (AUC) is used as ranking criterion. The importance value is propor- tional to the most important feature which has an < −0.16389 >= −0.16389 Node 4 (n = 25) Node 5 (n = 7) Node 6 (n = 9) Node 7 (n = 15) importance value of 100%. ● 8 8 8 8 ● Feature Importance out-degree (interaction network) 100.00 6 6 6 6 verb 98.01 conversational variety 96.93 4 4 4 4 conversational coverage 96.65 ● ● present 94.66 2 2 2 2 affect 90.15 personal pronoun 89.71 first person singular 89.27 conversational balance 87.28 motion 87.28 past 86.56 Figure 4: Regression tree model fitted to the sus- adverb 86.20 ceptibility scores of our training split users. The pronoun 84.41 negate 84.33 tree-structure shows based on which features and positive emotions 83.25 thresholds the model selects branches and the box third person singular 82.38 plots indicate the distribution of the susceptibility social 82.02 scores of users in each branch of the tree. exclusive 81.86 auxiliary verb 81.70 in-degree (interaction network) 81.66 8. 
7.2 Predicting Levels of Susceptibility
To model the susceptibility level of users, we use regression trees and aim to identify features which correlate with users' susceptibility levels. To gain insights into the factors which correlate with high or low susceptibility levels of a user, we inspect the regression tree model which was trained on 75% of our data. One can see from Figure 4 that users who use more negation words (e.g., not, never, no) tend to interact more often with bots, which means they have a higher susceptibility level. Further, users who tweet more regularly (i.e., have a high temporal balance) and users who use more words related to the topic death (e.g., bury, coffin, kill) tend to interact more often with bots than other susceptible users.

One can also see from Figure 4 that the structure of the learned tree is very simple, which means that our features only allow differentiating between rather low and rather high susceptibility scores. For a finer-grained prediction of susceptibility levels our approach is of limited utility. Also, the rank correlation between users' real and predicted susceptibility levels and the goodness of fit of the model are rather low. One potential reason for that is that our dataset is too small for fitting the model (we only have 76 samples and 97 features). Another potential reason is that our features do not correlate with the susceptibility scores of users. We leave the task of elaborating on this problem to future work.

8. CONCLUSIONS AND OUTLOOK
In this work, we studied the susceptibility of users who are under attack from social bots. To this end, we used data collected by the Social Bots Challenge 2011 organized by the WebEcologyProject. Our analysis aimed at (i) identifying susceptible users and (ii) predicting the level of susceptibility of infected users. We implemented and compared a number of classification approaches that demonstrated the capability of a classifier to outperform a random baseline.

Our analysis revealed that susceptible users tend to use Twitter mainly for a conversational purpose (high conversational coverage) and tend to be more open and social, since they communicate with many different users (high out- and in-degree in the interaction network and high conversational balance), use more social words and show more affection (especially positive emotions) than non-susceptible users. Although the finding that active users are also more susceptible to social bot attacks does not seem too surprising, it is an intriguing finding in itself, as one would assume that users who are more socially active would develop some kind of social skills or capabilities to distinguish human users from social bots. This is obviously not the case, and it suggests that attacks of social bots can be effective even in cases where users have experience with social media and are highly active.
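The procedure described in Section 7.2 - fitting a regression tree to susceptibility scores on a 75% training split, inspecting its splits, and checking the rank correlation between real and predicted levels - can be sketched as below. This is an illustrative sketch on synthetic data, not the authors' code; the feature names echo the splits visible in Figure 4 (negemo, temp_bal, death), and scikit-learn's DecisionTreeRegressor stands in for the R tree model.

```python
# Hedged sketch of the Section 7.2 regression-tree analysis (synthetic data).
import numpy as np
from scipy.stats import spearmanr
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
# Synthetic stand-in: 76 users, three LIWC-style features, continuous scores.
X = rng.normal(size=(76, 3))
feature_names = ["negemo", "temp_bal", "death"]  # splits seen in Figure 4
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=76)  # susceptibility score

# 75% training split, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.75, random_state=1)
tree = DecisionTreeRegressor(max_depth=3, random_state=1).fit(X_tr, y_tr)

# Inspect the learned features and thresholds, analogous to Figure 4.
print(export_text(tree, feature_names=feature_names))

# Rank correlation between real and predicted susceptibility levels.
rho, _ = spearmanr(y_te, tree.predict(X_te))
print(f"Spearman rho = {rho:.2f}")
```

With only 76 samples, such a tree stays shallow, which matches the paper's observation that the learned structure is very simple and mainly separates low from high scores.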
[Figure 3: box plots of standardized values for the top 20 features (out-degree, verb, conv variety, conv coverage, present, past, adverb, pronoun, negate, positive emotion, affect, personal pronoun, i, conv balance, motion, she/he, social, exclusive, auxiliary verb, in-degree).]
Figure 3: Box plots for the top 20 features according to the area under the ROC curve (AUC). Yellow boxes (class 0, left) represent non-susceptible users, red boxes (class 1, right) represent susceptible users. Differences between susceptible and non-susceptible users can be observed.

While our work presents promising results with regard to the identification of susceptible users, identifying the level of susceptibility is a harder task that warrants more research in the future. In general, the results reported in this work are limited to one specific domain (cats). In addition, all our features are corpus-based, and therefore the size and structure of our dataset can have an influence on our results.

In conclusion, our work represents a first important step towards modeling the susceptibility of users in OSN. We hope that our work contributes to the development of tools that help protect users of OSN from social bot attacks, and that our exploratory work stimulates more research in this direction.

Acknowledgments
We want to thank members of the WebEcology project, especially Tim Hwang for sharing the dataset and Ian Pierce for technical support. Claudia Wagner is a recipient of a DOC-fForte fellowship of the Austrian Academy of Science. This research is partly funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. ICT-2011-287760.

9. REFERENCES
[1] Y. Boshmaf, I. Muslukhov, K. Beznosov, and M. Ripeanu. The socialbot network. In Proceedings of the 27th Annual Computer Security Applications Conference, page 93. ACM Press, Dec 2011.
[2] J. Cheng, D. Romero, B. Meeder, and J. Kleinberg. Predicting reciprocity in social networks. In The Third IEEE International Conference on Social Computing (SocialCom2011), 2011.
[3] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia. Who is tweeting on Twitter. In Proceedings of the 26th Annual Computer Security Applications Conference (ACSAC '10), page 21. ACM Press, Dec 2010.
[4] J. Hopcroft, T. Lou, and J. Tang. Who will follow you back?: Reciprocal relationship prediction. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM '11), pages 1137-1146, New York, NY, USA, 2011. ACM.
[5] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. In H. J. Karloff, editor, SODA, pages 668-677. ACM/SIAM, 1998.
[6] K. Lee, J. Caverlee, and S. Webb. Uncovering social spammers: Social honeypots + machine learning, pages 435-442. ACM, 2010.
[7] D. Misener. Rise of the socialbots: They could be influencing you online. Web, March 2011.
[8] J. Pennebaker, M. Mehl, and K. Niederhoffer. Psychological aspects of natural language use: Our words, our selves. Annual Review of Psychology, 54(1):547-577, 2003.
[9] J. Ratkiewicz, M. Conover, M. Meiss, B. Gonçalves, S. Patil, A. Flammini, and F. Menczer. Detecting and tracking the spread of astroturf memes in microblog streams. CoRR, abs/1011.3768, 2010.
[10] M. Rowe, S. Angeletou, and H. Alani. Predicting discussions on the social semantic web. In Extended Semantic Web Conference, Heraklion, Crete, 2011.
[11] Y. R. Tausczik and J. W. Pennebaker. The psychological meaning of words: LIWC and computerized text analysis methods. 2010.
[12] J. R. Tyler and J. C. Tang. When can I expect an email response? A study of rhythms in email usage. In Proceedings of the Eighth European Conference on Computer Supported Cooperative Work, pages 239-258, Norwell, MA, USA, 2003. Kluwer Academic Publishers.
[13] C. Wagner and M. Strohmaier. The wisdom in tweetonomies: Acquiring latent conceptual structures from social awareness streams. In Proc. of the Semantic Search 2010 Workshop (SemSearch2010), April 2010.
[14] D. Wang, D. Irani, and C. Pu. A social-spam detection framework. In Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, pages 46-54. ACM Press, Sep 2011.