Proceedings of the 1st International Workshop on Social Influence Analysis (SocInf 2015)
July 27th, 2015 - Buenos Aires, Argentina

A Novel Metric for Assessing User Influence based on User Behaviour

Antonela Tommasel and Daniela Godoy
ISISTAN Research Institute, CONICET-UNCPBA, Tandil, Buenos Aires, Argentina

Copyright © 2015 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

Abstract

People's influence has been the subject of study of several social and humanities disciplines. Lately, the study of users' influence in micro-blogging platforms has arisen as an important issue. Although social influence or prestige can be defined as the potential or ability of an individual to engage others in a certain act, or to induce others to behave in a particular manner, there is no global consensus on what it means to be an influential user. This work aims at shedding some light on how to assess user influence by proposing a novel metric of user influence based on analysing user behaviour regarding both content-based and topological factors. The metric does not only consider each user individually, but also aims at assessing the interactions with his/her neighbourhood. The statistical analysis performed confirmed that only analysing the topological factors is not sufficient for accurately assessing the influence of users. Instead, the published content and its influence over the neighbourhood of users also have to be analysed. A comparison with a human assessment of user influence showed that the factors considered by the proposed metric are truly relevant for assessing people's influence.

1 Introduction

Several disciplines, such as sociology, communication, marketing and political sciences, have tackled the study of people and their influence [Rogers, 2003; Katz and Lazarsfeld, 2005]. The notion of people's influence plays a crucial role in businesses and in the functioning of societies. For example, it could modify the spread of fashion or voting patterns [Gladwell, 2000; Keller and Berry, 2003]. Furthermore, the study of influence patterns could help to understand how trends and innovations are adopted, and to design more effective publicity campaigns [Cha et al., 2010]. The rapid growth and exponential usage of social digital media increased the popularity of micro-blogging platforms, which have become an important part of the daily life of millions of users scattered across the world. As a result, the study of users' influence in the context of micro-blogging platforms arises as an important issue.

Social influence or prestige can be defined as the potential or ability of an individual to engage others in a certain act, or to induce others to behave in a particular manner. In micro-blogging sites, the definition traditionally relies on status attributes such as the number of followers (i.e. the size of the influence group of a certain user), the number of retweets (i.e. the ability to generate attractive content to be distributed), and the number of mentions (i.e. the ability to engage other users in a conversation). However, having a high number of followers, which would imply a high level of popularity, is not sufficient for also being influential in terms of triggering social responses such as retweets or mentions [Cha et al., 2010]. Moreover, the most influential users are influential over several topics, but such influence is obtained only through a concentrated effort of posting tweets related to only one topic.

In this context, analysing the behaviour of users regarding the diffusion of information can be useful for assessing their influence. Several authors have proposed characterisations of users based on behavioural patterns observed through not only topological features [Java et al., 2007; Krishnamurthy et al., 2008], but also social and content-related features [Tinati et al., 2012]. Such categorisations analyse the number of published posts, the type of posts (original posts, replies in conversations, retweets), the number of times that posts were retweeted, the proportion of retweeted posts, the proportion of followers, and the number of interactions with the neighbourhood, among others. Consequently, the influence of users can be estimated according to certain behavioural patterns. For example, users who share a large number of posts, which are highly retweeted, and also have more followers than followees could be regarded as highly influential users. On the other hand, users who rarely publish posts and have a larger proportion of followees than followers could be regarded as not influential.

There are also several commercial metrics that claim to be able to assess the influence of users, such as Klout (http://klout.com/) and Kred (http://kred.com/), among others. However, they have received considerable criticism and have been the focus of several controversies regarding how the measurements are computed or the effect that spam-bots might have on the algorithms. As most of the commercial measures do not publicly state how scores are computed, they are not accessible for scrutiny or reproduction, which might compromise their trustworthiness [Gaffney and Puschmann, 2012].

Considering that there might be no consensus on what it means to be an influential user, this work aims at shedding some light on how to assess user influence by proposing a novel metric based on analysing user behaviour regarding the patterns of information diffusion, i.e. it considers both content-based and topological factors. The metric does not only consider each user individually, but also aims at assessing the interactions with his/her neighbourhood. Then, a statistical analysis is performed for comparing the novel metric with traditional means for assessing user influence (such as In Degree or followee/follower ratio), commercial metrics and a human assessment of user influence.

The rest of this paper is organised as follows. Section 2 presents several characterisations of users regarding their role in the diffusion of information. Section 3 presents and defines the proposed metric for estimating the influence of Twitter users based on the patterns of user behaviour regarding the information diffusion process. Section 4 describes the analysis carried out using Twitter data. Section 5 discusses related research. Finally, Section 6 summarises the conclusions drawn from this study and presents future lines of work.

2 Information-based User Characterisation

Several studies [Java et al., 2007; Krishnamurthy et al., 2008] have characterised users according to their behaviour in the information diffusion process by classifying them into three categories: Information Sources or Broadcasters, Friends or Acquaintances, and Information Seekers. Information Sources are those users who have a greater proportion of followers than followees (i.e. they are followed by more users than they follow) and also publish valuable and relevant content on a regular basis. Friends are those users who have a balanced number of followees and followers, without necessarily implying the presence of reciprocal relationships. Finally, Information Seekers are those users who rarely publish content and have a greater proportion of followees than followers, aiming at receiving updates. Furthermore, Tinati et al. [2012] characterised users into five dimensions (Idea Starters, Amplifiers, Curators, Commentators and Viewers) according to their social and psychological behaviour, and how this behaviour affects their posting and communication activity on Twitter. Each dimension can be associated with the categories identified in [Java et al., 2007; Krishnamurthy et al., 2008]. More importantly, these characterisations can be used for assessing the influence of users:

Idea Starters are those users who are highly engaged with the media. As they tend to start conversations, the majority of their posts correspond to original content and tend to be highly retweeted, suggesting their influence over their followers. These users tend to interact with a limited and selected group of users, ensuring high quality relations with them. They share the characterisation proposed for Information Sources.

Amplifiers are those users who share ideas and opinions posted by other users. They tend to have a greater number of followers than followees. They interact with Idea Starters and share their ideas with a more visible audience. As a result, the majority of their posts correspond to retweets of Idea Starters. Their posts are also highly retweeted by their followers. They share the characterisation proposed for Information Sources.

Curators are those users who interact with both Idea Starters and Amplifiers by aggregating their ideas together, and helping to clarify the topic of conversation. They tend to have a balanced number of followees and followers, and to interact with a large number of them. They lie on the border between Information Sources and Friends, as they tend to share a lot of content (as an Information Source) and to interact with a large number of users (as a Friend). As the number of interactions with other users increases, the Curator behaves as a Friend, whereas as the number of interactions with other users decreases, the Curator behaves as an Information Source.

Commentators are those users who also share the ideas and opinions of other users, but without interrupting the flow of the original conversation or immersing themselves in it. They only want to share content and do not desire to be recognised for their posts. The main difference between Amplifiers and Commentators is the impact that their content has over their social network, measured by the number of retweets received. As the number of retweets increases, the user behaves more as an Amplifier and less as a Commentator. They can be characterised as Friends, as they tend to have a balanced number of followees and followers.

Viewers are those users who neither share nor publish posts. They do not engage in conversations or retweet other posts. Instead, they read or consume large amounts of information. They tend to have a larger number of followees than followers. They share the characterisation proposed for Information Seekers.

3 Quantitatively Assessing User Influence

Based on the defined characterisations and dimensions of user behaviour, this section proposes novel definitions for quantitatively analysing the behaviour of users in each of the presented dimensions, and then assessing their influence. The definitions of Commentators (those users who also share the ideas and opinions of other users, but without interrupting the flow of the original conversation or immersing themselves in it) and Viewers (those users who neither share nor publish posts) are omitted as they can be inferred from the scores of the other dimensions. For example, a low score in the Amplifier dimension could indicate the presence of a Commentator. On the other hand, low scores in both the Idea Starter and Amplifier dimensions could indicate the presence of a Viewer.
In all cases, the scores are constrained to the interval [0, 1], and corrections are applied in order to avoid undetermined values.

3.1 Idea Starter

Idea Starters are characterised by posting a greater proportion of original content ($Tweets_{ORIGINAL}$) than the other dimensions. Equation 1 not only assesses the proportion of original posts (first part), but also the impact of those posts in the neighbourhood of the user (second part).

\[
\text{Idea-Starter} = \frac{\left|Tweets_{ORIGINAL}\{Tweets_{ORIGINAL\,RT} \geq \mu - \sigma\}\right|}{|Tweets|} \ast \frac{\sum Tweets_{ORIGINAL\,RT}}{|ReTweets|} \qquad (1)
\]

The first part considers the ratio between the number of original tweets whose number of retweets reaches the lower limit of the normal distribution of retweets ($|Tweets_{ORIGINAL}\{Tweets_{ORIGINAL\,RT} \geq \mu - \sigma\}|$, where $Tweets_{ORIGINAL\,RT}$ is the number of retweets that $Tweets_{ORIGINAL}$ has received, and $\mu$ and $\sigma$ represent the arithmetic mean and standard deviation of the retweet distribution, respectively) and the total number of published tweets ($|Tweets|$). The restriction imposed on the number of retweets assesses whether the received retweets are uniformly distributed over all published tweets or concentrated on a small proportion of them. The second part assesses the impact that posts have on the neighbourhood of the user, which is measured as the ratio between the retweets that the original content received ($\sum Tweets_{ORIGINAL\,RT}$) and the total number of retweeted tweets ($|ReTweets|$). The higher the score, the more the user behaves as an Idea Starter, and thus as an Information Source.

3.2 Amplifier

Amplifiers are characterised by posting a greater proportion of retweeted content ($Tweets_{RT}$), engaging in conversations ($Tweets_{REPLY}$) or even starting conversations by mentioning other users ($Tweets_{MENTION}$). Equation 2 assesses not only the interaction between a user and his/her social network (first part), but also the impact of those posts in such network (second part).

\[
\text{Amplifier} = \frac{|Tweets_{RT}| + |Tweets_{REPLY}| + |Tweets_{MENTION}|}{|Tweets|} \ast \frac{\sum Tweets_{RT\,RT} + \sum Tweets_{REPLY\,RT} + \sum Tweets_{MENTION\,RT}}{|ReTweets|} \qquad (2)
\]

The first part considers the ratio between the combined number of retweeted posts, conversations and tweets containing mentions, and the total number of published tweets. The second part assesses the impact that the posts considered in the first part have on the social network of the user, which is measured as the ratio between the retweets received by the retweeted content ($\sum Tweets_{RT\,RT}$), the conversations ($\sum Tweets_{REPLY\,RT}$) and the mentions of other users ($\sum Tweets_{MENTION\,RT}$), and the total number of retweets. Also, the second part aids in accurately differentiating between Amplifiers and Commentators. A high score in the second part might indicate the presence of an Amplifier, whereas a low score might indicate the presence of a Commentator.

3.3 Curator

Curators are characterised by interacting with a greater number of users than those characterised by the other dimensions. By default, any user can interact with any other user, regardless of whether they actually follow that other user. Equation 3 assesses not only the number of interactions with other users (first part), but also to what extent a user interacts only with his/her neighbourhood.

\[
\text{Curator} = \frac{|Interactions \in \{Followers \cup Followees\}|}{|Interactions|} \ast \frac{|Interactions \in \{Followers \cup Followees\}|}{|Followers| + |Followees|} \qquad (3)
\]

The first part considers the ratio between those interactions that belong to either the follower or followee list ($Interactions \in \{Followers \cup Followees\}$), and the total number of interactions ($Interactions$). Then, the second part considers the proportion of users with whom a certain user interacts relative to the size of the neighbourhood. The higher the number of interactions, the higher the score, and thus the less the user behaves as an Information Source.

3.4 Follower/Followee Ratio

In addition to the content-related dimensions, a topological factor can also be considered. The content-related dimensions do not consider the size of the neighbourhood of a user. As a result, two users might achieve the same score but have totally different neighbourhoods. In other words, a user with a high content-related score and a greater neighbourhood engaged in his/her content (i.e. a greater number of followers) is more important than a user with a high content-related score but a smaller neighbourhood engaged in his/her content. Consequently, the Follower/Followee Ratio ($FF_{Ratio}$) is proposed to leverage the importance of the neighbourhood size (Equation 4).

\[
FF_{Ratio} = \frac{|Followers|}{|Followers| + |Followees|} \qquad (4)
\]

3.5 Information Source Index

Based on the previous metrics, the Information Source Index (IS) is defined for numerically characterising users according to their behaviour. The metric denotes to what extent a user can be considered an Information Source or an Information Seeker. High values of IS denote users behaving as Information Sources, whereas low values of IS denote users behaving as Information Seekers. For computing the IS index, the Idea Starter, Amplifier and 1 − Curator scores are assigned equal weight and thus combined by means of the arithmetic mean ($\mu_{IDAC}$), as shown in Equation 5.

\[
\mu_{IDAC} = \frac{\text{Idea-Starter} + \text{Amplifier} + (1 - \text{Curator})}{3} \qquad (5)
\]

Then, the content-related dimensions (i.e. $\mu_{IDAC}$) and the topological factor $FF_{Ratio}$ are combined by means of the Harmonic mean for defining the IS, as shown in Equation 6. The content-based dimensions and the topological factor represent different aspects of user behaviour, and are therefore different kinds of elements.
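The dimension scores and their combination can be illustrated with a short Python sketch. It is only one possible reading of Equations 1 to 6, not the authors' implementation, and it rests on assumptions that the paper does not state: each tweet is a hypothetical (kind, retweets received) pair; µ and σ are the mean and population standard deviation of the retweets received over all of the user's tweets; |ReTweets| is the total number of retweets received; and the unspecified corrections are taken to be a zero score for an empty denominator plus clamping to [0, 1]. All function names are illustrative.

```python
from statistics import mean, pstdev


def ratio(numerator, denominator):
    """Return numerator/denominator clamped to [0, 1].

    Assumed correction: a zero denominator yields 0.0 instead of an
    undetermined value.
    """
    if denominator == 0:
        return 0.0
    return min(1.0, max(0.0, numerator / denominator))


def idea_starter(tweets):
    """Equation 1. `tweets` is a list of (kind, retweets_received) pairs."""
    counts = [rt for _, rt in tweets]
    if not counts:
        return 0.0
    threshold = mean(counts) - pstdev(counts)  # mu - sigma (assumed over all tweets)
    original = [rt for kind, rt in tweets if kind == "original"]
    well_received = sum(1 for rt in original if rt >= threshold)
    return ratio(well_received, len(tweets)) * ratio(sum(original), sum(counts))


def amplifier(tweets):
    """Equation 2: share of retweets, replies and mentions, times their share of retweets received."""
    counts = [rt for _, rt in tweets]
    social = [rt for kind, rt in tweets if kind in ("retweet", "reply", "mention")]
    return ratio(len(social), len(tweets)) * ratio(sum(social), sum(counts))


def curator(interactions, followers, followees):
    """Equation 3. `interactions` holds the user id targeted by each interaction."""
    neighbourhood = set(followers) | set(followees)
    inside = sum(1 for user in interactions if user in neighbourhood)
    # The same count is used in both parts, as the equation is written.
    return ratio(inside, len(interactions)) * ratio(inside, len(followers) + len(followees))


def ff_ratio(followers, followees):
    """Equation 4: followers over followers plus followees."""
    return ratio(len(followers), len(followers) + len(followees))


def information_source_index(tweets, interactions, followers, followees):
    """Equations 5 and 6: arithmetic mean of the content scores, then harmonic mean with FF_Ratio."""
    mu_idac = (idea_starter(tweets) + amplifier(tweets)
               + (1 - curator(interactions, followers, followees))) / 3
    ff = ff_ratio(followers, followees)
    return ratio(2 * mu_idac * ff, mu_idac + ff)


if __name__ == "__main__":
    # Hypothetical user: four tweets, three followers, one followee.
    tweets = [("original", 10), ("original", 0), ("retweet", 5), ("reply", 5)]
    followers, followees = ["a", "b", "c"], ["a"]
    interactions = ["a", "a", "z", "b"]
    print(information_source_index(tweets, interactions, followers, followees))
```

For the hypothetical user in the demo, the Idea Starter, Amplifier and Curator scores are 0.125, 0.25 and 0.5625, the FF ratio is 0.75, and the resulting IS is roughly 0.40. Because the harmonic mean is pulled towards the smaller of its two inputs, a user needs both a high content score and a favourable follower proportion to obtain a high IS.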
Elements of different kinds cannot be combined by means of the arithmetic mean. Consequently, the Harmonic mean is more adequate for computing the final score. Furthermore, the Harmonic mean is less affected by large outliers than the arithmetic mean, and it yields a high score only when both factors are high.

\[
IS(u_j) = \frac{2 \ast \mu_{IDAC} \ast FF_{Ratio}}{\mu_{IDAC} + FF_{Ratio}} \qquad (6)
\]

As Information Sources represent those users who are highly engaged with the media and publish valuable and relevant content on a regular basis, they could be considered influential users. Likewise, the higher the Amplifier score (Equation 2), the more a user behaves as an Amplifier, and thus as an Information Source. Furthermore, Information Sources tend to engage a great audience of Amplifiers and Commentators who share and enrich their posts. Due to its relevance, their published content tends to be highly retweeted, which also implies a high number of interactions with their neighbourhood. Additionally, they tend to be highly followed by Viewers. As a result, Information Sources meet all the requirements for being regarded as influential users. In this context, the influence of users could be measured by means of the IS score. The higher the IS of a user, the more influential that user is supposed to be.

4 Data Analysis

This section presents the experimental evaluation performed to assess the effectiveness of the proposed metric. Section 4.1 presents other scores for measuring the influence of users in the context of social networks, to which the presented metric was compared. Section 4.2 describes the data collections and data analysis settings regarding the human assessment of user influence. Finally, Section 4.3 presents the results of the data analysis performed.

4.1 Metrics used for Comparison

In order to quantitatively assess the effectiveness of the proposed metric for measuring the influence of users, it was compared to the scores of other related metrics. The first two are commercial metrics, whereas the last two are metrics that can be easily computed with simple data obtained from the users' profiles.

Klout was launched in 2008. It provides a measurement of the online influence of users by combining information extracted from Twitter, Facebook, LinkedIn, Instagram, Google+, Flickr, Blogger and Foursquare, among others. It is based on three fundamental principles: True Reach (how many users a certain user influences), Amplification (how much a certain user influences other users), and Network Impact (the influence of networks of users).

Kred was launched in 2012. It measures social influence and outreach in Twitter and Facebook, aiming at assessing the trust and generosity of users. Social influence is measured by assessing the retweets, replies, mentions and new followees a user has, i.e. a user receives social influence points every time people interact with his or her content.

In Degree. Computes the influence of a user as the number of his/her followers. This metric is currently used by many third-party services, such as TwitterHolic (http://twitaholic.com/).

Follower/Followee Ratio. Computes the influence of a user as the ratio between their followers and followees. A high score indicates that the user has a higher proportion of followers than followees.

4.2 Data Analysis Settings

Several data collections were created by manually selecting Twitter users who were considered influential or even popular according to the criteria presented by Twitter Counter (http://twittercounter.com/) and WeFollow (http://wefollow.com). Selected users were grouped according to their topic of influence. In total, seven datasets were created: Argentina (13 users), Miscellaneous (8 users), Music (8 users), Politics (7 users), Sports (7 users), Technology (7 users) and Tv-Movies (10 users). For every user, all tweets, followees, followers, favourite tweets and user account information were retrieved. All the information was obtained by means of requests to the Twitter API (https://api.twitter.com/). Additionally, for each user, his or her Kred and Klout scores were obtained during March 2015 by means of the Kred (https://developer.peoplebrowsr.com/kred/) and Klout (https://klout.com/s/developers/home/) APIs, respectively.

Considering that there is no consensus on what it means to be an influential user [del Campo-Ávila et al., 2013; Gaffney and Puschmann, 2012], this work aims at analysing the human perception of influence. The presented metric was compared not only to several commercial metrics, but also to a human assessment of user influence. Undergraduate and graduate students from an Artificial Intelligence course at UNICEN University (Argentina) were asked to rank the sets of users previously presented according to their perceived influence. The ranking task was carried out by means of a web site (https://sites.google.com/site/influenciatwitterusers/) in which the students were able to access a brief summary of the profiles and latest tweets posted by each Twitter user. All rankings were performed during April 2015. In total, 31 students ranked the users according to their perceived influence. In order to combine the rankings provided by the students, Twitter users were assigned the mode of the provided rankings.

Once all users were ranked according to their influence score in each metric, it is possible to quantify how the rank of users varies across the different metrics. The correlation between the different rankings was analysed by means of the Kendall coefficient [Kendall, 1938], which is a statistic used for measuring the association between two measured quantities in the form of lists or rankings. The correlation takes a value between -1 and 1, so that the higher the score, the higher the agreement between the two rankings. A score close to 0 indicates no association between the two rankings. The correlation is analysed for the total number of Twitter users in each dataset.

4.3 Analysis of Influence Metrics

Figure 1 shows the correlation among the different analysed metrics. As can be observed, in all cases the correlation between rankings depends on the dataset under consideration. This could indicate that the topic that users publish about might be an important factor to consider in the analysis. As regards the commercial metrics (Figures 1b and 1c), their highest correlation values were found for the Technology dataset, where the correlation between KloutScore and KredScore was higher than 0.8. On the contrary, the lowest correlation among the commercial metrics was found for the Miscellaneous dataset. However, the p-values were lower than 0.05 only for the Technology dataset; for all the other datasets the null hypothesis of independence cannot be rejected, so no significant association between the two commercial metrics was found. These results reinforce the idea that there is no consensus between the commercial metrics on how the influence of users is computed.

For most datasets, both commercial metrics were not correlated with the metric presented in this work. These results could imply that the content-based and topological dimensions the IS calculation is based upon are not regarded in the same manner or with the same importance by the commercial metrics. Moreover, the commercial metrics could also be based on a different set of dimensions or features.
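The ranking comparison described in Section 4.2 can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: it assumes the tau-a variant of the Kendall coefficient, which makes no correction for ties, and it breaks ties between equally frequent ranks in favour of the better rank; the paper specifies neither choice. The function names and sample values are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mode_rank(ranks):
    """Combine the ranks that several assessors gave to one Twitter user.

    Returns the most frequent rank; if several ranks are equally frequent,
    the smallest (best) one is chosen, which is an assumption.
    """
    counts = Counter(ranks)
    best = max(counts.values())
    return min(rank for rank, count in counts.items() if count == best)


def kendall_tau(xs, ys):
    """Kendall's tau-a between two score lists that describe the same users."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both metrics order this pair the same way
        elif sign < 0:
            discordant += 1   # the metrics disagree on this pair
    pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / pairs


if __name__ == "__main__":
    # Hypothetical ranks given by five students to four users.
    student_ranks = {
        "user_a": [1, 1, 2, 1, 3],
        "user_b": [2, 2, 1, 2, 2],
        "user_c": [3, 4, 3, 3, 1],
        "user_d": [4, 3, 4, 4, 4],
    }
    human = [mode_rank(ranks) for ranks in student_ranks.values()]
    metric = [1, 3, 2, 4]  # hypothetical ranks produced by another metric
    print(kendall_tau(human, metric))
```

Identical rankings give a coefficient of 1 and reversed rankings give -1. Since assigning the mode can leave several users with the same rank, a tie-corrected variant such as tau-b may be preferable on real data.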
The only 0.8 exception was for the Politics dataset in which the highest 0.6 correlation value corresponded to the IS. Additionally, the Correlation Score degree of correlation between the commercial metrics and the 0.4 FF-ratio and In Degree also depended on the considered data- 0.2 set. For example, regarding KloutScore, the highest correla- 0 tion with In Degree was found for the Miscellaneous dataset, -0.2 whereas the highest statistically significant correlation with the FF-ratio was found for the Technology dataset. Interest- -0.4 Argentina Miscellaneous Music Politics Sports Technology Tv-Movies ingly, for the Technology dataset, the correlation between In IS In Degree Human Assessment Degree and KloutScore was lower than for the FF-ratio, which FF-ratio KredScore could indicate that in such topic, it is not highly important the (b) Klout Vs. All actual number of followers, but the proportion of followers regarding the number of followees. 1 As regards the IS (Figure 1a), for three datasets (Argentina, 0.8 Miscellaneous and Technology) the highest correlation values 0.6 corresponded to the human assessment. This could indicate Correlation Score 0.4 that the human users agree with the criteria considered by the 0.2 proposed metric for measuring influence. On the contrary, for two datasets (Sports and TV-movies) the highest correl- 0 ation values corresponded to the KloutScore. Furthermore, -0.2 for those datasets the correlations to the other metrics were -0.4 negative. Note that, when analysing the correlation between -0.6 Argentina Miscellaneous Music Politics Sports Technology Tv-Movies FF-ratio, the human assessment and every other metric for those datasets, results were similar. As a result, it can be infer IS FF-ratio In Degree KloutScore Human Assessment that, in those cases, the human assessment of influence was (c) Kred Vs. 
All mostly guided by topological factors, such as the number of followees, and not by the content they post or the impact that 1 such content has in the form of retweets. The highest over- all correlations were found for the Politics dataset, being the 0.8 most statistically significant the ones with KloutScore and the 0.6 Correlation Score human assessment. Conversely, the correlation with the In 0.4 Degree was statistically insignificant. These results highlight 0.2 the importance of not only considering the topological links, but also the published content and its impact. 0 Regarding the human assessment (Figure 1d), the highest -0.2 overall correlations were found for the Miscellaneous and -0.4 Technology datasets with the FF-ratio, which could imply the Argentina Miscellaneous Music Politics Sports Technology Tv-Movies preference for users with a higher proportion of followers. IS FF-ratio In Degree KloutScore KredScore However, this could be also caused by human users not being familiar with the Twitter users they had to analyse. It is worth (d) Human Assessment Vs. All mentioning that for the dataset Argentina, the highest correl- ation value was found for the IS. Furthermore, for this dataset Figure 1: Correlations between metrics across the different the correlations with both topological metrics were negative, datasets 19 Proceedings of the 1st International Workshop on Social Influence Analysis (SocInf 2015) July 27th, 2015 - Buenos Aires, Argentina which reinforced the fact that topological factors are not suf- including number of followers and retweets, PageRank, ficient for assessing the influence of users regarding their real HITS [Kleinberg, 1999]. According to the authors, all the impact or influence in their neighbourhood. 
These results are other ranking schemes were outperformed by TuRank as they of great importance as the human users were highly familiar only consider topological information, suggesting the import- with all the users in the dataset, and thus these rankings can ance of considering also content. be regarded as the most accurate ones. As regards commercial metrics, Messias et al. [2013] In summary, as in most cases the analysed metrics were not found that they might be vulnerable and easy to manipulate. highly correlated, results highlighted the fact that there is no The authors developed bot accounts, which were able to inter- consensus on how the scores are computed and that defining act with real users by following them or posting tweets about user influence cannot be considered a trivial task. Moreover, interesting topics by following different patterns of followee in several cases the influence assessment of the different met- selection and posting activity. Results showed that bot ac- rics proved to be independent from each other. Furthermore, counts were able to become influential by following simple results seemed to indicate that among the different topics strategies, reaching similar or higher scores than celebrities might not be an uniform consensus regarding what means to or individuals with great reputation. These results imply that be influential, which further remarks the fact that there is no the commercial measures should review their algorithms to unique definition of user influence. Finally, results showed avoid being influenced by automatic activity. Finally, del that in specific topics (for example Music and Sports) the ap- Campo-Ávila [2013] compared the scores of Klout, PeerIn- preciation of human users might be related mostly to the pop- dex and TwitterGrader. They found that the TwitterGrader ularity of users measured by means of topological factors. 
is not highly correlated with the other two metrics, whereas Klout and PeerIndex are highly correlated, as a result the fea- 5 Related Work tures considered for measuring the influence of users must vary for each metric. Furthermore, the authors stated that, Cha et al. [2010], compared three measures of user influence: unlike Klout and PeerIndex, TwitterGrader is mainly focused In Degree, Retweet and Mention influence. Results showed on network topology. a strong correlation between the Retweet and Mention influ- ence. However, In Degree was not strongly correlated to the other two measures, which could imply that the most connec- 6 Conclusions ted users are not necessarily the ones that are most capable of engaging others in conversations or on spreading their tweets. This work aimed at shedding some light on how to assess Also, Kwak et al. [2010] compared three measures of in- user influence by proposing a novel metric based on user be- fluence: Followers, Retweets and PageRank. Results agreed haviour regarding both content-based and topological factors. with those in [Cha et al., 2010]. The three rankings comprised The metric does not only consider each user individually, but different users, and only 4 users out of 20 appeared in the also aims to assess the interactions with their neighbourhood. three rankings. The authors found that the Retweet ranking The novel metric was compared to traditional means for differed from the other two, which could indicate that not ne- assessing user influence, commercial metrics and a human cessarily the most followed users are also the most retweeted assessment of user influence. The performed data analysis ones. Both works highlighted the fact that user influence can showed that there is no consensus on how the scores are com- be defined from different points of view, which are not neces- puted and that defining user influence cannot be considered sarily contradictory. a trivial task. 
For example, in most cases, the commercial On the other hand, several works [Weng et al., 2010; metrics proved to be independent from each other. Further- Yamaguchi et al., 2010] focused on creating more sophist- more, results seemed to indicate that even among the differ- icated approaches for measuring user influence by combin- ent topics there might not be an uniform consensus regarding ing both topological and content-based features. Weng et what means to be influential, which further remarks the fact al. [2010] presented TwitterRank, an extension of the PageR- that there is no unique definition of user influence, and that ank algorithm, that unlike the original algorithm, considers such definition might differ according to the analysed topic. the topological structure of the network and the topical sim- Interestingly, the presented metric achieved its highest correl- ilarity between users. Experimental results showed that Twit- ations with the human assessment of influence, which might terRank was able to improve the algorithm used by Twitter, indicate that the factors considered by the IS are truly relevant and both the original PageRank and Topic-sensitive PageR- for assessing people’s influence. Finally, results confirmed ank. Also, the study confirmed the existence of homophily in that only analysing the topological factors is not sufficient for Twitter, justified by the fact that there are users who follow accurately assessing the influence of users. Instead, an ac- others because they actually have some interest in common curate assessment of user influence might also consider the and not due to chance. published content and its influence over the neighbourhood Alike the previous approach, Yamaguchi et al. [2010] of users. presented TuRank for measuring users’ influence based on Future work aims at analysing the influence of Twitter both content information and topology. In this case, the con- users taking into consideration the topics they post about. 
tent information was considered by analysing how tweets Furthermore, an extensive data analyses involving more Twit- flow among users, i.e. the retweeting phenomenon.Four ter users and human volunteers should be performed in order versions of TuRank were compared to 8 ranking schemes, to obtain more statistical support for the reported results. 20 Proceedings of the 1st International Workshop on Social Influence Analysis (SocInf 2015) July 27th, 2015 - Buenos Aires, Argentina References [Tinati et al., 2012] Ramine Tinati, Leslie Carr, Wendy Hall, [Cha et al., 2010] M. Cha, H. Haddadi, F. Benevenuto, and and Jonny Bentwood. Identifying communicator roles K.P. Gummadi. Measuring user influence in twitter: The in twitter. In Alain Mille, Fabien L. Gandon, Jacques million follower fallacy. In 4th International AAAI Con- Misselis, Michael Rabinovich, and Steffen Staab, edit- ference on Weblogs and Social Media (ICWSM), 2010. ors, WWW (Companion Volume), pages 1161–1168. ACM, 2012. [del Campo-Ávila et al., 2013] J. del Campo-Ávila, [Weng et al., 2010] Jianshu Weng, Ee-Peng Lim, Jing Jiang, N. Moreno-Vergara, and M. Trella-López. Bridging and Qi He. Twitterrank: Finding topic-sensitive influential the gap between the least and the most influential twitter twitterers. In Proceedings of the 3rd ACM International users. Procedia Computer Science, 19(0):437 – 444, Conference on Web Search and Data Mining (WSDM ’10), 2013. The 4th International Conference on Ambient pages 261–270, New York, NY, USA, 2010. Systems, Networks and Technologies (ANT 2013), the 3rd International Conference on Sustainable Energy [Yamaguchi et al., 2010] Yuto Yamaguchi, Tsubasa Taka- Information Technology (SEIT-2013). hashi, Toshiyuki Amagasa, and Hiroyuki Kitagawa. Tur- ank: Twitter user ranking based on user-tweet graph ana- [Gaffney and Puschmann, 2012] Devin Gaffney and Cor- lysis. In Lei Chen, Peter Triantafillou, and Torsten Suel, nelius Puschmann. Game or measurement? 
algorithmic editors, Web Information Systems Engineering WISE 2010, transparency and the klout score. In #influence12: Sym- volume 6488 of Lecture Notes in Computer Science, pages posium & Workshop on Measuring Influence on Social 240–253. Springer Berlin Heidelberg, 2010. Media Sep. 28-29, volume 5, pages 1–2, 2012. [Gladwell, 2000] Malcolm Gladwell. The tipping point: how little things can make a big difference. Little Brown, Bo- ston, 1st edition, 2000. [Java et al., 2007] A. Java, X. Song, T. Finin, and B. Tseng. Why we Twitter: Understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Net- work Analysis, pages 56–65, San Jose, CA, USA, 2007. [Katz and Lazarsfeld, 2005] Elihu Katz and Paul Lazarsfeld. Personal Influence: The Part Played by People in the Flow of Mass Communications. Transaction Publishers, Octo- ber 2005. [Keller and Berry, 2003] Edward Keller and Jonathan Berry. The influentials: One American in ten tells the other nine how to vote, where to eat, and what to buy. Simon and Schuster, 2003. [Kendall, 1938] M. G. Kendall. A new measure of rank cor- relation. Biometrika, 30(1/2):81–93, 1938. [Kleinberg, 1999] Jon M Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46(5):604–632, 1999. [Krishnamurthy et al., 2008] B. Krishnamurthy, P. Gill, and M. Arlitt. A few chirps about Twitter. In Proceedings of the 1st Workshop on Online Social Networks (WOSP’08), pages 19–24, Seattle, WA, USA, 2008. [Kwak et al., 2010] H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Pro- ceedings of the 19th International Conference on World Wide Web (WWW’10), pages 591–600, Raleigh, NC, USA, 2010. [Messias et al., 2013] Johnnatan Messias, Lucas Schmidt, Ricardo Oliveira, and Fabrício Benevenuto. You followed my bot! transforming robots into influential users in twit- ter. First Monday, 18(7), 2013. 
[Rogers, 2003] Everett M. Rogers. Diffusion of innovations. Free Press, New York, NY [u.a.], 5th edition, 08 2003. 21