<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Novel Metric for Assessing User Influence based on User Behaviour</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonela Tommasel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniela Godoy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ISISTAN Research Institute, CONICET-UNCPBA</institution>
          ,
          <addr-line>Tandil, Buenos Aires</addr-line>
          ,
          <country country="AR">Argentina</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>15</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>People's influence has been the subject of study of several social and humanities disciplines. Lately, the study of users' influence in micro-blogging platforms has arisen as an important issue. Although social influence or prestige can be defined as the potential or ability of an individual to engage others in a certain act, or to induce others to behave in a particular manner, there is no global consensus on what it means to be an influential user. This work aims at shedding some light on how to assess user influence by proposing a novel metric of user influence based on analysing user behaviour regarding both content-based and topological factors. The metric does not only consider each user individually, but also aims at assessing the interactions with his/her neighbourhood. The statistical analysis performed confirmed that analysing only the topological factors is not sufficient for accurately assessing the influence of users. Instead, the published content and its influence over the neighbourhood of users also have to be analysed. A comparison with a human assessment of user influence showed that the factors considered by the proposed metric are truly relevant for assessing people's influence.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Several disciplines, such as sociology, communication,
marketing and political sciences have tackled the study of people
and their influence [Rogers, 2003; Katz and Lazarsfeld,
2005]. The notion of people’s influence plays a crucial role in
businesses and in the functioning of societies. For example, it
could modify the spread of fashion or voting patterns
[Gladwell, 2000; Keller and Berry, 2003]. Furthermore, the study
of influence patterns could help to understand how trends and
innovations are adopted, and to design more effective
publicity campaigns [Cha et al., 2010]. The rapid growth and
exponential usage of social digital media increased the
popularity of micro-blogging platforms, which have become an
important part of the daily life of millions of users scattered
across the world. As a result, the study of users’ influence in
the context of micro-blogging platforms arises as an
important issue.</p>
      <p>Copyright © 2015 for the individual papers by the papers’
authors. Copying permitted for private and academic purposes. This
volume is published and copyrighted by its editors.</p>
      <p>Social influence or prestige can be defined as the potential
or ability of an individual to engage others in a certain act, or
to induce others to behave in a particular manner. In
microblogging sites, the definition traditionally relies on status
attributes such as the number of followers (i.e. the size of the
influence group of a certain user), the number of re-tweets
(i.e. the ability to generate attractive content to be
distributed), and the number of mentions (i.e. the ability to engage
other users in a conversation). However, having a high
number of followers, which would imply a high level of
popularity, is not sufficient for also being influential in terms of
triggering social responses such as retweets or mentions [Cha et
al., 2010]. Moreover, the most influential users are influential
over several topics, but such influence is obtained only through
a concentrated effort of posting tweets related to only one
topic.</p>
      <p>In this context, analysing the behaviour of users
regarding the diffusion of information can be useful for
assessing their influence. Several authors have proposed
characterisations of users based on behavioural patterns observed
through not only topological features [Java et al., 2007;
Krishnamurthy et al., 2008], but also social and
content-related features [Tinati et al., 2012]. Such categorisations
analyse the number of published posts, the type of posts
(original posts, replies in conversations, retweets), the number of
times that posts were retweeted, the proportion of retweeted
posts, the proportion of followers, and the number of
interactions with the neighbourhood, among others. Consequently,
the influence of users can be estimated according to certain
behavioural patterns. For example, users who share a large
number of posts, which are highly retweeted, and also have
more followers than followees could be regarded as highly
influential users. On the other hand, users who rarely publish
posts and have a larger proportion of followees than followers
could be regarded as not influential.</p>
      <p>There are also several commercial metrics that claim to
be able to assess the influence of users, such as Klout (http://klout.com/) and
Kred, among others. However, they have received considerable
criticism and have been the focus of several controversies
regarding how the measurements are computed or the effect
that spam-bots might have on the algorithms. As most of the
commercial measures do not publicly state how scores are
computed, they are not accessible for scrutiny or
reproduction, which might compromise their trustworthiness [Gaffney
and Puschmann, 2012].</p>
      <p>Considering that there might be no consensus on what it
means to be an influential user, this work aims at shedding
some light on how to assess user influence by proposing a
novel metric based on analysing user behaviour regarding
the patterns of information diffusion, i.e. it considers both
content-based and topological factors. The metric does not
only consider each user individually, but also aims at
assessing the interactions with his/her neighbourhood. Then, a
statistical analysis is performed for comparing the novel metric
with traditional means for assessing user influence (such as In
Degree or followee/follower ratio), commercial metrics and a
human assessment of user influence.</p>
      <p>The rest of this paper is organised as follows. Section 2
presents several characterisations of users regarding their role
in the diffusion of information. Section 3 presents and
defines the proposed metric for estimating the influence of
Twitter users based on the patterns of user behaviour
regarding the information diffusion process. Section 4 describes the
analysis carried out using Twitter data. Section 5 discusses
related research. Finally, Section 6 summarises the conclusions
drawn from this study and presents future lines of work.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Information-based User Characterisation</title>
      <p>Several studies [Java et al., 2007; Krishnamurthy et al., 2008]
have characterised users according to their behaviour in the
information diffusion process by classifying them into three
categories: Information Sources or Broadcasters, Friends
or Acquaintances, and Information Seekers. Information
Sources are those users who have a greater proportion of
followers than followees (i.e. they are followed by more
users than they follow) and also publish valuable and
relevant content on a regular basis. Friends are those users
who have a balanced number of followees and followers,
without necessarily implying the presence of reciprocal
relationships. Finally, Information Seekers are those users who
rarely publish content and have a greater proportion of
followees than followers, aiming at receiving updates.
Furthermore, Tinati et al. [2012] characterised users into five
dimensions (Idea Starters, Amplifiers, Curators,
Commentators and Viewers) according to their social and psychological
behaviour, and how this behaviour affects their posting and
communication activity on Twitter. Each dimension can be
associated with the categories identified in [Java et al., 2007;
Krishnamurthy et al., 2008]. More importantly, these
characterisations can be used for assessing the influence of users:</p>
      <p>Idea Starters are those users who are highly engaged with
the media. As they tend to start conversations, the
majority of their posts correspond to original content and tend to be
highly retweeted, suggesting their influence over their
followers. These users tend to interact with a limited and
selected group of users, ensuring high quality relations with
them. They share the characterisation proposed for
Information Sources.</p>
      <p>Amplifiers are those users who share ideas and opinions
posted by other users. They tend to have a greater number of
followers than followees. They interact with Idea Starters
and share their ideas with a more visible audience. As a
result, the majority of their posts correspond to retweets of
Idea Starters. Posts are also highly retweeted by their
followers. They share the characterisation proposed for Information
Sources.</p>
      <p>Curators are those users who interact with both Idea
Starters and Amplifiers by aggregating their ideas together,
and helping to clarify the topic of conversation. They tend
to have a balanced number of followees and followers, and
to interact with a large number of them. They lie in the
border between Information Sources and Friends as they tend to
share a lot of content (as an Information Source) and to
interact with a large number of users (as a Friend). As the
number of interactions with other users increases, the Curator
behaves as a Friend, whereas as the number of interactions with
other users decreases, the Curator behaves as an Information
Source.</p>
      <p>Commentators are those users who also share the ideas and
opinions of other users, but without interrupting the flow of
the original conversation or becoming immersed in it. They only want
to share content and do not desire to be recognised for their
posts. The main difference between Amplifiers and
Commentators is the impact that their content has over their social
network, measured by the number of retweets received. As the
number of retweets increases the user behaves more as an
Amplifier and less as a Commentator. They can be
characterised as Friends, as they tend to have a balanced number of
followees and followers.</p>
      <p>Viewers are those users who neither share nor publish posts.
They do not engage in conversations or retweet other posts.
Instead, they read or consume large amounts of information.
They tend to have a larger number of followees than
followers. They share the characterisation proposed for Information
Seekers.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Quantitatively Assessing User Influence</title>
      <p>Based on the defined characterisations and dimensions of user
behaviour, this section proposes novel definitions for
quantitatively analysing the behaviour of users in each of the
presented dimensions, and then assessing their influence. The
definitions of Commentators (those users who also share the ideas
and opinions of other users, but without interrupting the flow
of the original conversation or becoming immersed in it) and Viewers
(those users who neither share nor publish posts) are omitted
as they can be inferred from the scores of the other
dimensions. For example, a low score in the Amplifier dimension
could indicate the presence of a Commentator. On the other
hand, low scores in both the Idea Starter and Amplifier
dimensions could indicate the presence of a Viewer. In all cases, the
scores are constrained to the interval [0, 1], and corrections
are applied in order to avoid undefined values, such as those arising from a zero denominator.</p>
      <sec id="sec-3-1">
        <title>3.1 Idea Starter</title>
        <p>Idea Starters are characterised by posting a greater
proportion of original content (\(Tweets_{ORIGINAL}\)) than the other
dimensions. Equation 1 assesses not only the proportion of
original posts (first part), but also the impact of those posts in
the neighbourhood of the user (second part).</p>
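        <p>\[ \text{Idea-Starter} = \frac{\left|\left\{ Tweets_{ORIGINAL} : Tweets_{ORIGINAL_{RT}} \geq \mu - \sigma \right\}\right|}{|Tweets|} \cdot \frac{\sum Tweets_{ORIGINAL_{RT}}}{|ReTweets|} \tag{1} \]</p>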
        <p>The first part considers the ratio between the number
of original tweets whose number of retweets is at least
the lower limit of the normal distribution of retweets
(\(|\{Tweets_{ORIGINAL} : Tweets_{ORIGINAL_{RT}} \geq \mu - \sigma\}|\),
where \(Tweets_{ORIGINAL_{RT}}\) is the number of retweets
that \(Tweets_{ORIGINAL}\) has received, and \(\mu\) and \(\sigma\) represent
the arithmetic mean and standard deviation of the retweet
distribution, respectively) and the total number of published
tweets (\(|Tweets|\)). The restriction imposed on the number
of retweets assesses whether the received retweets are
uniformly distributed over all published tweets or concentrated on a small
proportion of them. The second part assesses the impact
that posts have on the neighbourhood of the user, which is
measured as the ratio between the retweets that the original
content received (\(\sum Tweets_{ORIGINAL_{RT}}\)) and the total
number of retweeted tweets (\(|ReTweets|\)). The higher the
score, the more the user behaves as an Idea Starter, and thus
as an Information Source.</p>
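        <p>Equation 1 could be computed along the following lines. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the function name and its inputs are hypothetical, \(\mu\) and \(\sigma\) are taken over the retweet counts of the user's original tweets, the lower limit is read as \(\mu - \sigma\), and an undefined ratio is mapped to 0 as one possible correction.</p>
        <preformat><![CDATA[
```python
from statistics import mean, pstdev


def idea_starter_score(original_rts, total_tweets, total_retweets):
    """Sketch of Equation 1 (hypothetical helper).

    original_rts   -- retweet count of each original tweet of the user
    total_tweets   -- |Tweets|, number of tweets published by the user
    total_retweets -- |ReTweets|, denominator of the second part
    """
    # Assumed correction: an undefined ratio yields a score of 0.
    if not original_rts or total_tweets == 0 or total_retweets == 0:
        return 0.0

    mu = mean(original_rts)
    sigma = pstdev(original_rts)
    lower_limit = mu - sigma  # assumed reading of the "inferior limit"

    # First part: share of tweets that are original and sufficiently retweeted.
    well_retweeted = sum(1 for rt in original_rts if rt >= lower_limit)
    first_part = well_retweeted / total_tweets

    # Second part: share of retweets attracted by original content.
    second_part = sum(original_rts) / total_retweets

    # Keep the score inside [0, 1], as the section requires.
    return min(1.0, first_part * second_part)
```
]]></preformat>
        <p>For instance, with four original tweets retweeted 10, 0, 5 and 5 times, 8 published tweets and 40 retweets overall, three tweets pass the limit, so the sketch yields (3/8) multiplied by (20/40).</p>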
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Amplifier</title>
        <p>Amplifiers are characterised by posting a greater proportion of
retweeted content (\(Tweets_{RT}\)), engaging in conversations
(\(Tweets_{REPLY}\)) or even starting conversations by mentioning
other users (\(Tweets_{MENTION}\)). Equation 2 assesses not
only the interaction between a user and his/her social network
(first part), but also the impact of those posts in such network
(second part).</p>
        <p>\[ \text{Amplifier} = \frac{|Tweets_{RT}| + |Tweets_{REPLY}| + |Tweets_{MENTION}|}{|Tweets|} \cdot \frac{\sum Tweets_{RT_{RT}} + \sum Tweets_{REPLY_{RT}} + \sum Tweets_{MENTION_{RT}}}{|ReTweets|} \tag{2} \]</p>
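        <p>Equation 2 can be read as the product of two ratios. The Python sketch below illustrates that reading; the function name, the dictionary keys and the zero-denominator correction are assumptions made for illustration and do not come from the paper.</p>
        <preformat><![CDATA[
```python
def amplifier_score(counts, retweets_received, total_tweets, total_retweets):
    """Sketch of Equation 2 (hypothetical helper).

    counts            -- number of tweets per kind: 'rt', 'reply', 'mention'
    retweets_received -- retweets received by the tweets of each kind
    total_tweets      -- |Tweets|, number of tweets published by the user
    total_retweets    -- |ReTweets|, denominator of the second part
    """
    kinds = ("rt", "reply", "mention")

    # Assumed correction: an undefined ratio yields a score of 0.
    if total_tweets == 0 or total_retweets == 0:
        return 0.0

    # First part: share of posts that interact with the social network.
    first_part = sum(counts.get(k, 0) for k in kinds) / total_tweets

    # Second part: share of retweets attracted by those posts.
    second_part = sum(retweets_received.get(k, 0) for k in kinds) / total_retweets

    return min(1.0, first_part * second_part)
```
]]></preformat>
        <p>A low second part with a high first part would correspond to the Commentator behaviour described in Section 2, since the posts are shared but rarely retweeted.</p>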
        <p>The first part considers the ratio between the combined
number of retweeted posts, replies in conversations and tweets
containing mentions, and the total number of published tweets.
The second part assesses the impact that the posts considered
in the first part have on the social network of the user,
which is measured as the ratio between the retweets
received by the retweeted content (\(\sum Tweets_{RT_{RT}}\)), the
conversations (\(\sum Tweets_{REPLY_{RT}}\)) and the mentions of other
users (\(\sum Tweets_{MENTION_{RT}}\)), and the total number of
retweets (\(|ReTweets|\)). The second part also helps to
differentiate accurately between Amplifiers and Commentators. A high score in
the second part might indicate the presence of an Amplifier,
whereas a low score might indicate the presence of a
Commentator. The higher the score, the more a user behaves as
an Amplifier, and thus as an Information Source.</p>
      </sec>
      <sec>
        <title>3.3 Curator</title>
        <p>Curators are characterised by interacting with a greater number
of users than those characterised by the other dimensions. By
default, any user can interact with any other user, regardless
of whether they actually follow that other user.
Equation 3 assesses not only the number of interactions with other
users (first part), but also to what extent a user interacts only
with his/her neighbourhood (second part).</p>
        <p>\[ \text{Curator} = \frac{|Interactions \in \{Followers \cup Followees\}|}{|Interactions|} \cdot \frac{|Interactions \in \{Followers \cup Followees\}|}{|Followers| + |Followees|} \tag{3} \]</p>
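        <p>The Curator score of Equation 3 can be sketched with set operations. In this illustrative Python version the function name is hypothetical, interactions are assumed to be represented by the set of distinct users a user interacted with, and an undefined ratio is mapped to 0.</p>
        <preformat><![CDATA[
```python
def curator_score(interacted_users, followers, followees):
    """Sketch of Equation 3 (hypothetical helper).

    interacted_users -- ids of the users the user interacted with
    followers        -- ids of the user's followers
    followees        -- ids of the accounts the user follows
    """
    interacted = set(interacted_users)
    followers = set(followers)
    followees = set(followees)

    neighbourhood_size = len(followers) + len(followees)

    # Assumed correction: an undefined ratio yields a score of 0.
    if not interacted or neighbourhood_size == 0:
        return 0.0

    # Interactions that fall inside the follower or followee lists.
    inside = len(interacted & (followers | followees))

    first_part = inside / len(interacted)     # share of interactions kept local
    second_part = inside / neighbourhood_size  # share of the neighbourhood reached

    return first_part * second_part
```
]]></preformat>
        <p>Because the union of followers and followees is never larger than the sum of their sizes, both ratios stay within [0, 1] under this representation.</p>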
        <p>The first part considers the ratio between those
interactions that belong to either the follower or followee list
(\(Interactions \in \{Followers \cup Followees\}\)), and the total
number of interactions (\(|Interactions|\)). Then, the second
part considers the proportion of users with whom a certain
user interacts relative to the size of the neighbourhood. The
higher the number of interactions, the higher the score, and
thus the less the user behaves as an Information Source.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.4 Follower/Followee Ratio</title>
        <p>In addition to the content-related dimensions, a topological
factor can also be considered. The content-related
dimensions do not consider the size of the neighbourhood of a user.
As a result, two users might achieve the same score but have
totally different neighbourhoods. In other words, a user with a
high content-related score and a larger
neighbourhood engaged in his/her content (i.e. a greater
number of followers) is more important than a user with a high content-related
score but a smaller neighbourhood engaged in his/her
content. Consequently, the Follower/Followee Ratio (\(FF_{Ratio}\))
is proposed to account for the importance of the neighbourhood
size (Equation 4).</p>
        <p>\[ FF_{Ratio} = \frac{|Followers|}{|Followers| + |Followees|} \tag{4} \]</p>
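        <p>As a worked illustration of Equation 4, consider three hypothetical users (the numbers are invented for the example and are not taken from the datasets): one with 900 followers and 100 followees, one with 500 of each, and one with 100 followers and 900 followees.</p>
        <p>\[ \frac{900}{900 + 100} = 0.9, \qquad \frac{500}{500 + 500} = 0.5, \qquad \frac{100}{100 + 900} = 0.1 \]</p>
        <p>Under the characterisation of Section 2, the first value would point towards an Information Source, the second towards a Friend and the third towards an Information Seeker.</p>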
      </sec>
      <sec id="sec-3-4">
        <title>3.5 Information Source Index</title>
        <p>Based on the previous metrics, the Information Source Index
(IS) is defined for numerically characterising users according
to their behaviour. The metric denotes to what extent a user
can be considered an Information Source or an Information
Seeker. High values of IS denote users behaving as
Information Sources, whereas low values of IS denote users
behaving as Information Seekers. For computing the IS index the
Idea Starter, Amplifier and \(1 - Curator\) scores are assigned equal
weight and thus combined by means of the arithmetic mean
(\(IDAC\)), as shown in Equation 5.</p>
        <p>\[ IDAC = \frac{\text{Idea-Starter} + \text{Amplifier} + (1 - \text{Curator})}{3} \tag{5} \]</p>
        <p>Then, the content-related dimensions
(i.e. \(IDAC\)) and the topological factor \(FF_{Ratio}\) are
combined by means of the harmonic mean to define the IS, as
shown in Equation 6. As the content-based dimensions and
the topological factor represent different aspects of user
behaviour, they are different kinds of elements, and thus cannot be
combined by means of the arithmetic mean. Consequently,
the harmonic mean is more adequate for computing the final
score. Furthermore, the harmonic mean is less biased by the
presence of small numbers or outliers.</p>
        <p>\[ IS(u_j) = \frac{2 \cdot IDAC \cdot FF_{Ratio}}{IDAC + FF_{Ratio}} \tag{6} \]</p>
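        <p>Equations 5 and 6 can be combined in a few lines. The following Python sketch assumes the four input scores already lie in [0, 1]; the function names are hypothetical and returning 0 when both terms are 0 is an assumed correction for the undefined harmonic mean.</p>
        <preformat><![CDATA[
```python
def idac(idea_starter, amplifier, curator):
    """Equation 5: arithmetic mean of the content-related dimensions.

    The Curator score enters as (1 - curator), so a strong Curator
    lowers the combined value.
    """
    return (idea_starter + amplifier + (1.0 - curator)) / 3.0


def information_source_index(idea_starter, amplifier, curator, ff_ratio):
    """Equation 6: harmonic mean of IDAC and the follower/followee ratio."""
    content = idac(idea_starter, amplifier, curator)

    # Assumed correction: the harmonic mean is undefined when both terms are 0.
    if content + ff_ratio == 0:
        return 0.0

    return 2.0 * content * ff_ratio / (content + ff_ratio)
```
]]></preformat>
        <p>For example, scores of 0.6 (Idea Starter), 0.3 (Amplifier) and 0.3 (Curator) give an IDAC of 8/15, and combining it with a follower/followee ratio of 0.8 gives an IS of 0.64.</p>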
        <p>As Information Sources represent those users who are
highly engaged with the media and publish valuable and relevant
content on a regular basis, they could be considered
influential users. Furthermore, they tend to engage a large audience
of Amplifiers and Commentators who share and enrich their
posts. Due to its relevance, their published content tends to
be highly retweeted, which also implies a high number of
interactions with their neighbourhood. Additionally, they tend
to be highly followed by Viewers. As a result, Information
Sources meet all the requirements for being regarded as
influential users. In this context, the influence of users could be
measured by means of the IS score. The higher the IS of a
user, the more influential that user is supposed to be.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Data Analysis</title>
      <p>This section presents the experimental evaluation performed
to assess the effectiveness of the proposed metric. Section 4.1
presents other scores for measuring the influence of users in
the context of social networks to which the presented metric
was compared. Section 4.2 describes the data collections and
data analysis settings regarding the human assessment of user
influence. Finally, Section 4.3 presents the results of the data
analysis performed.</p>
      <sec id="sec-4-1">
        <title>4.1 Metrics used for Comparison</title>
        <p>In order to quantitatively assess the effectiveness of the
proposed metric for measuring the influence of users, it was
compared with the scores of other related metrics. The first two are
commercial metrics, whereas the last two are metrics that can
be easily computed with simple data obtained from the users’
profiles.</p>
        <p>Klout was launched in 2008. It provides a
measurement of the online influence of users by combining
information extracted from Twitter, Facebook, LinkedIn, Instagram,
Google+, Flickr, Blogger and Foursquare, among others. It
is based on three fundamental principles: True Reach (how
many users a certain user influences), Amplification (how
much a certain user influences other users), and Network
Impact (the influence of networks of users).</p>
        <p>Kred was launched in 2012. It measures social influence
and outreach in Twitter and Facebook, aiming at assessing the
trust and generosity of users. Social influence is measured by
assessing the retweets, replies, mentions and new followees a
user has, i.e. a user receives social influence points every time
people interact with his or her content.</p>
        <p>In Degree. Computes the influence of a user as the number
of his/her followers. This metric is currently used by many
third-party services, such as TwitterHolic<sup>3</sup>.</p>
        <p>Follower/Followee Ratio. Computes the influence of a user
as the ratio between their followers and followees. A high
score indicates that the user has a higher proportion of
followers than followees.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Data Analysis Settings</title>
        <p>Several data collections were created by manually selecting
Twitter users who were considered influential or even
popular according to the criteria presented by Twitter Counter<sup>4</sup> and
WeFollow<sup>5</sup>. Selected users were grouped according to their
topic of influence. In total, seven datasets were created:
Argentina (13 users), Miscellaneous (8 users), Music (8 users),
Politics (7 users), Sports (7 users), Technology (7 users) and
Tv-Movies (10 users). For every user, all tweets, followees,
followers, favourite tweets and user account information were
retrieved. All the information was obtained by means of
requests to the Twitter API<sup>6</sup>. Additionally, for each user, his or
her Kred and Klout scores were obtained during March 2015
by means of the Kred<sup>7</sup> and Klout<sup>8</sup> APIs, respectively.</p>
        <p>Considering that there is no consensus on what it means
to be an influential user [del Campo-Ávila et al., 2013;
Gaffney and Puschmann, 2012], this work aims at
analysing the human perception of influence. The presented
metric was compared not only with several commercial metrics, but
also with a human assessment of user influence. Undergraduate
and graduate students from an Artificial Intelligence course at
UNICEN University (Argentina) were asked to rank the sets
of users previously presented according to their perceived
influence. The ranking task was carried out by means of a web
site<sup>9</sup> in which the students were able to access a brief
summary of the profiles and latest tweets posted by each
Twitter user. All rankings were performed during April 2015.
In total, 31 students ranked the users according to their
perceived influence. In order to combine the rankings provided
by the students, each Twitter user was assigned the mode of the
rank positions provided for him or her.</p>
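        <p>The aggregation by mode described above might look as follows in Python. This is only a sketch: the function name and the input layout (one dictionary per student, mapping each Twitter user to a rank position) are hypothetical, and ties between equally frequent positions are broken in favour of the best position, a rule the paper does not specify.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def aggregate_by_mode(rankings):
    """Assign each user the most frequent rank position across rankings.

    rankings -- list of dicts mapping user -> rank (1 = most influential)
    """
    # Collect every rank position that each user received.
    positions = {}
    for ranking in rankings:
        for user, rank in ranking.items():
            positions.setdefault(user, []).append(rank)

    aggregated = {}
    for user, ranks in positions.items():
        counts = Counter(ranks)
        highest_frequency = max(counts.values())
        # Assumed tie-break: keep the best (smallest) of the tied positions.
        aggregated[user] = min(
            rank for rank, freq in counts.items() if freq == highest_frequency
        )
    return aggregated
```
]]></preformat>
        <p>With this rule two users can end up sharing the same aggregated position, so any correlation computed afterwards has to tolerate ties.</p>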
        <p>Once all users were ranked according to their influence
score in each metric, it is possible to quantify how the rank
of users varies across the different metrics. The correlation
between the different rankings was analysed by means of the
Kendall coefficient [Kendall, 1938], which is a statistic used
for measuring the association between two measured
quantities in the form of lists or rankings. The correlation takes a
value between -1 and 1 so that the higher the score, the higher
the agreement between the two rankings. A score close to 0
indicates no association between the two rankings. The
correlation is analysed for the total number of Twitter users in each
dataset.</p>
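        <p>For reference, the Kendall coefficient can be computed directly from its definition by counting concordant and discordant pairs. The Python sketch below implements the simple tau-a variant, which makes no adjustment for ties; the function name is hypothetical and the paper does not state which variant or software was used.</p>
        <preformat><![CDATA[
```python
from itertools import combinations


def kendall_tau_a(rank_a, rank_b):
    """Kendall tau-a between two rankings of the same users.

    rank_a[i] and rank_b[i] are the positions given to user i by two
    different metrics. Tied pairs count as neither concordant nor
    discordant.
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("both rankings must cover the same users")
    n = len(rank_a)
    if n < 2:
        raise ValueError("at least two users are required")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign tells whether both rankings order the pair the same way.
        sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs
```
]]></preformat>
        <p>Identical rankings give 1 and fully reversed rankings give -1. When p-values are also needed, a statistics library such as SciPy offers <monospace>scipy.stats.kendalltau</monospace>, which returns both the coefficient and a p-value.</p>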
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Analysis of Influence Metrics</title>
        <p>Figure 1 shows the correlation among the different analysed
metrics. As can be observed, in all cases the correlation
between rankings depends on the dataset under
consideration. This could indicate that the topic that users publish
about might be an important factor to consider in the
analysis. As regards the commercial metrics (Figures 1b and 1c),
their highest correlation values were found for the
Technology dataset, where the correlation between KloutScore and
KredScore was higher than 0.8. On the contrary, the
lowest correlation among the commercial metrics was found for
the Miscellaneous dataset. However, only for the Technology
dataset were the p-values lower than 0.05, which indicates
that for all the other datasets the null hypothesis cannot be
rejected, and thus no association between the two commercial
metrics can be established. These results reinforce the idea
that there is no consensus between the commercial metrics on
how the influence of users is computed.</p>
        <p>For most datasets, neither commercial metric was
correlated with the metric presented in this work. These results
could imply that the content-based and topological
dimensions the IS calculation is based upon are not regarded in the
same manner or with the same importance by the
commercial metrics. Moreover, the commercial metrics could also be
based on a different set of dimensions or features. The only
exception was the Politics dataset, in which the highest
correlation value corresponded to the IS. Additionally, the
degree of correlation between the commercial metrics and the
FF-ratio and In Degree also depended on the considered
dataset. For example, regarding KloutScore, the highest
correlation with In Degree was found for the Miscellaneous dataset,
whereas the highest statistically significant correlation with
the FF-ratio was found for the Technology dataset.
Interestingly, for the Technology dataset, the correlation between In
Degree and KloutScore was lower than for the FF-ratio, which
could indicate that, in such a topic, the actual number of
followers is less important than the proportion of followers
relative to the number of followees.</p>
        <p>As regards the IS (Figure 1a), for three datasets (Argentina,
Miscellaneous and Technology) the highest correlation values
corresponded to the human assessment. This could indicate
that the human users agree with the criteria considered by the
proposed metric for measuring influence. On the contrary,
for two datasets (Sports and Tv-Movies) the highest
correlation values corresponded to the KloutScore. Furthermore,
for those datasets the correlations with the other metrics were
negative. Note that, when analysing the correlation between
FF-ratio, the human assessment and every other metric for
those datasets, results were similar. As a result, it can be inferred
that, in those cases, the human assessment of influence was
mostly guided by topological factors, such as the number of
followees, and not by the content they post or the impact that
such content has in the form of retweets. The highest
overall correlations were found for the Politics dataset, the
most statistically significant being those with KloutScore and the
human assessment. Conversely, the correlation with In
Degree was not statistically significant. These results highlight
the importance of considering not only the topological links,
but also the published content and its impact.</p>
        <p>Regarding the human assessment (Figure 1d), the highest
overall correlations were found for the Miscellaneous and
Technology datasets with the FF-ratio, which could imply a
preference for users with a higher proportion of followers.
However, this could also be caused by human users not being
familiar with the Twitter users they had to analyse. It is worth
mentioning that for the Argentina dataset, the highest
correlation value was found for the IS. Furthermore, for this dataset
the correlations with both topological metrics were negative,
which reinforced the fact that topological factors are not
sufficient for assessing the influence of users regarding their real
impact or influence in their neighbourhood. These results are
of great importance as the human users were highly familiar
with all the users in the dataset, and thus these rankings can
be regarded as the most accurate ones.</p>
        <p>[Figure 1. Correlation among the analysed metrics for each dataset; vertical axis: Correlation Score.]</p>
        <p>In summary, as in most cases the analysed metrics were not
highly correlated, results highlighted the fact that there is no
consensus on how the scores are computed and that defining
user influence cannot be considered a trivial task. Moreover,
in several cases the influence assessments of the different
metrics proved to be independent of each other. Furthermore,
results seemed to indicate that there might not be a uniform
consensus across topics regarding what it means to
be influential, which further underlines the fact that there is no
unique definition of user influence. Finally, results showed
that in specific topics (for example Music and Sports) the
appreciation of human users might be related mostly to the
popularity of users measured by means of topological factors.</p>
        <p><sup>4</sup> http://twittercounter.com/
<sup>5</sup> http://wefollow.com
<sup>6</sup> https://api.twitter.com/
<sup>7</sup> https://developer.peoplebrowsr.com/kred/
<sup>8</sup> https://klout.com/s/developers/home/
<sup>9</sup> https://sites.google.com/site/influenciatwitterusers/</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Related Work</title>
      <p>Cha et al. [2010] compared three measures of user influence:
In Degree, Retweet and Mention influence. Results showed
a strong correlation between the Retweet and Mention
influence. However, In Degree was not strongly correlated with the
other two measures, which could imply that the most
connected users are not necessarily the ones that are most capable of
engaging others in conversations or of spreading their tweets.</p>
      <p>Also, Kwak et al. [2010] compared three measures of
influence: Followers, Retweets and PageRank. Results agreed
with those in [Cha et al., 2010]. The three rankings comprised
different users, and only 4 users out of 20 appeared in the
three rankings. The authors found that the Retweet ranking
differed from the other two, which could indicate that the most
followed users are not necessarily also the most retweeted
ones. Both works highlighted the fact that user influence can
be defined from different points of view, which are not
necessarily contradictory.</p>
      <p>On the other hand, several works [Weng et al., 2010;
Yamaguchi et al., 2010] focused on creating more
sophisticated approaches for measuring user influence by
combining both topological and content-based features. Weng et
al. [2010] presented TwitterRank, an extension of the
PageRank algorithm that, unlike the original algorithm, considers
both the topological structure of the network and the topical
similarity between users. Experimental results showed that
TwitterRank was able to outperform the algorithm used by Twitter,
as well as both the original PageRank and Topic-sensitive
PageRank. Also, the study confirmed the existence of homophily in
Twitter, justified by the fact that there are users who follow
others because they actually have some interest in common
and not due to chance.</p>
      <p>Like the previous approach, Yamaguchi et al. [2010]
presented TuRank for measuring users’ influence based on
both content information and topology. In this case, the
content information was considered by analysing how tweets
flow among users, i.e. the retweeting phenomenon. Four
versions of TuRank were compared to 8 ranking schemes,
including number of followers and retweets, PageRank and
HITS [Kleinberg, 1999]. According to the authors, all the
other ranking schemes were outperformed by TuRank as they
only consider topological information, suggesting the
importance of also considering content.</p>
      <p>As regards commercial metrics, Messias et al. [2013]
found that they might be vulnerable and easy to manipulate.
The authors developed bot accounts that were able to
interact with real users by following them or posting tweets about
interesting topics, using different patterns of followee
selection and posting activity. Results showed that bot
accounts were able to become influential by following simple
strategies, reaching similar or higher scores than celebrities
or individuals with great reputation. These results imply that
the algorithms behind commercial measures should be revised to
avoid being influenced by automatic activity. Finally, del
Campo-Ávila et al. [2013] compared the scores of Klout,
PeerIndex and TwitterGrader. They found that TwitterGrader
is not highly correlated with the other two metrics, whereas
Klout and PeerIndex are highly correlated. As a result, the
features considered for measuring the influence of users must
vary for each metric. Furthermore, the authors stated that,
unlike Klout and PeerIndex, TwitterGrader is mainly focused
on network topology.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>This work aimed at shedding some light on how to assess
user influence by proposing a novel metric based on user
behaviour regarding both content-based and topological factors.
The metric does not only consider each user individually, but
also aims to assess the interactions with their neighbourhood.</p>
      <p>The novel metric was compared to traditional means for
assessing user influence, commercial metrics and a human
assessment of user influence. The data analysis performed
showed that there is no consensus on how the scores are
computed and that defining user influence cannot be considered
a trivial task. For example, in most cases, the commercial
metrics proved to be independent of each other.
Furthermore, results seemed to indicate that even among the
different topics there might not be a uniform consensus regarding
what it means to be influential, which further underscores the fact
that there is no unique definition of user influence, and that
such a definition might differ according to the analysed topic.
Interestingly, the presented metric achieved its highest
correlations with the human assessment of influence, which might
indicate that the factors considered by the IS are truly relevant
for assessing people’s influence. Finally, results confirmed
that analysing only the topological factors is not sufficient for
accurately assessing the influence of users. Instead, an
accurate assessment of user influence should also consider the
published content and its influence over the neighbourhood
of users.</p>
      <p>Future work aims at analysing the influence of Twitter
users taking into consideration the topics they post about.
Furthermore, an extensive data analysis involving more
Twitter users and human volunteers should be performed in order
to obtain more statistical support for the reported results.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Cha et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Haddadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Benevenuto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.P.</given-names>
            <surname>Gummadi</surname>
          </string-name>
          .
          <article-title>Measuring user influence in twitter: The million follower fallacy</article-title>
          .
          <source>In 4th International AAAI Conference on Weblogs and Social Media (ICWSM)</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [del Campo-Ávila et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>J.</given-names>
            <surname>del Campo-Ávila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Moreno-Vergara</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Trella-López</surname>
          </string-name>
          .
          <article-title>Bridging the gap between the least and the most influential twitter users</article-title>
          .
          <source>Procedia Computer Science</source>
          ,
          <volume>19</volume>
          (
          <issue>0</issue>
          ):
          <fpage>437</fpage>
          -
          <lpage>444</lpage>
          ,
          <year>2013</year>
          .
          <conf-name>The 4th International Conference on Ambient Systems, Networks and Technologies (ANT 2013), the 3rd International Conference on Sustainable Energy Information Technology (SEIT-2013)</conf-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Gaffney and Puschmann,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Devin</given-names>
            <surname>Gaffney</surname>
          </string-name>
          and
          <string-name>
            <given-names>Cornelius</given-names>
            <surname>Puschmann</surname>
          </string-name>
          .
          <article-title>Game or measurement? algorithmic transparency and the klout score</article-title>
          .
          <source>In #influence12: Symposium &amp; Workshop on Measuring Influence on Social Media Sep. 28-29</source>
          , volume
          <volume>5</volume>
          , pages
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Gladwell,
          <year>2000</year>
          ]
          <string-name>
            <given-names>Malcolm</given-names>
            <surname>Gladwell</surname>
          </string-name>
          .
          <source>The tipping point: how little things can make a big difference</source>
          .
          <publisher-name>Little Brown</publisher-name>
          ,
          <publisher-loc>Boston</publisher-loc>
          , 1st edition,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Java et al.,
          <year>2007</year>
          ]
          <string-name>
            <given-names>A.</given-names>
            <surname>Java</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Finin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Tseng</surname>
          </string-name>
          .
          <article-title>Why we Twitter: Understanding microblogging usage and communities</article-title>
          .
          <source>In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis</source>
          , pages
          <fpage>56</fpage>
          -
          <lpage>65</lpage>
          , San Jose, CA, USA,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Katz and Lazarsfeld,
          <year>2005</year>
          ]
          <string-name>
            <given-names>Elihu</given-names>
            <surname>Katz</surname>
          </string-name>
          and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Lazarsfeld</surname>
          </string-name>
          .
          <source>Personal Influence: The Part Played by People in the Flow of Mass Communications</source>
          .
          <publisher-name>Transaction Publishers</publisher-name>
          , October
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
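          [Keller and Berry,
          <year>2003</year>
          ]
          <string-name>
            <given-names>Edward</given-names>
            <surname>Keller</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Berry</surname>
          </string-name>
          .
          <source>The influentials: One American in ten tells the other nine how to vote, where to eat, and what to buy</source>
          .
          <publisher-name>Simon and Schuster</publisher-name>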
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Kendall,
          <year>1938</year>
          ]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Kendall</surname>
          </string-name>
          .
          <article-title>A new measure of rank correlation</article-title>
          .
          <source>Biometrika</source>
          ,
          <volume>30</volume>
          (
          <issue>1/2</issue>
          ):
          <fpage>81</fpage>
          -
          <lpage>93</lpage>
          ,
          <year>1938</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Kleinberg,
          <year>1999</year>
          ]
          <string-name>
            <given-names>Jon M.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          .
          <article-title>Authoritative sources in a hyperlinked environment</article-title>
          .
          <source>Journal of the ACM (JACM)</source>
          ,
          <volume>46</volume>
          (
          <issue>5</issue>
          ):
          <fpage>604</fpage>
          -
          <lpage>632</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Krishnamurthy et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>B.</given-names>
            <surname>Krishnamurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gill</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Arlitt</surname>
          </string-name>
          .
          <article-title>A few chirps about Twitter</article-title>
          .
          <source>In Proceedings of the 1st Workshop on Online Social Networks (WOSN'08)</source>
          , pages
          <fpage>19</fpage>
          -
          <lpage>24</lpage>
          , Seattle, WA, USA,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Kwak et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kwak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Park</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Moon</surname>
          </string-name>
          .
          <article-title>What is Twitter, a social network or a news media</article-title>
          ?
          <source>In Proceedings of the 19th International Conference on World Wide Web (WWW'10)</source>
          , pages
          <fpage>591</fpage>
          -
          <lpage>600</lpage>
          , Raleigh, NC, USA,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Messias et al.,
          <year>2013</year>
          ]
          <string-name>
            <given-names>Johnnatan</given-names>
            <surname>Messias</surname>
          </string-name>
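          ,
          <string-name>
            <given-names>Lucas</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ricardo</given-names>
            <surname>Oliveira</surname>
          </string-name>
          , and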
          <string-name>
            <given-names>Fabrício</given-names>
            <surname>Benevenuto</surname>
          </string-name>
          .
          <article-title>You followed my bot! transforming robots into influential users in twitter</article-title>
          .
          <source>First Monday</source>
          ,
          <volume>18</volume>
          (
          <issue>7</issue>
          ),
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Rogers,
          <year>2003</year>
          ]
          <string-name>
            <given-names>Everett M.</given-names>
            <surname>Rogers</surname>
          </string-name>
          .
          <source>Diffusion of innovations</source>
          .
          <publisher-name>Free Press</publisher-name>
          ,
          <publisher-loc>New York, NY [etc.]</publisher-loc>
          ,
          <edition>5th</edition>
          edition, August
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Tinati et al.,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Ramine</given-names>
            <surname>Tinati</surname>
          </string-name>
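          ,
          <string-name>
            <given-names>Leslie</given-names>
            <surname>Carr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wendy</given-names>
            <surname>Hall</surname>
          </string-name>
          , and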
          <string-name>
            <given-names>Jonny</given-names>
            <surname>Bentwood</surname>
          </string-name>
          .
          <article-title>Identifying communicator roles in twitter</article-title>
          . In Alain Mille, Fabien L. Gandon, Jacques Misselis, Michael Rabinovich, and Steffen Staab, editors,
          <source>WWW (Companion Volume)</source>
          , pages
          <fpage>1161</fpage>
          -
          <lpage>1168</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Weng et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>Jianshu</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
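          <string-name>
            <given-names>Ee-Peng</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jing</given-names>
            <surname>Jiang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Qi</given-names>
            <surname>He</surname>
          </string-name>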
          .
          <article-title>Twitterrank: Finding topic-sensitive influential twitterers</article-title>
          .
          <source>In Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (WSDM '10)</source>
          , pages
          <fpage>261</fpage>
          -
          <lpage>270</lpage>
          , New York, NY, USA,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Yamaguchi et al.,
          <year>2010</year>
          ]
          <string-name>
            <given-names>Yuto</given-names>
            <surname>Yamaguchi</surname>
          </string-name>
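          ,
          <string-name>
            <given-names>Tsubasa</given-names>
            <surname>Takahashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Toshiyuki</given-names>
            <surname>Amagasa</surname>
          </string-name>
          , and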
          <string-name>
            <given-names>Hiroyuki</given-names>
            <surname>Kitagawa</surname>
          </string-name>
          .
          <article-title>Turank: Twitter user ranking based on user-tweet graph analysis</article-title>
          .
          In Lei Chen, Peter Triantafillou, and Torsten Suel, editors,
          <source>Web Information Systems Engineering WISE 2010</source>
          , volume
          <volume>6488</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>240</fpage>
          -
          <lpage>253</lpage>
          . Springer Berlin Heidelberg,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>