=Paper= {{Paper |id=Vol-2247/poster11 |storemode=property |title=The Role of Social Capital in Information Diffusion over Twitter: A Study Case over Brazilian Posts |pdfUrl=https://ceur-ws.org/Vol-2247/poster11.pdf |volume=Vol-2247 |authors=Hercules Sandim,Danilo Azevedo,Ana Paula Couto Da Silva,Mirela M. Moro |dblpUrl=https://dblp.org/rec/conf/vldb/SandimASM18 }} ==The Role of Social Capital in Information Diffusion over Twitter: A Study Case over Brazilian Posts== https://ceur-ws.org/Vol-2247/poster11.pdf
                  The Role of Social Capital in
               Information Diffusion over Twitter:
                a Study Case over Brazilian posts

                        Hercules Sandim¹,², Danilo Azevedo¹,
                   Ana Paula Couto da Silva¹, and Mirella M. Moro¹
               ¹ Universidade Federal de Minas Gerais, Belo Horizonte, Brasil
     {herculessandim,danilo-va}@ufmg.br, {ana.coutosilva,mirella}@dcc.ufmg.br
             ² Universidade Federal de Mato Grosso do Sul, Campo Grande, Brasil


          Abstract. Social Capital is the resulting advantage of an individual's
          position in a social structure. It can be measured by traditional
          complex network metrics or by specific ones, such as information
          capital, brokerage and bridging. Our goal is to verify which users have
          high information capital, bridging and brokerage for providing and
          spreading information. To do so, we first categorize Twitter users into
          seven types: typical users, primary media, secondary media, independent
          experts, fan accounts, fake accounts and potential bots. Then, we
          analyze their profiles on trending topics. Our results show that
          potential bots and fan accounts are the main information spreaders in
          Brazil, a very concerning result given the upcoming presidential
          election in October 2018.

          Keywords: Social Networks · Social Capital · Information Diffusion.


    1   Introduction
    Online Social Networks (OSNs), such as Facebook, Twitter, LinkedIn and
    Instagram, have achieved unprecedented growth in recent years. Current
    statistical data show Facebook has over 2.2 billion monthly active users,
    while Instagram, Twitter and LinkedIn have over 813 million, 330 million and
    260 million monthly active users, respectively [38]. Such a huge volume of
    users and relationships motivates research in the areas of Complex Networks,
    Big Social Data, and Urban Computing [16,22,23,36].
        Among many usages, OSNs are powerful media for information diffusion
    [1,18,30,27]. Information diffusion occurs when there is a flow of information from
    one individual to another. The information may be retained by an individual or
    spread out on the network [39]. Individuals who are in a privileged position in
    the OSN have more social capital for information diffusion.
        Social capital is a comprehensive concept, without one single definition or
    metric to capture all its facets [31]. In Social Network Analysis (SNA), it is the
    resulting advantage of an individual's favorable position in a social
    structure [9]. For instance, the individual who holds information has the
    power to change what happens in an environment and to understand its
    surroundings [42].




Copyright held by the author(s)
2       H. Sandim, D. Azevedo, Ana P. C. da Silva, M. M. Moro

    In this scenario, new players arise and become relevant: the “human sensors”,
or citizens who share information about their environment via OSNs, supple-
menting, complementing or even replacing information as measured by physical
sensors [13]. Human involvement is particularly useful in detecting multiple pro-
cesses in complex personal, social and urban spaces where traditional embedded
sensor networks have limitations [37]. Society's engagement in OSNs, whether
individually or in small groups, can be facilitated through its social
capital [34]. Overall, humans are relevant data sources, acquiring and spreading
information on their own [37].
    Such human sensing for information diffusion can be useful for emergency
events management, crime detection, urban administration, intelligent trans-
portation, smart traffic control, public healthcare, political engagement, among
others [26,37]. In addition, the information can change the view of the recipi-
ent, motivating him/her to join the social network of the information provider
[37]. Also, trust among communicating individuals strongly affects the reach and
impact of information [19], and trust is an important social capital aspect [3,34].
    However, OSNs have become significant spreaders of false facts, urban leg-
ends, fake news, or, more generally, misinformation. Misinformation drives the
emergence of a post-truth society, where the debate becomes damaged by the
repetition of discussions refuting the primary media or independent experts (of-
ficial sources of truthful information) [27].
    Even so, people still rely on the news published in social media. In 2017,
according to Reuters [32], despite the predominance of TV stations in the media
environment, social media played an essential role in news consumption, being a
primary source of news within the Brazilian urban context. For instance, in
2016, the impeachment of President Dilma Rousseff drew attention for its
repercussions on Brazilian social media. However, only 30% of people believe
that social media are free from undue political influence.
    Overall, accurate information diffusion contributes to building smart urban
spaces [29], improving the quality of life of their citizens. Consequently, to
ensure reliable information for citizens, the primary media and independent
experts must take the lead in OSNs. Here, we address the forms of social capital
related to individuals' abilities to acquire and spread information. We model a
network of Twitter users, connecting them through retweets of messages that help
a subject become a Brazilian Twitter Trend. We analyze and compare social
capital metrics to verify the importance of the primary media and independent
experts compared to typical users, fan accounts, and potential bots in the
Brazilian information diffusion context. Although we consider only messages
written in Brazilian Portuguese, our methodology is broad enough to be applied
to any language without loss of generality.


2   Related Work

This section is divided into two parts: general work on social capital and online
social networks (Section 2.1), and specific metrics for social capital (Section 2.2).
            The Role of Social Capital in Information Diffusion over Twitter      3

2.1   Social Capital and Online Social Networks



In SNA, social capital is the resulting advantage of an individual's favorable
position in a social structure [9]. There are many definitions of social
capital. For instance, Bourdieu [6] emphasizes social capital as the collective
resources used by individual members to obtain services and benefits either in
the absence of or in conjunction with their economic capital. For Coleman [11],
social capital is a neutral resource that facilitates any action, where the
individuals are responsible for achieving their objectives. In turn, Putnam [34]
defines social capital as characteristics of the social organization, such as
social networks, social norms, and social trust, which facilitate coordination
and cooperation for mutual benefit and civic engagement. Moreover, Bertolini and
Bravo [3] define social capital as the resources individuals have access to in a
given network.
    Authors also classify social capital in different ways. Putnam [33]
defines two forms of social capital: bonding and bridging. In [21], Jackson
introduces seven types: information, brokerage, bridging, coordination, favor,
reputation, and community capital. Moreover, Burt [9] defines two activities
related to social capital: brokerage (an individual placing themselves in a
privileged position in the network), and closure (the coordination of a closed
group of individuals in the network).
    Regarding social capital over OSNs, there are metrics for visibility,
reputation, popularity, and authority. Bertolini and Bravo [3] describe
reputation and authority as comprising the sum of knowledge and information
disseminated within a given group. Measuring the strength of relationships
(identifying weak ties) is also relevant, as is identifying hubs and
influencers. There are also traditional metrics, such as the ego-network
measure, the structural hole measure [10], the homophily measure [5], and the
standard centrality measures [14]. Also, Kang and Shen [25] define quantitative
metrics for social capital, not validated in OSNs, and Michalak et al. [31]
present measures of social capital based on techniques of cooperative game
theory.
   Social capital is also used to explain information diffusion processes in
social networks. Authors in [2,17] show that the presence of weak ties and hubs
accelerates information diffusion on social networks. In turn, Kleinberg [14]
models the cascading behavior of networks as an influence process among
individuals. Other authors focus on detecting essential nodes (influencers) in
OSNs, defined as the individuals who maximize spreading [20,24,35].
   The aforementioned works do not rely on social capital for understanding the
process of information diffusion over OSNs. Here, we assume that information
capital, bridging and brokerage are skills of individuals who are well placed
in the network structure and take advantage of their position to acquire and
control the information flow. Furthermore, we apply the hub and authority
concepts proposed by Kleinberg [28] as a tool for identifying the use of social
capital in the information diffusion process.

2.2   Metrics for Social Capital
The previous section presented general work on social capital and networks. We
now discuss existing metrics for social capital, beginning with the notation
used to define them. Let $G$ denote a directed graph, where $V(G)$ is its set of
vertices (or nodes) and $E(G)$ its set of edges. Also, $n = |V(G)|$ and
$m = |E(G)|$. $G$ is represented by its adjacency matrix $A \in \{0,1\}^{n \times n}$,
where $A_{u,v} = 1$ indicates the existence of an edge from $u$ to $v$, and
$A_{u,v} = 0$ otherwise. Given $N_o(u)$ as the set of $u$'s outgoing neighbors
and $N_i(u)$ as the set of $u$'s incoming neighbors, $|N_o(u)|$ is $u$'s
out-degree and $|N_i(u)|$ its in-degree. Finally, $N(u) = N_o(u) \cup N_i(u)$ is
the set of $u$'s neighbors, and $|N(u)|$ is $u$'s degree.
     A path in $G$ between two nodes $u$ and $v$ is a sequence of distinct nodes
$u = u_0, u_1, \ldots, u_p = v$ such that $A_{u_k,u_{k+1}} = 1$ for all
$0 \le k < p$. A geodesic (shortest path) between nodes $u$ and $v$ is a path
such that no other path between them involves a smaller number of edges. Let
$g_{uv}$ be the number of geodesics connecting $u$ to $v$, and $g_{uv}(i)$ the
number of those geodesics that pass through node $i$. Lastly, let $d$ be the
graph diameter, that is, the largest geodesic distance between any pair of nodes.

Information Capital. Decay Centrality (DC) has been applied to measure an
individual's information capital [21]. In summary, DC counts the individuals
reached by a given individual at each path length, discounting those that are
farther away; it thus captures how many people one can reach at different
distances. The decay of information with distance is captured via a parameter
$p$, with $0 < p < 1$. Furthermore, DC favors individuals that reach the largest
number of neighbors in up to $T$ hops, where $T$ is known as the information's
endurance. The DC of a node $i$ is given by Equation 1, where $N^l(i)$ is the
set of individuals at distance $l$ from $i$ in $G$.
$$DC(i) = \sum_{l=1}^{T} p^{l} \cdot |N^{l}(i)| \qquad (1)$$
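A minimal Python sketch of Equation 1 is shown below. It assumes the graph is stored as a dict mapping each node to the set of its out-neighbors, that information travels along edge direction, and that $N^l(i)$ holds the nodes at geodesic distance exactly $l$ from $i$; the function name and representation are illustrative, not the authors' released code.

```python
from collections import deque


def decay_centrality(adj, i, p=0.5, T=8):
    """Equation 1: sum over l = 1..T of p**l * |N^l(i)| (illustrative sketch).

    adj maps each node to the set of its out-neighbors; a breadth-first
    search from i yields each reachable node's geodesic distance.
    """
    dist = {i: 0}
    queue = deque([i])
    while queue:
        node = queue.popleft()
        if dist[node] == T:  # nodes farther than T hops do not contribute
            continue
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    # Summing p**d over reached nodes equals sum_l p**l * |N^l(i)|.
    return sum(p ** d for node, d in dist.items() if node != i)
```

For instance, with `adj = {"a": {"b", "c"}, "b": {"d"}}`, node `a` reaches `b` and `c` at one hop and `d` at two hops, so `decay_centrality(adj, "a")` gives 0.5 + 0.5 + 0.25 = 1.25. Dividing all scores by the maximum would map them into the 0 to 1 range used later; that normalization is an assumption here.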
    From Equation 1, a node has high information capital if the information it
broadcasts reaches a large number of nodes; otherwise, it has low information
capital. Although DC is simple to compute, it does not consider all the possible
paths that information might take [21]. Eigenvector Centrality (EC) [4], or
simply EigenCentrality, is an alternative that addresses this limitation [21].
In EC, a node's importance depends on the importance of its neighbors. EC is
thus a measure of node influence in the network, given by Equation 2, where
$\lambda$ is the largest eigenvalue of the adjacency matrix and the EC values
form the corresponding eigenvector.
$$EC(i) = \frac{1}{\lambda} \sum_{t \in N(i)} EC(t) \qquad (2)$$
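The NumPy sketch below computes Equation 2 by power iteration. Because Equation 2 sums over $N(i) = N_o(i) \cup N_i(i)$, it first symmetrizes $A$; it then iterates with $A + I$ (a shift that keeps the same eigenvectors but avoids oscillation on bipartite graphs) and rescales the result so the top node scores 1. The shift, the max-rescaling and the function name are assumptions of this sketch.

```python
import numpy as np


def eigen_centrality(A, max_iter=100, tol=1e-6):
    """Power-iteration sketch of Equation 2, rescaled so the top node scores 1."""
    A = np.asarray(A, dtype=float)
    S = np.maximum(A, A.T)      # N(i) = N_o(i) ∪ N_i(i): ignore edge direction
    n = S.shape[0]
    M = S + np.eye(n)           # shift: same eigenvectors as S, no bipartite oscillation
    x = np.ones(n) / n
    for _ in range(max_iter):
        x_new = M @ x
        x_new /= np.linalg.norm(x_new)
        converged = np.abs(x_new - x).sum() < n * tol
        x = x_new
        if converged:
            break
    return x / x.max()
```

A unique, strictly positive solution of Equation 2 is only guaranteed for connected graphs; on a disconnected graph, scores outside the component with the largest eigenvalue tend toward zero.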


Bridging Capital. Individuals who are bridges in the network topology are
special nodes that can control the information flow [21], and/or accelerate its
diffusion [17]. Here, we follow Granovetter’s theory [17] that defines a bridge
as a weak tie in the network. For labeling a node as a bridge, we apply the
Neighborhood overlap metric (NO) as defined in Equation 3. Then, following
Brandao and Moro [7], we classify a tie (edge, link) between nodes u and v as
weak if: 0 ≤ NO(u, v) ≤ 0.2.

$$NO(u,v) = \frac{|N_o(u) \cap N_o(v)|}{|(N_o(u) \cup N_o(v)) \setminus \{u,v\}|} \qquad (3)$$
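Equation 3 and the weak-tie threshold of Brandao and Moro [7] can be sketched in a few lines of Python, again assuming a dict of out-neighbor sets. How to treat a zero denominator is not specified in the text; returning 0 (and hence a weak tie) is an assumption of this sketch, as are the function names.

```python
def neighborhood_overlap(out_adj, u, v):
    """Equation 3: |No(u) ∩ No(v)| / |(No(u) ∪ No(v)) - {u, v}|."""
    nu = set(out_adj.get(u, ()))
    nv = set(out_adj.get(v, ()))
    denom = len((nu | nv) - {u, v})
    if denom == 0:
        return 0.0  # assumption: no other neighbors means zero overlap
    return len(nu & nv) / denom


def is_weak_tie(out_adj, u, v, threshold=0.2):
    """Weak-tie rule from the text: 0 <= NO(u, v) <= 0.2."""
    return 0.0 <= neighborhood_overlap(out_adj, u, v) <= threshold
```

For example, if `a` points to `b`, `x`, `y` and `b` points to `x`, `z`, the two out-neighborhoods share only `x` among `{x, y, z}`, so NO is 1/3 and the tie is not weak.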

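Brokerage Capital. Following [10,21], nodes with high Betweenness Centrality
(BC) play the role of brokers in the network. A normalized BC metric for
directed graphs is given by [15] as Equation 4 (pairs with $g_{jk} = 0$
contribute nothing).

$$BC(i) = \frac{1}{(n-1)(n-2)} \sum_{\substack{j,k \in V(G) \\ j \neq k,\; i \notin \{j,k\}}} \frac{g_{jk}(i)}{g_{jk}} \qquad (4)$$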
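To make Equation 4 concrete, the brute-force sketch below counts geodesics with breadth-first search and sums $g_{jk}(i)/g_{jk}$ over ordered pairs. It runs in roughly cubic time and is meant only for toy graphs; Brandes' algorithm [8], used in Section 3.3, computes the same quantity efficiently. The function names are illustrative.

```python
from collections import deque
from itertools import permutations


def bfs_counts(adj, s):
    """Distances and geodesic counts from s in an unweighted directed graph."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:            # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u precedes w on some geodesic
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(adj, nodes):
    """Equation 4 by brute force over ordered pairs (j, k); toy graphs only."""
    n = len(nodes)
    if n < 3:
        return {i: 0.0 for i in nodes}
    info = {s: bfs_counts(adj, s) for s in nodes}
    bc = {}
    for i in nodes:
        di, gi = info[i]
        total = 0.0
        for j, k in permutations([x for x in nodes if x != i], 2):
            dj, gj = info[j]
            if k not in dj or i not in dj or k not in di:
                continue  # g_jk = 0, or i is not on any j -> k path
            # i is on a j -> k geodesic iff d(j,i) + d(i,k) = d(j,k),
            # in which case g_jk(i) = g_ji * g_ik.
            if dj[i] + di[k] == dj[k]:
                total += gj[i] * gi[k] / gj[k]
        bc[i] = total / ((n - 1) * (n - 2))
    return bc
```

On the directed path 0 → 1 → 2, node 1 lies on the only geodesic from 0 to 2, so BC(1) = 1 / ((3 − 1)(3 − 2)) = 0.5.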

Hubs and Authorities. In order to rank the most important nodes based on
their outgoing and incoming links, we apply the HITS algorithm proposed in
[28]. As discussed in Section 2.1, this approach enables profiling nodes with
special roles in the information diffusion process. In summary, the HITS
algorithm computes two rankings: (i) the authority ranking estimates node
importance based on incoming links; and (ii) the hub ranking estimates node
importance based on outgoing links. We refer the reader to [28] for more details
on this algorithm.
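A plain power-iteration version of HITS is sketched below with NumPy, following Kleinberg's update rules: a node's authority score sums the hub scores of the nodes pointing to it, and its hub score sums the authority scores of the nodes it points to; both vectors are normalized to sum to 1. It is an illustration, not the NetworkX routine used in Section 3.3, and it assumes the graph has at least one edge.

```python
import numpy as np


def hits(A, max_iter=100, tol=1e-6):
    """HITS power iteration; A[u, v] = 1 for edge u -> v. Scores sum to 1."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    h = np.ones(n) / n
    for _ in range(max_iter):
        a = A.T @ h        # authority: hub scores of nodes pointing in
        a /= a.sum()
        h_new = A @ a      # hub: authority scores of nodes pointed to
        h_new /= h_new.sum()
        converged = np.abs(h_new - h).sum() < tol
        h = h_new
        if converged:
            break
    return h, a
```

Under the retweet model of Section 3.4 (edge v → u when u retweets v), a single author retweeted by three users gets hub score 1, and each retweeter gets authority score 1/3.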


2.3    Main Contributions

Our contributions over the related work are:

      – We apply network topological metrics for analyzing information capital
        (eigenvector centrality), bridging (neighborhood overlap), and brokerage
        (betweenness centrality);
      – We apply decay centrality for measuring information capital;
      – We apply the HITS algorithm for profiling nodes in Twitter; and
      – We provide a comparative analysis of the top 10 Twitter users, regarding
        network topological metrics, decay centrality, and the HITS algorithm.


3     Methodology

Our work evaluates social capital over a social network extracted from posts on
an online microblogging platform. To do so, we first build a dataset by
collecting real posts on Twitter over one week. To ease the qualitative
assessment of the posts and of our results, we limit the collection to tweets
in Brazilian Portuguese, our native language. Nonetheless, our methodology is
broad enough to be applied to any language without loss of generality. Then, to
better qualify our analysis, we categorize users in two different ways. Next, we
implement the previously discussed metrics and apply them to a network model.
Finally, we evaluate these metrics and the potential relevance of users
according to their behavior. The next sections discuss, respectively: the
dataset building process, our categorization of user types, metric
implementations and parameter settings, and the network model considered.

3.1   Dataset
Our work evaluates social capital over a social network extracted from tweets.
A tweet is a Twitter³ post (limited to 280 characters in Brazil), and a retweet
(RT) is a re-posting of a tweet, which helps quickly share a given tweet.
Moreover, Twitter Trends (TT) are topics that have become immediately popular
(as opposed to topics that have been popular for a while or on a daily basis)⁴.
    We initially collect Brazil's top 50 TTs. For each TT, we collect the 100
most popular tweets. For each tweet, we collect the 100 most recent RTs. The
limit of 100 posts is due to search limitations in the Twitter API⁵. We collect
continuously for seven days (2018/04/20 to 2018/04/26) to acquire a larger
amount of data.
    After data cleaning and processing, the dataset⁶ contains 165,936 distinct
users and 371,612 distinct messages. In this collection phase, 1,648 distinct
subjects appeared in TTs. The top 10 TTs collected include subjects related to
TV, entertainment, musicians, sports, and a single one related to a
commemorative date ("Tiradentes"). As presented in Table 1, there is no
political or economic subject in the overall top 10 (surprisingly, given the
current Brazilian crisis).
    Considering only political/economic TTs, the top 10 topics regard the
current Brazilian political crisis and the upcoming presidential election in
October 2018, as shown in the bottom half of Table 1. In summary: Lula Livre
("Free Lula") and Prisão de Lula ("Lula's arrest") express popular clamor over
the arrest of former Brazilian president Luís Inácio Lula da Silva; Ciro
Nogueira, Palocci, Rocha Loures, Mantega, and Temer are Brazilian politicians
who are currently targets of corruption investigations; Odebrecht is a large
Brazilian construction company, also an investigation target; Marcos Valério is
a publicist involved in corruption schemes who became a plea-bargain informant;
PSDB is a political party; and STF and STJ are the federal supreme and superior
courts in Brazil.

3.2   Twitter User Types
We categorize Twitter users in two ways. The first, called posting categories,
regards user behavior when posting tweets and retweets. There are
³ https://www.twitter.com
⁴ https://help.twitter.com/en/using-twitter
⁵ https://developer.twitter.com/en/docs
⁶ Dataset available at http://homepages.dcc.ufmg.br/~mirella/projs/apoena/datasets.html


             Table 1: Number of occurrences of top TTs in the dataset.
             Rank  TT                   Topic                Occurrences
             1     #MaratonaRedeBBB     TV, Entertainment    26
             2     Júlio César          Football             24
             2     Biel                 Musician             24
             2     #PowerCoupleBrasil   TV, Entertainment    24
             5     Tiradentes           Commemorative date   20
             6     #FinalBBB18          TV, Entertainment    19
             6     Diego Souza          Football             19
             8     Ferrugem             Musician             17
             9     Dia de Grêmio        Football             16
             10    #SuperligaNoSporTV   Sports               15
             ----------------------------------------------------------
             81    Lula Livre           Political/Economic    7
             81    Ciro Nogueira        Political/Economic    7
             81    Palocci              Political/Economic    7
             138   Lula e Mantega       Political/Economic    5
             138   Rocha Loures         Political/Economic    5
             138   Marcos Valério       Political/Economic    5
             138   Temer e PSDB         Political/Economic    5
             138   Odebrecht            Political/Economic    5
             138   Prisão de Lula       Political/Economic    5
             180   STJ e STF            Political/Economic    4



users who tweet more than they retweet (providers), users who retweet more than
they tweet (spreaders), and users who balance both actions (neutral). To capture
this, we define $p_{ratio}$ as the ratio between a user's tweets (out-degree) and
the total number of tweets and retweets (degree), as shown in Equation 5. Thus,
users can be spreaders ($0 \le p_{ratio} \le 0.25$), neutral
($0.25 < p_{ratio} < 0.75$), or providers ($0.75 \le p_{ratio} \le 1$). In our
dataset (Section 3.1), most users are spreaders (164,884 users), followed by
providers (1,043 users) and neutral users (9 users).

$$p_{ratio} = \frac{\text{out-degree}}{\text{out-degree} + \text{in-degree}} \qquad (5)$$
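Equation 5 and the posting-category thresholds above translate directly into code. The sketch below is illustrative (hypothetical function names); raising an error for a user with no activity is an assumption, since such a user would not appear in the graph.

```python
def p_ratio(out_degree, in_degree):
    """Equation 5: share of a user's activity that is original posting."""
    total = out_degree + in_degree
    if total == 0:
        raise ValueError("user has no tweets or retweets in the graph")
    return out_degree / total


def posting_category(out_degree, in_degree):
    """Thresholds from the text: spreader <= 0.25 < neutral < 0.75 <= provider."""
    r = p_ratio(out_degree, in_degree)
    if r <= 0.25:
        return "spreader"
    if r < 0.75:
        return "neutral"
    return "provider"
```

Note that both boundary values belong to the outer classes: a ratio of exactly 0.25 is a spreader and exactly 0.75 is a provider, matching the closed intervals in the text.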


   The second way, called social categories, distributes users into seven types
regarding the social characteristics observed in the user’s timeline on Twitter:

       i Potential bots for users whose Botometer score⁷ is greater than 2.5;
      ii Independent experts for users who are experts in a given theme, which
         usually include journalists acting independently;
     iii Fake accounts for users who assume a false identity, impersonating
         another person;
⁷ Botometer checks the activity of a Twitter account and gives it a score based
  on how likely the account is to be a bot. The Botometer score ranges from 0 to
  5; higher scores are more bot-like (http://botometer.iuni.iu.edu) [12].
     iv Fan accounts for users who identify themselves as celebrity fan accounts,
        or who use the social network just to display their fanaticism (about a
        football team or a political party, for example);
      v Primary media for users who represent the major communication channels
        in the Brazilian news publishing context;
     vi Secondary media for users who act as less relevant media outlets
        compared to primary media; and
    vii Typical users for other users who do not fall into the previous types.


3.3     Implementations and Parameters Setting

Our experiments use the following implementations⁸ and parameter settings.

      – DC: we implement DC in Python⁹ and run it with p = 0.5 (moderate
        decay) and T = d = 8 (maximum endurance). The computed DC values are
        normalized (0 ≤ DC(i) ≤ 1);
      – EC: we use the Python NetworkX¹⁰ library, whose implementation is an
        iterative algorithm with two parameters: (i) max_iter = 100, the
        maximum number of power-method iterations; and (ii) ε = 1e-06, the
        error tolerance used to check power-method convergence. Computed
        values are also normalized (0 ≤ EC(i) ≤ 1);
      – BC: we use Brandes' algorithm [8] with normalization (0 ≤ BC(i) ≤ 1);
      – NO: we implement NO in Python; and
      – HITS: we use the Python NetworkX library, an iterative algorithm with
        three parameters: (i) max_iter = 100, the maximum number of
        power-method iterations; (ii) ε = 1e-06, the error tolerance used to
        check power-method convergence; and (iii) normalized = True, which
        normalizes results by the sum of all values.
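The settings above map naturally onto NetworkX calls, sketched below; this is not the authors' released code. `nx.eigenvector_centrality`, `nx.betweenness_centrality` (an implementation of Brandes' algorithm) and `nx.hits` are real NetworkX functions; rescaling EC by its maximum is our assumption about how the 0 to 1 normalization was done, and depending on the NetworkX version `nx.hits` may also require SciPy. The helper names are hypothetical.

```python
import networkx as nx


def max_normalize(scores):
    """Rescale so the top node scores 1 (assumed reading of 0 <= x <= 1)."""
    top = max(scores.values())
    return {node: s / top for node, s in scores.items()} if top > 0 else scores


def ec_and_bc(G):
    # EC by power iteration with the parameters listed above.
    ec = nx.eigenvector_centrality(G, max_iter=100, tol=1e-06)
    # BC via Brandes' algorithm; for a DiGraph, normalized=True divides by (n-1)(n-2).
    bc = nx.betweenness_centrality(G, normalized=True)
    return max_normalize(ec), bc


def hubs_and_authorities(G):
    # Returns (hubs, authorities); each dict sums to 1 when normalized=True.
    return nx.hits(G, max_iter=100, tol=1e-06, normalized=True)
```

For directed graphs, `nx.eigenvector_centrality` scores nodes by their incoming neighbors, whereas Equation 2 sums over all neighbors, so passing `G.to_undirected()` may be closer to that equation; which variant was used is not stated in the text.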


3.4     Network Modeling

We model the online social network as a directed graph G, where nodes u and v
(u, v ∈ V(G)) represent Twitter users and an edge (v → u) ∈ E(G) means that
user u retweeted a message from user v. This model assigns social capital both
to users who publish (out-degree) and to users who retweet (in-degree) messages.
The graph G built from our dataset has |V(G)| = 165,936 nodes and
|E(G)| = 280,898 edges.
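The network model above can be sketched from a list of (author, retweeter) pairs. The input format and function name are hypothetical, and collapsing repeated pairs into a single edge is an assumption, since the text does not say how repeated retweets between the same two users are counted.

```python
from collections import defaultdict


def build_retweet_graph(retweets):
    """Build the directed graph of Section 3.4 from (author, retweeter) pairs.

    Each pair yields the edge author -> retweeter, i.e. (v -> u) when user u
    retweeted a message from user v. Repeated pairs collapse into one edge.
    """
    out_adj = defaultdict(set)  # N_o(v): users who retweeted v
    in_adj = defaultdict(set)   # N_i(u): authors that u retweeted
    for author, retweeter in retweets:
        out_adj[author].add(retweeter)
        in_adj[retweeter].add(author)
    nodes = set(out_adj) | set(in_adj)
    n_edges = sum(len(targets) for targets in out_adj.values())
    return out_adj, in_adj, nodes, n_edges
```

The out-degree `len(out_adj[v])` and in-degree `len(in_adj[u])` obtained this way are the quantities that feed Equation 5.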
    Fig. 1 shows a toy example of our network model. It illustrates providers
(red nodes), spreaders (blue nodes), and neutral users (yellow nodes). Fig. 2
gives another example, with a small piece of our dataset that shows a potential
bot (@Felipe_100) acting on the network.
⁸ Code available at http://homepages.dcc.ufmg.br/~mirella/projs/apoena/datasets.html
⁹ https://python.org
¹⁰ https://networkx.github.io/documentation




Fig. 1: Toy example with the three types of nodes regarding posting behavior.




      Fig. 2: Example of a potential bot (@Felipe_100) retweeting a message.


4     Data Analysis

Here, we analyze and compare social capital metrics for the information
diffusion process, exploring the importance of primary media and independent
experts compared to typical users, fan accounts and potential bots in the
Brazilian information diffusion context. First, we analyze our user types
according to the information capital measures (Section 4.1), an important
contribution of our work in the context of misinformation in urban centers.
Then, we extend this evaluation to the other social capital measures: bridging
(Section 4.2), brokerage (Section 4.3), and hub and authority indices
(Section 4.4).


4.1     Information Capital Measures

We start our analyses with Decay Centrality (DC), which favors users that reach
the largest number of neighbors in up to T hops. Thus, providers that transmit
information to other providers have higher DC. Table 2 presents the results, in
which sports and entertainment news providers stand out (seven out of 10). In
contrast, for EigenCentrality, users have high EC if they are related to other
users who are also well

                        Table 2: Top 10 users for Decay Centrality
      Rank  User               Posting Category  Social Category                          DC
      1     @Flamengo          provider          primary media (sports news)              1
      2     @FoxSportsBrasil   provider          primary media (sports news)              0.951
      3     @bbb               provider          primary media (entertainment news)       0.862
      4     @Esp_Interativo    provider          primary media (sports news)              0.672
      5     @VascodaGama       provider          primary media (sports news)              0.610
      6     @globoesportecom   provider          primary media (sports news)              0.563
      7     @HugoGloss         provider          independent expert (entertainment news)  0.350
      8     @MomentsBrasil     provider          independent expert (general news)        0.335
      9     @g1                provider          primary media (general news)             0.323
      10    @RedeGlobo         provider          primary media (telecommunication)        0.305




                         Table 3: Top 10 users for EigenCentrality
      Rank  User               Posting Category  Social Category  EC
      1     @ClaraGuimarae     spreader          potential bot    1
      2     @hey_dann          spreader          fan account      0.723
      3     @pastorasandram5   spreader          typical user     0.699
      4     @JonahWhite30      spreader          typical user     0.675
      5     @_DiasFabio        spreader          typical user     0.673
      6     @AntroReality      spreader          fan account      0.666
      7     @PedroAlvesFer12   spreader          typical user     0.665
      8     @jeanbuenodumke    spreader          potential bot    0.652
      9     @arte_prima        spreader          typical user     0.647
      10    @limalblue_ofc     spreader          typical user     0.645




connected to the network. In our context, users who retweet highly retweeted
information are privileged. Hence, typical users, fan accounts and potential
bots stand out, as shown in Table 3. Unfortunately, these are the user types
that contribute to turning a subject into a trend, degrading the quality of the
main information served on Twitter in the run-up to the Brazilian elections.
    As seen earlier, DC and EC are centrality metrics that reveal the
information capital of individuals in the network. However, the two metrics
yield different results. DC reveals the main providers because they are nodes
that start long sequences of broadcasts and relays. Meanwhile, EC reveals the
main spreaders because they retweet many messages from important nodes
(providers). Therefore, DC and EC do not measure the same phenomenon, and the
correlation between them is negligible (ρ = −0.095, p-value ≈ 0)¹¹.
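Spearman's ρ, used above and in Section 4.3, is the Pearson correlation of the two rank vectors, with tied values receiving the average of their positions. A self-contained sketch follows (hypothetical function names); in practice, `scipy.stats.spearmanr` computes the same coefficient together with its p-value.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Applied to per-user DC and EC scores, a value near zero, as reported above, means that a user's position in one ranking says little about their position in the other.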

4.2     Bridging Capital Measure
Our modeling process induces the formation of disconnected components, because
each message creates its own component in which the message's author points to
its retweeters. However, users tend to connect as new TTs arise. Bridges then
link otherwise separate components, allowing information to flow through these
weak ties. Overall, our dataset has seven weakly connected components and
165,901 strongly connected components.
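The merging described above can be illustrated by counting weakly connected components with a union-find structure that ignores edge direction. This is a sketch on hypothetical edge lists: two messages from different authors form separate components until a shared retweeter links them. Nodes without any edge are not counted here.

```python
def weakly_connected_components(edges):
    """Number of weakly connected components among nodes that appear in edges."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:  # edge direction is ignored
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components
    return len({find(x) for x in list(parent)})
```

For example, the edges `("v1", "u1")` and `("v2", "u2")` give two components; adding `("v2", "u1")`, i.e. user `u1` also retweeting `v2`, merges them into one.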
    Fig. 3 shows a subgraph that contains the top weak ties (red edges) and their
neighborhood. Each weak tie is an edge (v → u), where v is a provider (red
nodes) and u is a spreader (blue nodes). Nodes
¹¹ ρ is Spearman's rank correlation coefficient [41].




Fig. 3: Example of weak ties (bridges) in our dataset emphasized by red edges.


u and v are bridges because they connect the neighborhoods of u and v. Since
bridging capital refers to tie strength, it does not make sense to correlate it
with the aforementioned node centrality metrics. Specifically, in our dataset,
bridging capital highlights five political party fan accounts as spreaders¹²,
two independent experts as spreaders¹³, three primary media as providers¹⁴, and
three independent experts as providers¹⁵. Moreover, the potential bot
"@paiva_tv" stands out, acting as secondary media in the network.
    Bridging and brokerage are similar in the sense that both focus on the
singular position of an individual in the network. Nevertheless, they are
different concepts and are measured in distinct ways.


4.3   Brokerage Capital Measure

Nodes with high BC are important brokers in communication and information
diffusion [21,40]. Table 4 shows that BC reveals primary media and independent
experts as the main brokers in the Brazilian information diffusion context: six
of the top 10 users belong to "Grupo Globo"¹⁶, and all top 10 are providers.
Thus, information tends to circulate in the network through providers, mostly
belonging to a single group. We also analyze the
¹² @cmen2908, @ViniciusNikolod, @SergioPRibeiroF, @odio_nao, @moema4
¹³ @djivanrodrigues, @jornalistavitor
¹⁴ @drauziovarella, @planalto, @AssembleiaRS
¹⁵ @betepachecoGN, @ricardolay, @DanielaLima
¹⁶ Grupo Globo, the largest media broadcaster in Brazil, http://grupoglobo.globo.com/ingles


             Table 4: Top 10 users for Betweenness Centrality (BC)
      Rank  User              Posting Category  Social Category                          BC
      1     @RedeGlobo        provider          primary media (telecommunication)        1
      2     @g1               provider          primary media (general news)             0.619
      3     @Esp_Interativo   provider          primary media (sports news)              0.464
      4     @SBTonline        provider          primary media (telecommunication)        0.450
      5     @HugoGloss        provider          independent expert (entertainment news)  0.408
      6     @gshow            provider          primary media (entertainment news)       0.323
      7     @SporTV           provider          primary media (sports news)              0.267
      8     @GloboNews        provider          primary media (general news)             0.225
      9     @MaisVoce_Globo   provider          primary media (TV program)               0.126
      9     @gustavovillani   provider          independent expert (sports news)         0.126




correlation between BC and DC (not shown due to space constraints). There
is a weak positive correlation between them (ρ = 0, 322, p-value = 0), as both
measures identify information providers over network.
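Betweenness centrality [15] is usually computed with Brandes' algorithm [8]: one breadth-first search per source counts shortest paths, and a backward pass accumulates each node's share of them. The following is a minimal sketch for an unweighted, directed graph given as successor lists. The graph representation, the function names, and the rescaling by the maximum score (suggested only by @RedeGlobo's BC of 1 in Table 4) are assumptions, not a description of our exact pipeline.

```python
# Illustrative sketch of Brandes' algorithm; names and toy data are hypothetical.
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness for an unweighted, directed graph {node: [successors]}."""
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        stack = []
        preds = {w: [] for w in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = dict.fromkeys(nodes, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def scale_to_max(scores):
    """Rescale so the highest score becomes 1 (assumed presentation)."""
    top = max(scores.values(), default=0.0)
    if top == 0:
        return dict(scores)
    return {node: value / top for node, value in scores.items()}


# Hypothetical toy graph: two equally short paths a->b->d and a->c->d.
DIAMOND = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}

if __name__ == "__main__":
    print(scale_to_max(betweenness(DIAMOND)))
```

On the toy graph, b and c each lie on one of the two shortest paths from a to d, so each receives half of that pair's dependency; after rescaling, both score 1.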

4.4   Hubs and Authority for Social Capital
The HITS algorithm [28] computes a Hub Index (HI) and an Authority Index (AI)
for each user. HI measures the social capital of users who produce a lot of
information (like providers), whereas AI measures the social capital of users
who mostly retweet, that is, users who spend a lot of time browsing the network
(like potential bots). Table 5 presents the results in two parts. Table 5a
shows that the hubs are mainly providers, among which sports information
providers stand out (nine out of 10). Meanwhile, Table 5b shows that potential
bots and football fan accounts stand out as authorities (seven out of 10).
Interestingly, all these football fan accounts support “Clube de Regatas do
Flamengo (CRF)”, the football club with the largest fan base in Brazil. The
potential bots suspended¹⁷ by Twitter were also CRF fans. The suspended accounts
were probably linked to tweetdecking¹⁸, a practice used to inflate popularity
(a facet of social capital).
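The hub and authority scores follow Kleinberg's mutual recursion [28]: a node's authority is the sum of the hub scores of the nodes pointing to it, and its hub score is the sum of the authority scores of the nodes it points to. The sketch below iterates these updates a fixed number of times. It assumes each edge points from the author of a tweet to the user who retweets it, the direction under which providers would surface as hubs and heavy retweeters as authorities; that edge convention, the iteration count, the max-normalization (the top users in Table 5 score 1), and all names are assumptions.

```python
# Illustrative HITS power iteration; names, edge direction and toy data are hypothetical.


def _scale_to_max(scores):
    """Divide every score by the maximum so the top node gets 1."""
    top = max(scores.values(), default=0.0)
    return {node: (value / top if top else 0.0) for node, value in scores.items()}


def hits(graph, iterations=100):
    """Return (hub, authority) for a directed graph {node: [successors]}."""
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority of v: sum of hub scores of the nodes that point to v.
        new_auth = dict.fromkeys(nodes, 0.0)
        for u, successors in graph.items():
            for v in successors:
                new_auth[v] += hub[u]
        # Hub of u: sum of the updated authority scores of the nodes u points to.
        new_hub = dict.fromkeys(nodes, 0.0)
        for u, successors in graph.items():
            for v in successors:
                new_hub[u] += new_auth[v]
        auth = _scale_to_max(new_auth)
        hub = _scale_to_max(new_hub)
    return hub, auth


# Hypothetical toy retweet graph (edge: author -> retweeter).
RETWEETS = {"p": ["r1", "r2", "r3"], "q": ["r1", "r2"]}

if __name__ == "__main__":
    hub, auth = hits(RETWEETS)
    print(sorted(hub.items(), key=lambda kv: -kv[1]))
    print(sorted(auth.items(), key=lambda kv: -kv[1]))
```

Here p, whose tweets reach more retweeters, ends with the top hub score, and r1 and r2, retweeting both providers, share the top authority score. A convergence tolerance could replace the fixed iteration count.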
    Furthermore, there is a weak positive correlation between AI and EC
(ρ = 0.261, p-value = 0), because both metrics rely on eigenvector computations
and capture similar, though not equivalent, phenomena. As seen in Section 4.1,
providers who broadcast information to other providers have high DC values
(like hubs). Hence, there is a very strong positive correlation between HI and
DC (ρ = 0.995, p-value = 0) and a weak positive correlation between HI and BC
(ρ = 0.324, p-value = 0); these results are not illustrated due to space
constraints.
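The ρ values in this section are Spearman rank correlations [41]. As a sketch, ρ can be computed as the Pearson correlation of the two rank vectors, giving tied values their average rank. The function names and sample scores are hypothetical; the block does not compute p-values, which in practice would come from a statistics library (for example, SciPy's `scipy.stats.spearmanr` returns both ρ and a p-value).

```python
# Illustrative Spearman's rho with average ranks for ties; names are hypothetical.
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical scores for five users under two centrality metrics.
    print(spearman_rho([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]))
```

Because ρ depends only on ranks, it is insensitive to the very different scales of metrics such as HI and DC.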

¹⁷ https://help.twitter.com/pt/managing-your-account/suspended-twitter-accounts
¹⁸ http://www.newsweek.com/tweetdecking-why-twitter-suspended-multiple-accounts-840031


                  Table 5: Top 10 users for Hubs and Authority

                                     (a) Hub Index (HI)
  Rank  User              Posting Category  Social Category                     HI
  1     @Flamengo         provider          primary media (sports news)         1
  2     @FoxSportsBrasil  provider          primary media (sports news)         0.837
  3     @Esp_Interativo   provider          primary media (sports news)         0.498
  4     @globoesportecom  provider          primary media (sports news)         0.426
  5     @VascodaGama      provider          primary media (sports news)         0.243
  6     @bbb              provider          primary media (entertainment news)  0.185
  7     @venecasagrande   provider          independent expert (sports news)    0.176
  8     @SporTV           provider          primary media (sports news)         0.160
  9     @maurocezar       provider          independent expert (sports news)    0.108
  10    @lucaspedrosaEI   provider          independent expert (sports news)    0.101

                                  (b) Authority Index (AI)
  Rank  User              Posting Category  Social Category            AI
  1     @MarcosA22444338  spreader          (football) fan account     1
  2     @_DiasFabio       spreader          typical user               0.995
  3     @jhonatalima355   spreader          (football) fan account     0.904
  4     @PHRN1895         spreader          potential bot (suspended)  0.889
  5     @_arthurpassos    spreader          potential bot              0.880
  6     @PedroAlvesFer12  spreader          typical user               0.879
  7     @Rgo17            spreader          potential bot (suspended)  0.870
  8     @FlavioM32255797  spreader          (football) fan account     0.859
  9     @RlPenha          spreader          typical user               0.851
  10    @AllanRN1981      spreader          (football) fan account     0.842


5     Conclusion
Categorizing users in OSNs is an important task for understanding how
information spreads over society. Humans as data sources can streamline the
sharing of relevant events, whether for emergency management, crime detection,
urban administration, intelligent transportation, smart traffic control, public
healthcare, or even political engagement. It is a matter of major concern when
bots or malicious users take over such activities.

   Here, we analyzed how Brazilian Twitter users publish and share Twitter
trending topics, in order to understand how important information flows over
the network. We did so by relating forms of social capital to individuals'
abilities to acquire and spread information. We analyzed and compared social
capital metrics to verify the importance of primary media and independent
experts, compared to typical users, fan accounts, and potential bots, in the
Brazilian information diffusion context.

    In general, potential bots and fan accounts are users who spread
information through retweets, and they are the main authorities in the social
network. Potential bots behave in an automated way, so they can be programmed
for malicious purposes. Furthermore, fan accounts exacerbate fanaticism: their
retweets may, for instance, amplify “hate speech” or spread fake news. This is
very concerning given the upcoming Brazilian presidential election in October
2018. However, despite the monopoly of media groups, we also found primary
media and independent experts to be the main information providers, which
suggests there is still hope for controlling misinformation over the network.

Acknowledgements. Work funded by CAPES, CNPq and FAPEMIG, Brazil.

References
 1. Ahsan, M., Kumari, M., Singh, T., Pal, T.L.: Sentiment based information diffusion
    in online social networks. IJKDB 8(1), 60–74 (2018)
 2. Barabási, A., Frangos, J.: Linked: The new science of networks. Perseus Books
    Group (2014)
 3. Bertolini, S., Bravo, G.: Social capital, a multidimensional concept. In: Euresco
    Conference (2001)
 4. Bonacich, P.: Power and centrality: A family of measures. American Journal of
    Sociology 92(5), 1170–1182 (1987)
 5. Borgatti, S., Jones, C., Everett, M.: Network measures of social capital. Connec-
    tions 21(1), 27–36 (1998)
 6. Bourdieu, P.: The forms of capital. In: Richardson, J. (ed.) Handbook of Theory
    and Research for the Sociology of Education, pp. 241–258. Greenwood, New York
    (1986)
 7. Brandão, M.A., Moro, M.M.: Analyzing the strength of co-authorship ties with
    neighborhood overlap. In: International Conference on Database and Expert Sys-
    tems Applications. pp. 527–542. Springer (2015)
 8. Brandes, U.: A faster algorithm for betweenness centrality. Journal of Mathemat-
    ical Sociology 25(2), 163–177 (2001)
 9. Burt, R.S.: Brokerage and closure: An introduction to social capital. Oxford Uni-
    versity Press (2005)
10. Burt, R.S.: Structural holes: The social structure of competition. Harvard Univer-
    sity Press (2009)
11. Coleman, J.S.: Social capital in the creation of human capital. American Journal
    of Sociology 94, S95–S120 (1988)
12. Davis, C.A., Varol, O., Ferrara, E., Flammini, A., Menczer, F.: BotOrNot: A system
    to evaluate social bots. In: Int’l Conf on World Wide Web, Companion Volume.
    pp. 273–274. Montreal, Canada (2016)
13. Doran, D., Severin, K., Gokhale, S., Dagnino, A.: Social media enabled human
    sensing for smart cities. AI Communications 29(1), 57–75 (2016)
14. Easley, D., Kleinberg, J.: Networks, crowds, and markets: Reasoning about a highly
    connected world. Cambridge University Press (2010)
15. Freeman, L.C.: A Set of Measures of Centrality Based on Betweenness. Sociometry
    40(1), 35–41 (1977)
16. Gao, S., Janowicz, K., Couclelis, H.: Extracting urban functional regions from
    points of interest and human activities on location-based social networks. Trans.
    GIS 21(3), 446–467 (2017)
17. Granovetter, M.S.: The strength of weak ties. American Journal of Sociology
    78(6), 1360–1380 (1973)
18. Hu, Y., Song, R.J., Chen, M.: Modeling for information diffusion in online social
    networks via hydrodynamics. IEEE Access 5, 128–135 (2017)
19. Hui, C., Goldberg, M., Magdon-Ismail, M., Wallace, W.A.: Simulating the diffusion
    of information: An agent-based modeling approach. IJATS 2(3), 31–46 (2010)
20. Iannelli, F., Mariani, M.S., Sokolov, I.M.: Network centrality based on reaction-
    diffusion dynamics reveals influential spreaders. CoRR abs/1803.01212 (2018)
21. Jackson, M.O.: A typology of social capital and associated network measures.
    CoRR abs/1711.09504 (2018)
22. Jerônimo, C.L.M., Campelo, C.E.C., de Souza Baptista, C.: Using open data to
    analyze urban mobility from social networks. JIDM 8(1), 83 (2017)

23. Kadar, C., Brüngger, R.R., Pletikosa, I.: Measuring ambient population from
    location-based social networks to describe urban crime. In: Social Informatics. pp.
    521–535 (2017)
24. Kang, C., Kraus, S., Molinaro, C., Spezzano, F., Subrahmanian, V.: Diffusion
    centrality: A paradigm to maximize spread in social networks. Artificial Intelligence
    239, 70–96 (2016)
25. Kang, S., Shen, L.: A quantitative measure for meal-mate social capital networks.
    In: Int’l Conf. on Intelligent Environments. pp. 124–131 (2016)
26. Kim, J., Bae, J., Hastak, M.: Emergency information diffusion on online social
    media during storm Cindy in U.S. Int. J. Information Management 40, 153–165
    (2018)
27. Kim, J., Tabibian, B., Oh, A., Schölkopf, B., Gomez-Rodriguez, M.: Leveraging
    the crowd to detect and reduce the spread of fake news and misinformation. CoRR
    abs/1711.09918 (2017)
28. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM
    46(5), 604–632 (Sep 1999)
29. Kukka, H., Kostakos, V., Ojala, T., Ylipulli, J., Suopajärvi, T., Jurmu, M., Hosio,
    S.: This is not classified: everyday information seeking and encountering in smart
    urban spaces. Personal and Ubiquitous Computing 17(1), 15–27 (2013)
30. Li, M., Wang, X., Gao, K., Zhang, S.: A survey on information diffusion in online
    social networks: Models and methods. Information 8(4), 118 (2017)
31. Michalak, T.P., Rahwan, T., Moretti, S., Narayanam, R., Skibski, O., Szczepan-
    ski, P.L., Wooldridge, M.: A new approach to measure social capital using game-
    theoretic techniques. SIGecom Exchanges 14(1), 95–100 (2015)
32. Newman, N., Fletcher, R., Kalogeropoulos, A., Levy, D.A., Nielsen, R.K.: Reuters
    Institute Digital News Report 2017. Reuters (2017)
33. Putnam, R.D.: Tuning in, tuning out: The strange disappearance of social capital
    in America. PS: Political Science & Politics 28(4), 664–683 (1995)
34. Putnam, R.D.: Bowling alone: the collapse and revival of American community. In:
    ACM Conference on Computer Supported Cooperative Work. p. 357 (2000)
35. Saito, K., Kimura, M., Ohara, K., Motoda, H.: Super mediator - a new centrality
    measure of node importance for information diffusion over social network. Inf. Sci.
    329(C), 985–1000 (2016)
36. Smarzaro, R., de Lima, T.F.M., Jr., C.A.D.: Could data from location-based social
    networks be used to support urban planning? In: Proceedings of the 26th Inter-
    national Conference on World Wide Web Companion, Perth, Australia, April 3-7,
    2017. pp. 1463–1468 (2017)
37. Srivastava, M., Abdelzaher, T., Szymanski, B.: Human-centric sensing. Phil. Trans.
    R. Soc. A 370(1958), 176–197 (2012)
38. Statista: Most famous social network sites worldwide as of April 2018, ranked
    by number of active users, https://www.statista.com/statistics/272014/
    global-social-networks-ranked-by-number-of-users, accessed: 2018-04-24
39. Stimmel, C.L.: Building smart cities: analytics, ICT, and design thinking. CRC
    Press (2015)
40. Tang, L., Liu, H.: Community detection and mining in social media. Synthesis
    Lectures on Data Mining and Knowledge Discovery. Morgan and Claypool (2010)
41. Wayne, W.D., et al.: Applied nonparametric statistics. PWS-Kent, Boston, MA
    (1990)
42. Zheng, Y., Capra, L., Wolfson, O., Yang, H.: Urban computing: Concepts, method-
    ologies, and applications. ACM Trans. Intell. Syst. Technol. 5(3), 38:1–38:55 (2014)