=Paper= {{Paper |id=Vol-1558/paper25 |storemode=property |title=PageRank Revisited: On the Relationship between Node Degrees and Node Significances in Different Applications |pdfUrl=https://ceur-ws.org/Vol-1558/paper25.pdf |volume=Vol-1558 |authors=Jung Hyun Kim,K. Selcuk Candan,Maria Luisa Sapino |dblpUrl=https://dblp.org/rec/conf/edbt/KimCS16 }} ==PageRank Revisited: On the Relationship between Node Degrees and Node Significances in Different Applications== https://ceur-ws.org/Vol-1558/paper25.pdf
  PageRank Revisited: On the Relationship between Node
 Degrees and Node Significances in Different Applications                                                                                         ∗




                   Jung Hyun Kim                              K. Selçuk Candan                      Maria Luisa Sapino
               Arizona State University                     Arizona State University                   University of Torino
               Tempe, AZ 85287, USA                         Tempe, AZ 85287-8809                       I-10149 Torino, Italy
                 jkim294@asu.edu                              candan@asu.edu                    marialuisa.sapino@unito.it


ABSTRACT                                                                    dation tasks [1, 7, 9, 12, 26]. The significance of a node in a given
Random-walk based techniques, such as PageRank, encode the                  graph often needs to reflect the topology of the graph. Measures
structure of the graph in the form of a transition matrix of a stochas-     like the betweenness measure [27] and the centrality/cohesion [5],
tic process from which significances of the graph nodes can be in-          help quantify how significant any node is on a given graph based on
ferred. Recommendation systems leverage such node significance              the underlying graph topology. The betweenness measure [27], for
measures to rank the objects in the database. Context-aware recom-          example, quantifies whether deleting the node would disconnect or
mendation techniques complement the data graph with additional              disrupt the graph. Centrality/cohesion [5] measures quantify how
data that provide the recommendation context. However, despite              close to a clique the given node and its neighbors are. Other au-
their wide-spread use in many graph-based knowledge discovery               thority, prestige, and prominence measures [1, 5, 6] quantify the
and recommendation applications, conventional PageRank-based                significance of the node through eigen-analysis or random walks,
measures have various shortcomings. As we experimentally show               which help measure how reachable a node is in the graph.
in this paper, one such shortcoming is that PageRank scores are             1.1 PageRank as a Measure of Significance
tightly coupled with the degrees of the graph nodes, whereas in                Since enumerating all paths among the graph nodes would re-
many applications the relationship between the significance of the          quire time exponential in the size of the graph, random-walk based
node and its degree in the underlying network may not be as im-             techniques encode the structure of the network in the form of a tran-
plied by PageRank-based measures. In fact, as we also show in               sition matrix of a stochastic process from which the node signifi-
the paper, in certain applications, the significance of the node may        cance can be inferred.PageRank [6] is one of the most widely-used
be negatively correlated with the node degree and in such appli-            random-walk based methods for measuring node significance and
cations a naive application of PageRank may return poor results.            has been used in a variety of application domains, including web
To address these challenges, in this paper, we propose degree de-           search, biology, and social networks. The basic thesis of PageR-
coupled PageRank (D2PR) techniques to improve the effectiveness             ank is that a node is important if it is pointed to by other important
of PageRank based knowledge discovery and recommendation sys-               nodes – it takes into account the connectivity of nodes in the graph
tems. These suitably penalize or (if needed) boost the transition           by defining the score of the node vi ∈ V as the amount of time
strength based on the degree of a given node to adapt the node sig-         spent on vi in a sufficiently long random walk on the graph. More
nificances based on the network and application characteristics.            specifically, given a graph G(V, E), the PageRank scores are rep-
                                                                            resented as ~r, where
1.    INTRODUCTION
   In recent years, there has been considerable interest in measuring                              ~r = αTG~r + (1 − α)~t
the significance of a node in a graph and relatedness between two
                                                                            where TG is a transition matrix corresponding to the graph G, ~t is
nodes in the graph, as if measured accurately, these can be used
                                                                            a teleportation vector (such that ~t[i] = kV1 k ), and α is the residual
for supporting many knowledge discovery, search, and recommen-
                                                                            probability (or equivalently, (1 − α) is the so-called teleportation
∗                                                                           probability). Unless the graph is weighted, the transition matrix,
  This work is supported by NSF Grants 1339835 ”E-SDMS: Energy Sim-
ulation Data Management System Software”, 1318788 ”Data Manage-             TG , is constructed such that for a node v with k (outgoing) neigh-
ment for Real-Time Data Driven Epidemic Spread Simulations”, 1518939        bors, the transition probability from v to each of its (outgoing)
”RAPID: Understanding the Evolution Patterns of the Ebola Outbreak in       neighbors will be 1/k. If the graph is weighted, then the transi-
West-Africa and Supporting Real-Time Decision Making and Hypothesis
Testing through Large Scale Simulations”, and 1430144 ”Fraud Detection      tion probabilities are adjusted in a way to account for the relative
via Visual Analytics: An Infrastructure to Support Complex Financial Pat-   weights of the (outgoing) edges.
terns (CFP) based Real-Time Services Delivery”.
                                                                            1.2 Tight Coupling of PageRank Scores of
                                                                                Nodes and their Degrees
                                                                               Let us consider an undirected graph G(V, E). There are two
                                                                            factors that contribute to the PageRank of a given node, v ∈ V :
                                                                                • Factor 1: Significance of Neighbors: The more significant
c 2016, Copyright is with the authors. Published in the Workshop Pro-             the neighbors of a node are, the higher its likelihood to be
ceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bor-             also significant.
deaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this
paper is permitted under the terms of the Creative Commons license CC-          • Factor 2: Number of Neighbors (Degree of the Node) : Even
by-nc-nd 4.0                                                                      if the neighbors are not all significant, a large number of
                         Listener Graph    Article Graph    Movie Graph
        Data Set           (Friendship       (co-author    (co-contributor
                                                                             of PageRank based knowledge discovery and recommendation sys-
                         edges, Last.fm)   edges, DBLP)     edges, DBLP)     tems. These techniques suitably penalize or (if needed) boost1 the
  Correlation between         0.988            0.997            0.848        transition strength based on the degree of a given node to adapt the
  PageRank and Degree                                                        node significances based on the network and application charac-
Table 1: Spearman’s rank correlation between the node degree                 teristics. This paper is organized as follows: Next, we discuss the
ranks and the node ranks’ based on PageRank scores for vari-                 related literature. In Sections 3, we introduce the proposed degree-
ous data graphs (see Section 4 for details of the data sets)                 decoupled PageRank techniques. We evaluate the proposed tech-
                                                                             niques in Section 4 and conclude in Section 5.
       neighbors would imply that the node, v, is well-connected
       and, thus, likely to be structurally important.                       2. RELATED WORKS
In theory, these two factors should complement each other. In prac-
tice, however, the PageRank formulation described above implies
                                                                             2.1 Context-Sensitive PageRank
that there is a very tight coupling between the degrees of the nodes            Path-length based definitions of node relatedness, such as those
in the graph and their PageRank scores (see Table 1).                        proposed by [4, 24] help capture the relatedness of a pair of nodes
                                                                             solely based on the properties of the nodes and edges on the shortest
1.2.1 Problem I: When a Large Node Degree Does                               path between the pair. Random-walk based definitions, such as
      Not Indicate High Node Significance                                    hitting distance [10,21] and personalized page rank (PPR) score [1,
   In this paper, we highlight (and experimentally show) that,               9, 16], of node relatedness further take into account the density of
in many applications, node degree and node significance are in fact          the edges: as in path-length based definitions, random-walk based
inversely related and that the tight-coupling between node degrees           definitions also recognize that a node is more related to another
and PageRank scores might be counter-productive in generating ac-            node if there are short paths between them; however, random walk-
curate recommendations.                                                      based definitions of relatedness also consider how well the given
   E XAMPLE 1. Consider, for example, a recommendation appli-                pair of nodes are connected.
cation where a movie graph, consisting of movie and actor                       In [7], authors construct a transition matrix, TS , where edges
nodes, is used for generating movie recommendations. In this ap-             leading away from the seed nodes are weighted less than those
plication, the first factor (significance of neighbors) clearly has a        edges leading towards the seed nodes. An alternative approach for
positive contribution: a movie with good actors is likely to be a            contextualizing PageRank scores is to use the PPR techniques [1,9]
good movie and an actress playing in good movies is likely to be             discussed in the introduction. One key advantage of this tele-
a good actress. On the other hand, the second factor (number of              portation vector modification based approach over modifying the
neighbors) may in fact be a negative contributor to node signif-             transition matrix, as in [7], is that the term α can be used to di-
icance: the fact that an actor has played in a large number of               rectly control the degree of seeding (or personalization) of the PPR
movies may be a sign that he is a non-discriminating (’B movie’)             score. [10, 21] rely on a random walk hitting time based approach,
actor, whereas an actress with relatively fewer movies may be a              where the hitting time is defined as the expected number of steps a
more discriminating (’A movie’) actress.                                     random walk from the source vertex to the destination vertex will
                                                                             take. [17] leveraged these properties of PPR to develop locality-
As we see in Section 4, this observation turns out to be true in many        sensitive algorithms to rank nodes of graphs which are relative to a
applications, where (a) acquiring additional edges has a cost that is        given set of seed nodes efficiently.
correlated with the significance of the neighbor (e.g. the effort one
needs to invest to a high quality movie) and (b) each node has a             2.2 Improvements to the PageRank Function
limited budget (e.g. total effort an actor/actress can invest in his/her        Due to the obvious relationship between ranking and monetary
work).                                                                       rewards (e.g. through selling of advertisements on web search ap-
1.2.2 Problem II: When PageRank Does Not Suffi-                              plications), there has been considerable effort in engineering (or
      ciently Account for Contributions of Degrees                           manipulating) graphs in a way to maximize ranking scores of par-
                                                                             ticular nodes. This is commonly referred to as PageRank optimiza-
   The mismatch between PageRank and node significance is not
                                                                             tion. One way to achieve this goal is carefully adding or removing
limited to the cases where node degrees are inversely related to the
                                                                             certain links: If, for example, one or more colluding webmasters
node significance. As we see in Section 4, there are other scenarios
                                                                             can add or remove edges, PageRank scores of target web pages
where PageRank may, in fact, fail to sufficiently account for the
                                                                             or domains can be increased [23]. [20] established several bounds
contribution of the node degrees to their significances.
                                                                             indicating to what extent the rank of the pages of a website can
1.3 PageRank Revisited: De-coupling Node                                     be changed and the authors derived an optimal referencing strat-
    Significance from Node Degrees                                           egy to boost PageRank scores. A related, but opposite, problem is
                                                                             to protect the PageRank scores against negative links (which may
   As we discussed above, one key shortcoming of the conventional            indicate, for example, negative influence or distrust in a social net-
PageRank scores is that they are often tightly coupled with the de-          work), artificial manipulation, and spam. [3], for example, focused
grees of the graph nodes and in many applications the relationship           on identifying spam pages and link farms and showed that better
between the significance of the node and its degree in the underly-          PageRank scores can be obtained after filtering spam pages and
ing network may not be as implied by PageRank-based measure: in              links. In [14], authors show that PPR algorithms that do not dif-
certain applications, the significance of the node may be negatively         ferentiate among the seed nodes may not properly rank nodes and
correlated with the node degree, whereas in others PageRank may              present robust personalized PageRank (RPR) strategies, which are
not be sufficient in accounting for degree contributions. Naturally,         insensitive to noise in the set of seed nodes.
in such applications a naive application of PageRank in generating
recommendations may return poor results.                                     1
                                                                              In this context, de-coupled does not necessarily imply de-
   To address these challenges, in this paper, we propose degree de-         correlated. In fact, D2PR can boost correlation between node de-
coupled PageRank (D2PR) techniques to improve the effectiveness              gree and PageRank if that is required by the application.
   There are some efforts to change the impact of degrees on the                     B      E            Dest.   deg.        Transition probability
PageRank computation. [2] proposed a way to boost the power of                                            vj     (vj )    from A to its neighbors vj
                                                                                A    C      F                             p=0         2       −2
low-degree nodes in a network. The impact from nodes which are
                                                                                                          B        2       0.33     0.18      0.29
important but are not hubs is relatively small compared to other                     D                    C        3       0.33     0.08      0.64
nodes which are less important with high degrees. To boost the                                            D        1       0.33     0.74      0.07
low-degree important nodes for equal opportunity, the teleportation
                                                                           (a) A sample graph             (b) Transition probabilities from A
vector is modified with being proportional to the degrees of nodes.
[11] boosted the degrees of nodes to reduce the expected cover time       Figure 1: In conventional PageRank (p = 0), the transition
of the entire graph by the biassed random-walk.                           probabilities from node vi = A to all its neighbors vj are the
                                                                          same. In degree de-coupled PageRank (D2PR), the value of p
3.     DEGREE DE-COUPLED PAGERANK                                         can be used to penalize (p > 0) or boost (p < 0) transition
   The key difficulty of de-coupling node degrees from the PageR-         probabilities based on the degree of the destination
ank scores is that the definition of the PageRank, based on random                                               Ranks of the graph nodes
walk transitions, is inherently dependent on the number of transi-                   node        node      for different de-coupling weights (p)
tions available from one node to the other. As we mentioned above,                     id       degree    −4        −2        0       2        4
the more ways there are to reach into a node, the higher will be its                53608        883       1         1       69     5549     6793
                                                                                     351         739       2        12       425    1992     1935
PageRank score.                                                                       ...         ...     ...       ...      ...     ...      ...
                                                                                    79538          1     7661      7545     4149     195      182
3.1 Desideratum                                                                     79917          1     7793      7790     7522    2443     2043
   Therefore, to de-couple the PageRank score from node degrees,
we need to modify the transition matrix. In particular, for each node     Table 2: Ranks of graph nodes of different degrees on a sam-
vi in the graph, we would like to be able to control the transition       ple graph for different de-coupling weights, p: as we see in this
process with a single parameter (p), such that                            figure, when p > 0, high degree nodes are pushed down in the
                                                                          rankings (reducing the correlation between degree and rank),
     • if p ≪ −1, transitions from node vi are ∼ 100% towards
                                                                          while when p < 0, they are pulled up (improving the correla-
       the neighbor with the highest degree,
     • if p = −1, transition probabilities from node vi are propor-       tion between degree and rank)
       tional to the degrees of its neighbors,                                • p ∈ R is a degree de-coupling weight.
     • if p = 0, the transition probabilities mirror the standard
       PageRank probabilities (assuming undifferentiated neigh-           Intuitively, the numerator term, deg(vj )−p , ensures that the edge
       bors),                                                             incoming to vj is weighted by its degree: if p > 0, then its de-
     • if p = 1, transition probabilities from node vi are inversely      gree negatively impacts (reduces) transition probabilities into vj , if
       proportional to the degrees of its neighbors,                      p < 0 then its degree positively impacts (boosts2 ) transition prob-
     • if p ≫ 1, transitions from node vi are ∼ 100% towards the          abilities into vj , and if p = 0, we obtain the standard PageRank
       neighbor with the lowest degree.                                   formulation without degree de-coupling. In other words, the tran-
                                                                          sition function satisfies our desideratum of de-coupling the tran-
In other words, the transition function should de-couple the transi-      sition process from node-degrees and penalizing or boosting the
tion process from node-degrees and penalize or boost the contribu-        contributions of node degrees on-demand. Note that, since all
tions of node degrees in the transition process, as needed.               transitions from the node vi are degree de-coupled individually
                                                                          based
                                                                          P       on the degrees of their destinations, the denominator term,
3.2 Degree De-coupling Transition Matrix                                                                 −p
                                                                              vk ∈neighbor(vi ) deg(vk )    , ensures that the transition probabil-
   In this subsection, we will consider degree de-coupling of the         ities from node vi add up to 1.0. Note also that when there is no
transition matrix as implied by the above desideratum.                    edge between node vi and vj , TD (j, i) = 0 and, consequently, the
                                                                          term TD (j, i) is not affected by the degree de-coupling process.
3.2.1 Undirected Unweighted Graphs
   Let G = (V, E) be an undirected and unweighted graph. Let                 E XAMPLE 2. Figure 1 shows how the random walk probabili-
                                                                          ties are differentiated in a degree de-coupled transition matrix on
α also be a given residual probability parameter, and deg(v) be a
function which returns the number of edges on the node v. We              a sample graph where a node A has three neighbors, B (with de-
represent degree de-coupled PageRank (D2PR) scores in the form            gree 2), C (with degree 3), and D (with degree 1). In conven-
of a vector                                                               tional PageRank, the transition probabilities from node A to all its
                                                                          neighbor nodes are equal to 0.33. In degree de-coupled PageRank
                      d~ = αTD d~ + (1 − α)~t,                            (D2PR), however, the value of p is used for explicitly accounting
                                                                          for the impact of node degree on the transition probabilities: When
where ~t is the teleportation vector, such that ~t[i] = kV1 k for all i
                                                                          p = 2, the transition probabilities from A to its neighbors are 0.18,
and TD is a degree de-coupled transition matrix,                          0.08, and 0.74, which penalizes nodes which have larger degrees,
                                   deg(vj )−p                             whereas when p = −2, D2PR boosts the transition probabilities
            TD (j, i) = P                              −p
                                                          ,        (1)    to large degree nodes leading to transition probabilities 0.29, 0.64,
                            vk ∈neighbor(vi ) deg(vk )
                                                                          and 0.07, respectively.                                            ⋄
where                                                                        This example shows that, in degree de-coupled PageRank
   • TD (j, i) denotes the degree de-coupled transition probabil-         (D2PR), as we also see in Table 2, the value of p can be used to
      ity from node vi to node vj over an edge eij = [vi → vj ]           penalize (p > 0) or boost (p < 0) transition probabilities based on
      when there exists at least one edge between two nodes,              the degree of the destination, vj .
     • neighbor(vi ) is the set of all neighbors of the source node,      2
                                                                            In fact, a similar function was used in [11] to quickly locate nodes
       vi , and                                                           with higher degrees in a given graph.
3.2.2 Directed Unweighted Graphs                                            accounts for the connection strength (as in the conventional PageR-
    The semantics of degree de-coupling is slightly different in di-        ank) whereas TD is a degree de-coupled transition matrix,
 rected graphs. In particular, edges incoming to vi often do not re-
                                                                                                                 Θ(vj )−p
 quire a particular effort from vi to establish and hence are often out               TD (j, i) = P                                      −p
                                                                                                                                              ,
 of the control of vi , but indicate a certain degree of interestingness,                              [vi →vk ]∈out_edges(vi ) Θ(vk )
 usefulness, or authority as perceived by others. The same is not
                                                                            such that, TD (j, i) denotes the degree de-coupled transition prob-
 true for edges outgoing from vi ; in particular, a vertex with a large
                                                                            ability from node vi to node vj over an edge eij = [vi → vj ],
 number of outgoing edges may either indicate a potential hub or
                                                                            p ∈ R is a degree de-coupling weight, and
 simply indicate a non-discerning connection maker. The distinction
 between these two situations gains importance especially in appli-                                       X
                                                                                         Θ(v) =                       w(v → vh ).
 cations where establishing a new connection has a non-negligible
                                                                                                   [v→vh ]∈out_edges(v)
 cost to the source node and, thus, a large number of outgoing edges
 may indicate either (a) a very strong participant to the network or           Note that, above, β controls whether accounting for the connec-
 (b) a very poor participant with a large number of weak linkages.          tion strength or degree de-coupling is more critical in a given appli-
    Let G = (V, E) be a directed graph and for the simplicity of the        cation. In Section 4, we will study the impact of degree de-coupling
 discussion, without any loss of generality, let us assume that G is        in weighted graphs for different scenarios.
 unweighted. Let us also be given a residual probability parameter,
 α and let outdeg(v) be a function which returns the number of
 outgoing edges from the node v. The degree de-coupled PageRank             4. CASE STUDIES
 (D2PR) scores can be represented in the form of a vector d,      ~ d~ =       In this section, we present case studies assessing the effective-
       ~
 αTD d + (1 − α)~t, where ~t is the teleportation vector, such that         ness of the degree de-coupling process and the relationship between
~t[i] = 1 for all i and                                                     the degree de-coupling weight p and recommendation accuracy for
         kV k
                                                                            different data graphs.
                                   outdeg(vj )−p
        TD (j, i) = P                                        −p
                                                                ,           4.1 Setup
                        [vi →vk ]∈out_edges(vi ) outdeg(vk )                   For all experiments, the degree de-coupling weight, p, is varied
                                                                            between -4 and 4 with increments of 0.5. The residual probability,
where TD (j, i) denotes the degree de-coupled transition proba-
                                                                            α, is varied between 0.5 and 0.9, with default value chosen as 0.85.
bility from node vi to node vj over an edge eij = [vi → vj ],
                                                                            We also varied the β parameter, which controls whether accounting
out_edges(vi ) is the set of out-going edges from the source node,
                                                                            for the connection strength or degree de-coupling is more critical
vi , and p ∈ R is a degree de-coupling weight.
                                                                            in a given application, between 0.0 and 1.0, with the default value
   E XAMPLE 3. Figure 2 (a) in Section 4 provides an example il-            set to 0 (indicating full decoupling).
lustrating the correlations between the degree de-coupled PageR-            4.1.1 Datasets
ank (D2PR) scores and external evidence for different values of p
                                                                               Four real data sets are used for the experiments. Each data set
for some application: here, the higher the correlation, the better re-
                                                                            is used to create two distinct data graphs and corresponding ratings
sulting ranking reflects the application semantics. As we see in this
                                                                            data. Table 3 provides further details about the various graphs cre-
example, which we will investigate in greater detail in Section 4,
                                                                            ated using these four data sets. These recommendation tasks based
the optimal de-coupling weight is not always p = 0 as implied by
                                                                            on these data graphs are detailed below:
the conventional PageRank measure. In this particular case, for
example, the correlation between D2PR and external evidence of              • For the IMDB [15] data set, we created (a) a movie-movie graph,
significance is maximized when the de-coupling weight, p, is equal          where movie nodes are connected by an edge if they share common
to 0.5, implying that in this application a moderate degree of pe-          contributors, such as actors, directors, writers, composers, editors,
nalization based on the node degrees is needed to align PageRank            cosmetic designers, and producers and (b) an actor-actor graph
scores and application semantics.                                    ⋄      based on whether two actors played in the same movie. Appli-
                                                                            cations: For this data set, we consider applications where movies
3.2.3 Weighted Graphs                                                       are rated by the users: thus, we merged the IMDB data with the
   Once again, the semantics of degree de-coupling need to be re-           MovieLens 10M [22] data (based on movie names) to identify user
considered for weighted graphs. Let G = (V, E, w) be a directed,            ratings (between 1 and 5) for the movies in the graph. We con-
weighted graph, where w(e) is a function which returns the weight           sider the (a) average user rating as the significance of the movies
of the edge associated with edge e. It is important to note that, in        in the movie-movie graph and (b) average user rating of the movies
such a graph, the weight of an edge can 1) indicate the strength            played in as the significance of the actors in the actor-actor graph.
of the connection between two nodes (thus positively contributing           • For the DBLP [26] data set, we constructed (a) an article-article
to the significance of the destination node); and at the same time          graph where scientific articles were connected to each other if they
and 2) contribute to the degree of a node as a multiplier (thus posi-       shared a co-author and (b) an author-author graph based on co-
tively or negatively contributing to the node significance depending        authorship. Applications: (a) In the article-article graph, the num-
on the degree-sensitivity of the application). In other words, given        ber of citations to an article is used to indicate its significance. Sim-
an edge eij = [vi → vj ], from node vi to node vj , the transition          ilarly, (b) in the author-author graph, average number of citations
probability from vi to vj can be written as                                 to an author’s papers is used as his/her significance.
        T(j, i) = βTconn_strength (j, i) + (1 − β)TD (j, i),                • For the Last.fm [18], we constructed (a) a listener-listener graph,
                                                                            where the nodes are Last.FM listeners and undirected edges reflect
where
                                                                            friendship information among these listeners. We also constructed
                                          w(vi → vj )                       (b) an artist-artist graph based on shared listeners. Applications:
  Tconn_strength (j, i) = P                                            ,
                                [vi →vh ]∈out_edges(vi ) w(vi → vh )        (a) In the listener-listener graph, we considered the total listening
                  Data            Graph             # of       # of       Average     Standard deviation of   Median standard deviation of
                                                   nodes       edge     node degree       node degrees         neighbors’ node degrees
                 IMDB           movie-movie       191,602   4,465,272       23.30            51.86                       2.89
                                 actor-actor      32,208    2,493,574       77.42            67.15                      114.41
                 DBLP           article-article    8,808     951,798       108.06            171.25                     309.92
                               author-author      47,252     310,250        6.57              8.89                       6.39
                 Last.fm      listener-listener    1,892     25,434         13.44            17.31                       22.37
                                 artist-artist    17,626    2,640,150      149.79            299.66                     998.53
                Epinions   commenter-commenter     6,703    2,395,176      425.05            438.97                     609.39
                              product-product     13,384    2,355,460      175.99            224.12                     202.78
                                                       Table 3: Data sets and data graphs

activity of a given listener as his/her significance. (b) In the artist-       when the degrees are over-penalized (i.e., when p ≫ 0.5). The
artist graph, the number of times an artist has been listened is con-          Epinions product-product graph (based on common commenters,
sidered as his/her significance.                                               Figure 2(c)) also provides the highest correlations with p > 0, but
• For the Epinions [25]: We constructed (a) a commenter-                       behaves somewhat differently from the other two cases: the corre-
commenter graph based on the products on which two individuals                 lations stabilize and do not deteriorate significantly when degrees
both commented and (b) a product-product graph based on shared                 are over-penalized, indicating that the need for degree penalization
commenters. Applications: (a) For the nodes on the commenter-                  is especially critical in this case: this is due to the fact that, the
commenter graph, the number of trusts the commenter received                   larger the number of comments a product has, the more likely it
from others is used as his/her commenter significance. (b) For                 is that the comments are negative (Figure 5). In fact, we see that,
each product in the product-product graph, its average rating by               among the three graphs, this is the only graph where the traditional
the commenters is used as its node significance.                               PageRank (with p = 0) leads to negative correlations between
                                                                               node ranks and node significances.
4.2 Measures                                                                      These results indicate that actors who have had many co-actors,
   In this section, our goal is to observe the impact of differ-               commenters who commented on products also commented by
ent D2PR degree de-coupling weights on the relationship between                many others, or products which received comments from individ-
D2PR rankings and application specific significance measures for               uals who also commented on many other products are not good
the above data sets3 . We also aim to verify whether de-coupling               candidates for transition during random walk. This aligns with
weights can also be used to improve recommendation accuracies.                 our expectation that, in applications where each new movie role or
   In order to measure the relationship between the degree de-                 comment requires additional effort, high degree may indicate lower
coupled PageRank (D2PR) scores and the application-specific node               per-movie or per-comment effort and, hence, lower significance.
significance, we used Spearman’s rank correlation,
                      P                                                         4.3.2 Application Group B: When Conventional
                          (xi − x̄)(yi − ȳ)                                          PageRank is Ideal
                    pP i                          ,
                                 2              2
                                   P
                      i (xi − x̄)    i (yi − ȳ)                                  Figure 3 shows that, for movie-movie (based on common actors)
 which measures the agreement between the D2PR ranks of the                    and author-author (based on common articles) graphs, the peak
nodes in the graph and their application-specific significances.               correlation is at p = 0 indicating that the conventional PageRank
Here, x are rankings by D2PR and y are significances for an ap-                which gives positive weight to node degree, is appropriate.
plication and x̄ and ȳ are averages of two values.                               This perhaps indicates that movies with a lot of actors tend to
                                                                               be big-budget products and that authors with a large number of co-
4.3 Impact of De-Coupling in Different Appli-                                  authors tend to be experts with whom others want to collaborate.
    cations (Unweighted Graphs)                                                Note that, in these applications, additional boosting, with p < 0,
   In this subsection, we present results that aim to assess D2PR un-          negatively affects the correlation, indicating that the relationship
der the settings described above. For these experiments, the resid-            between node degree and significance is not very strong (Figure 5).
ual probability, α, and the parameter, β, are set to the default val-          The quick change when p < 0 is because, as we see in Table 3,
ues, 0.85 and 0, respectively. In these experiments, we consider               median standard deviations of neighbors’ degrees are low; i.e., de-
only unweighted graphs (we will study the weighted graphs and                  grees of neighbors of a node are comparable: there is no dominant
the impact of parameter β later in Section 4.5).                               contributor to TD (j, i) in Equation 1 (Section 3) and, thus, the
   Figures 2 through 4 include charts showing the Spearman’s cor-              transition probabilities are sensitive to changes in p, when p < 0.
relations between the D2PR ranks and application specific node
significances for different values of p and for different data graphs.
                                                                                4.3.3 Application Group C: When Degree Boosting
These figures clearly illustrate that different data graphs require dif-
                                                                                      Helps
ferent degrees of de-coupling4 to best match the application specific             Figure 4 shows that there are scenarios where additional boost-
node significance criterion.                                                   ing based node degrees provides some benefits. The article-article
                                                                               (based on common authors), listener-listener (based on common
4.3.1 Application Group A: When Degree Penaliza-                               artists), and artist-artist (based on common listeners) graphs reach
      tion Helps                                                               their peaks around p ∼ −1, indicating that these also benefit from
   The actor-actor (based on common movies) and commenter-                     large node degrees though improvements over p = 0 are slight.
commenter (based on common products) graphs have highest cor-                     A significant difference between applications in Group B and
relation at p = 0.5, with the correlations dropping significantly              Group C is that, for p < 0, the correlation curve is more or less
3
  In this paper, we are not proposing a new PageRank computation               stable. This is because, as we see in Table 3, in these graphs median
mechanism. Because of this (and since the focus is not improving               standard deviations of neighbors’ degrees are high: in other words,
scalability of PR), we do not report execution times and compare               for each node, there is a dominant neighbor with a high degree and
our results with other PageRank computation mechanisms.                        this neighbor has the highest contribution to TD (j, i); thus, the
4
  Degree penalization or degree-based boosting                                 rankings are not very sensitive to p, when p < 0.
                                                   +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*8BC;DE*$27,%1$27,%*                                                                                                                                 +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*8B"4'4,')C*2,&&#'7#%1                                                                                                                                                +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*8B"4'4,')C*"%,/3271"%,/327*
                                                                         3'5#4067#/*0%$"69**                                                                                                                                                                            2,&&#'7#%*3'5#4067#/*0%$"69*                                                                                                                                                                                              3'5#4067#/*0%$"69*
                                                                                               !"&'#                                                                                                                                                                                                                                                                                                                                                                                                              "#"$%
       !"#$%&$'()*+,%%#-$.,'*                                                                                                                                                                                                                                                                              "#"'%




                                                                                                                                                                                                                                                                                                                                                                                                                                !"#$%&$'()*+,%%#-$.,'*
                                                                                                                                                                                                      !"#$%&$'()*+,%%#-$.,'*
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  "#"&%
                                                                                               !"!%#


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    "#""%
                                                                                                                                                                                                                                                                                                           "#"&%                                                                                                                                         !&%    !'#(%    !'%    !)#(%     !)%    !*#(%    !*%    !"#(% "%   "#(%   *%      *#(%   )%   )#(%   '%   '#(%   &%
                                                                                               !"!$#
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          /#0%##*/#12,3"-4'0*5#4067*8"9*
                                                                                                                                                                                                                               !(%        !'#)%     !'%        !$#)%     !$%        !&#)%     !&%        !"#)%      "%   "#)%                        &%      &#)%     $%     $#)%      '%         '#)%     (%                                                                                                     !"#"&%
                                                                                                                                                                                                                                                                                            /#0%##*/#12,3"-4'0*5#4067*8"9*
                                                                                                 !"!!#
                                ($#   ()"*#     ()#         ('"*#      ('#    (&"*#   (&#     (!"*# !#   !"*#                        &#    &"*#    '#    '"*#     )#    )"*#    $#
                                                                                                                                                                                                                                                                                                          !"#"$%                                                                                                                                                                                                  !"#"$%
                                                                                      /#0%##*/#12,3"-4'0*5#4067*8"9*
                                                                    ;#0%##*/#12,3"-#/*                                                +,'F#'.,'$-*8"GH9*                                                                                                               ;#0%##*/#12,3"-#/*                                                                 +,'D#'.,'$-*8"EF9*                                                                                                            ;#0%##*/#12,3"-#/*                          +,'D#'.,'$-*8"EF9*


                                                              (a) IMDB (actor-actor)           (b) Epinions (commenter-commenter)             (c) Epinions (product-product)
                                                                      Figure 2: Application Group A: p > 0 is optimal (i.e., node degrees need to be penalized)
                                                                                                                                                     +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*8;BC=D*$376,%1$376,%*
                                                                                                                                                                                                                                                                                                                                                                    +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*8BC;DE*&,F4#1&,F4#*
                                                                                                                                                                             3'5#4067#/*0%$"69**
                                                                                                                                                                                                                                                                                                                                                                                           3'5#4067#/*0%$"69*
                                                                                                                                                                                       "#$%                                                                                                                                                                                                                          "#"(&
                                                                                                            !"#$%&$'()*+,%%#-$.,'*




                                                                                                                                                                                                                                                                                                                          !"#$%&$'()*+,%%#-$.,'*
                                                                                                                                                                                       "#&%                                                                                                                                                                                                                            "#""&
                                                                                                                                                                                                                                                                                                                                                   !(&     !)#*&    !)&    !%#*&     !%&        !$#*&    !$&        !"#*& "&   "#*&                        $&     $#*&     %&     %#*&      )&     )#*&     (&
                                                                                                                                                                                                                                                                                                                                                                                                        /#0%##*/#12,3"-4'0*5#4067*8"9*
                                                                                                                                                                                         "#"%                                                                                                                                                                                                                        !"#"(&
                                                                                                                                     !'%   !(#)%   !(%   !$#)%   !$%   !&#)%   !&%   !"#)% "%   "#)%                                 &%      &#)%         $%      $#)%         (%      (#)%         '%
                                                                                                                                                                           /#0%##*/#12,3"-4'0*5#4067*8"9*
                                                                                                                                                                                       !"#&%                                                                                                                                                                                                                         !"#"'&



                                                                                                                                                                                       !"#$%                                                                                                                                                                                                                         !"#$%&

                                                                                                                                                           ;#0%##*/#12,3"-#/*                                                        +,'E#'.,'$-*8"FG9*                                                                                                                            ;#0%##*/#12,3"-#/*                                                          +,'F#'.,'$-*8"GH9*


                                                                                                                                                    (a) DBLP (author-author)                (b) IMDB (movie-movie)
                                                                                                                                                            Figure 3: Application Group B: p = 0 is optimal
                                                      01++)2*31-%1?%@ABC%C*-D/%*-4%E14)%'95-9F7*-7)%=@GHBI%*+372)6*+372)%                                                                                                                     01++)2*31-%1?%@#AB%B*-C/%*-4%D14)%'95-9E7*-7)%%=F*/<"?,G%29/<)-)+629/<)-)+%                                                                                                                                                12,,*3+42.&2@&A%BC&C+.D0&+.5&E25*&(:6.:F8+.8*&>G+0="@-H&+,40=7+,40=&
                                                                             8-:)95;<)4%5+*(;>%%                                                                                                                                                                        8-:)95;<)4%5+*(;>%                                                                                                                                                                                                      9.;*:6<=*5&6,+))?&
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         !)% !"#(% "%     "#(%   )%        )#(%   &%   &#(%   '%   '#(%   $%
                                !'%   !(#&%     !(%         !)#&%      !)%    !$#&%   !$%     !"#&% "%   "#&%                        $%    $#&%    )%     )#&%    (%    (#&%    '%                                             !'%        !(#)%     !(%        !$#)%     !$%        !&#)%     !&%        !"#)% "%        "#)%                        &%      &#)%     $%     $#)%          (%     (#)%         '%
                                                                                  4)5+))%4)6718(29-5%:)95;<%=(>%
                                                                                                                                                                                                                                                                                       4)5+))%4)6718(29-5%:)95;<%=(>%                                                                                                                                                                                              !"#&%
                                                                                               !"#"&%                                                                                                                                                                                                       !"#&%


                                                                                               !"#$"%                                                                                                                                                                                                       !"#$%                                                                                                                                                                                                  !"#$%

                                                                    @)5+))%4)6718(2)4%                                               01-J)-31-*2%=(K!>%                                                                                                                @)5+))%4)6718(2)4%                                                                01-H)-31-*2%=(I!>%                                                                                                              A*6,**&5*7829)3*5&                             12.I*.42.+3&>)J!?&


                                                       (a) DBLP (article-article)            (b) Last.fm (listener-listener)                (c) Last.fm (artist-artist)
                                                                 Figure 4: Application Group C: p < 0 is optimal (i.e., node degrees need to be boosted)

                                                                    "#'&%                                                                                         p<0                              2/3'+,+"(                                                                                                                           the weighted graphs are similar).
                                          ),$(,#$+(3/5,/6&),&+(
                                          &#""-((#4((,#$+($+5"++(




                                                                    "#'"%                                                                                                                                                                                                                                                                 First thing to notice in these figures is that the grouping of the
                                                                    "#$&%                                                                          )%'0#"(
                                                                                                                                                                                                                                                                                                                                       applications (into those where, respectively, p > 0, p = 0, or p < 0
                                                                    "#$"%                                                                                                                       )"13'(                                                                                                                                 is useful) is preserved when different values of α are considered.
                                                                                                                                       *#./+(                              )"1&2+(
                                                                    "#"&%              &#**+,'-(                                                                                                                                                                                                                                          Figure 6 studies the impact of the value of α in application group
                                                                                                                                                    p=0                                                                                                                                                                                A, where degree penalization helps (p > 0). As we see here,
                                                                    "#""%
                                                                                                         )&'#"(                                                                                                                                                                                                                        for the IMDB actor-actor (Figure 6(a)) and Epinions commenter-
                                                               !"#"&%
                                                                             !"#$%&'(                                      p>0                                                                                                                                                                                                         commenter (Figure 6(b)) graphs, having a lower value of α (i.e.,
                                                               !"#$"%
                                                                                                                                                                                                                                                                                                                                       lower probability of forward movement during the random walk)
Figure 5: Correlations between node degrees and applica-                                                                                                                                                                                                                                                                               provides the highest possible correlations between D2PR ranks and
tion specific significances for different data graphs (each color                                                                                                                                                                                                                                                                      node significance (with the optimal value of p being ∼ 0.5 inde-
group is a distinct pattern in Figures 2 through 4).                                                                                                                                                                                                                                                                                   pendent of the value of α). This indicates that in these graphs,
                                                                                                                                                                                                                                                                                                                                       it is not necessary to traverse far during the random walk. Inter-
4.3.4 Summary: Correlations between Node De-                                                                                                                                                                                                                                                                                           estingly, though, when degrees are over-penalized (i.e., p ≫ 0),
      grees and Application Specific Significances                                                                                                                                                                                                                                                                                     smaller values of α start leading to worse correlations, indicating
   The experiments reported above show that degree de-coupling                                                                                                                                                                                                                                                                         that (while not being optimal) severe penalization of node degrees
is important as different applications, even on the same data set,                                                                                                                                                                                                                                                                     helps make random traversals more useful than random jumps. As
may associate different semantics to node degrees and the conven-                                                                                                                                                                                                                                                                      we have already observed in Figure 2(c), the Epinions product-
tional PageRank scores are too tightly coupled with node degrees                                                                                                                                                                                                                                                                       product graph (Figure 6(c)) behaves somewhat differently from the
to be effective in all scenarios. Figure 5, which plots correlations                                                                                                                                                                                                                                                                   other two cases where degree penalization (p > 0) leads to larger
between node degrees and application specific significances for dif-                                                                                                                                                                                                                                                                   correlations: in this case, unlike the other two graphs, the highest
ferent data graphs, re-confirms that the ideal value of the p is related                                                                                                                                                                                                                                                               possible correlations between D2PR ranks and node significance
to the usefulness of the node degree in capturing the application                                                                                                                                                                                                                                                                      are obtained for large values of α, indicating that this application
specific definition of node significance.                                                                                                                                                                                                                                                                                              benefits from longer random walks (though the differences among
                                                                                                                                                                                                                                                                                                                                       the correlations for different α values are very small).
4.4 Relationship between α and p                                                                                                                                                                                                                                                                                                          Figure 7 shows that the pattern is different for application group
   In Figures 6 through 8, we investigate the relationship between                                                                                                                                                                                                                                                                     B, where conventional PageRank is ideal (p = 0): in this case, hav-
the value α and the degree de-coupling parameter p for different                                                                                                                                                                                                                                                                       ing a larger value of α (i.e., larger probability of forward movement
application types. Here we use the default value, 0, for the param-                                                                                                                                                                                                                                                                    during the random walk) provides the highest correlations between
eter β and present the results for unweighted graphs (the results for                                                                                                                                                                                                                                                                  ranks and significance. Interestingly, in these applications, when
                                                +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)*                                                                                                                           +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)**                                                                                                                                     +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*,'*/4A#%#'7*B*C$-3#)*
                                                               8EF;GH*$27,%1$27,%*3'5#4067#/*0%$"69*                                                                                                                                                  8E"4'4,')F*2,&&#'7#%12,&&#'7#%*3'5#4067#/*0%$"69*                                                                                                                                                     8D"4'4,')E*"%,/3271"%,/327*3'5#4067#/*0%$"69*
                                                                                        "#')%                                                                                                                                                                                           "#"(%                                                                                                                                                                                            "#"$%




                                                                                                                                                                                                                                                                                                                                                                                                   !"#$%&$'()*+,%%#-$.,'*
     !"#$%&$'()*+,%%#-$.,'*




                                                                                                                                                                                                  !"#$%&$'()*+,%%#-$.,'*
                                                                                                                                                                                                                                                                                        "#"'%
                                                                                        "#'(%
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         "#"&%
                                                                                                                                                                                                                                                                                        "#"$%
                                                                                        "#"&%
                                                                                                                                                                                                                                                                                        "#"&%                                                                                                                                                                                            "#""%
                                                                                        "#"$%                                                                                                                                                                                                                                                                                                                                !&%    !'#(%    !'%     !)#(%    !)%    !*#(%     !*%    !"#(% "%    "#(%      *%     *#(%   )%   )#(%   '%   '#(%   &%
                                                                                                                                                                                                                                                                                         "#""%
                                                                                                                                                                                                                             !(%    !'#)%     !'%     !$#)%    !$%    !&#)%    !&%    !"#)% "%   "#)%                         &%     &#)%      $%      $#)%     '%     '#)%       (%
                                                                                       "#""%                                                                                                                                                                                                                                                                                                                                                                                             !"#"&%
                                                                                                                                                                                                                                                                                        !"#"&%
                                  !$%   !*#+%   !*%    !(#+%   !(%   !'#+%    !'%   !"#+% "%     "#+%                         '%     '#+%     (%      (#+%     *%     *#+%      $%
                                                                                                                                                                                                                                                                          /#0%##*/#12,3"-4'0*5#4067*8"9*                                                                                                                                                                     /#0%##*/#12,3"-4'0*5#4067*8"9*
                                                                        /#0%##*/#12,3"-4'0*5#4067*8"9*
                                                                                        !"#"$%                                                                                                                                                                                          !"#"$%
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         !"#"$%

                                                      CIJKL*                   CIJKM*                                      CIJKNL*                      CIJKO*                                                                                       CGHIJ*                    CGHIK*                                  CGHILJ*                           CGHIM*                                                                                     BFGHI*                      BFGHJ*                   BFGHKI*                BFGHL*


            (a) IMDB (actor-actor)             (b) Epinions (commenter-commenter)           (c) Epinions (product-product)
    Figure 6: Relationship between p and α, for application group A, where p > 0 is optimal (i.e., degrees need to be penalized)
                                                                                                                                           +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)*                                                                                                                              +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)*
                                                                                                                                                         8;EF=G*$376,%1$376,%*3'5#4067#/*0%$"69*                                                                                                                                                          8EF;GH*&,D4#1&,D4#*3'5#4067#/*0%$"69*
                                                                                                                                                                                        "#&%                                                                                                                                                                                             "#"(%
                                                                                                  !"#$%&$'()*+,%%#-$.,'*




                                                                                                                                                                                                                                                                                                   !"#$%&$'()*+,%%#-$.,'*
                                                                                                                                                                                                                                                                                                                                                                                         "#")%
                                                                                                                                                                                        "#'%
                                                                                                                                                                                                                                                                                                                                                                                          "#""%
                                                                                                                                                                                                                                                                                                                            !(%    !*#+%     !*%     !)#+%    !)%    !$#+%     !$%     !"#+% "%   "#+%                         $%     $#+%     )%      )#+%     *%     *#+%      (%
                                                                                                                                                                                         "#"%                                                                                                                                                                                            !"#")%
                                                                                                                            !(%    !$#)%    !$%     !&#)%    !&%    !'#)%     !'%    !"#)% "%   "#)%                           '%     '#)%      &%      &#)%     $%     $#)%     (%                                                                                          /#0%##*/#12,3"-4'0*5#4067*8"9*
                                                                                                                                                                            /#0%##*/#12,3"-4'0*5#4067*8"9*                                                                                                                                                                               !"#"(%
                                                                                                                                                                                        !"#'%
                                                                                                                                                                                                                                                                                                                                                                                         !"#"'%
                                                                                                                                                                                        !"#&%
                                                                                                                                                                                                                                                                                                                                                                                         !"#"&%

                                                                                                                                                                                        !"#$%                                                                                                                                                                                            !"#$"%

                                                                                                                                                   CHIJK*                     CHIJL*                                  CHIJMK*                             CHIJN*                                                                                    CIJKL*                      CIJKM*                                      CIJKNL*                      CIJKO*


                                                                                                (a) DBLP (author-author)                  (b) IMDB (movie-movie)
                                                                                     Figure 7: Relationship between p and α, for application group B, where p = 0 is optimal
                                                +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)**                                                                                                                         +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)*                                                                                                                             +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)*
                                                              8;EF=G*$%.2-#1$%.2-#*3'5#4067#/*0%$"69*                                                                                                                                                    8E$)7F:&G*-4)7#'#%1-4)7#'#%*3'5#4067#/*0%$"69*                                                                                                                                                    8E$)7F:&G*$%.)71$%.)7*3'5#4067#/*0%$"69*
                                                                                        "#$%&                                                                                                                                                                                             "#'%                                                                                                                                                                                             "#$%
                                                                                                                                                                                                    !"#$%&$'()*+,%%#-$.,'*




                                                                                                                                                                                                                                                                                                                                                                                                    !"#$%&$'()*+,%%#-$.,'*
                                                                                        "#$"&
         !"#$%&$'()*+,%%#-$.,'*




                                                                                                                                                                                                                                                                                          "#$%
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           "#&%
                                                                                        "#"%&
                                                                                                                                                                                                                                                                                          "#&%
                                                                                       "#""&                                                                                                                                                                                                                                                                                                                                                                                              "#"%
                                  !'&   !(#%&   !(&    !)#%&   !)&   !$#%&    !$&   !"#%& "&     "#%&                         $&     $#%&     )&      )#%&     (&     (#%&      '&                                                                                                        "#"%                                                                                                                               !$%    !'#(%    !'%     !&#(%    !&%    !)#(%     !)%    !"#(% "%    "#(%      )%     )#(%   &%   &#(%   '%   '#(%   $%
                                                                                        !"#"%&                                                                                                                               !(%    !'#)%     !'%     !$#)%    !$%    !&#)%    !&%    !"#)% "%   "#)%                         &%     &#)%      $%      $#)%     '%     '#)%       (%                                                                                 /#0%##*/#12,3"-4'0*5#4067*8"9*
                                                                             /#0%##*/#12,3"-4'0*5#4067*8"9*                                                                                                                                                             /#0%##*/#12,3"-4'0*5#4067*8"9*                                                                                                                                                                                    !"#&%
                                                                                                                                                                                                                                                                                         !"#&%
                                                                                        !"#$"&

                                                                                        !"#$%&                                                                                                                                                                                           !"#$%                                                                                                                                                                                            !"#$%

                                                      CHIJK*                   CHIJL*                                      CHIJMK*                      CHIJN*                                                                                       CHIFJ*                    CHIFK*                                  CHIFLJ*                           CHIFM*                                                                                     CHIFJ*                      CHIFK*                   CHIFLJ*                CHIFM*


           (a) DBLP (article-article)              (b) Last.fm (listener-listener)               (c) Last.fm (artist-artist)
  Figure 8: Relationship between p and α, for application group C, where p < 0 is optimal (i.e., node degrees need to be boosted)

p ≪ 0 or p ≫ 0, higher probabilities of random walk traversal                                                                                                                                                                                                                                                 the greater β is), the larger is the optimal value of p.
(i.e., larger α) stop being beneficial and lower values of α lead to                                                                                                                                                                                                                                             Figure 10 shows that, for applications in group B, where p ∼ 0
larger correlations. This re-confirms that, for these applications,                                                                                                                                                                                                                                           is ideal, when the connection strength is given significantly more
p ∼ 0 leverages the random walk traversal the best.                                                                                                                                                                                                                                                           weight than degree de-coupling (i.e., β ∼ 0), we observe high rank-
   As we see in Figure 8, in application group C, where degree                                                                                                                                                                                                                                                significance correlations. Interestingly however, for the movie-
boosting helps (p < 0), it is also the case that larger values of α                                                                                                                                                                                                                                           movie graph (where the edge weights denote common actors) the
(i.e., larger probabilities of forward transitions during the random                                                                                                                                                                                                                                          highest correlations are obtained not with p = 0, but with p = 0.5
walk) provides the highest correlations between node ranks and sig-                                                                                                                                                                                                                                           and β = 0.75, indicating that degree penalization is actually ben-
nificance. On the other hand, in these applications, p ∼ 0.5 serves                                                                                                                                                                                                                                           eficial in this case: movies that share large numbers of actors with
as a balance point where the value of α stops being relevant; in                                                                                                                                                                                                                                              other movies are likely to be B-movies, which are not good candi-
fact, for p > 0.5 the higher values of α stops being beneficial and                                                                                                                                                                                                                                           dates for transitions during the random walk.
lower values of α lead to larger correlations. This re-confirms that                                                                                                                                                                                                                                             Figure 11 shows that in application group C, where degree boost-
smaller values of p (which provides degree boosting) help leverage                                                                                                                                                                                                                                            ing (p < 0) helps, giving more weight to connection strength (i.e.,
the random walk traversal the best.                                                                                                                                                                                                                                                                           β ∼ 1.0) is a good, but not necessarily the best strategy. In fact,
                                                                                                                                                                                                                                                                                                              in these graphs, the highest overall correlations are obtained with
4.5 Relationship between                                                                                                                                                            β           and                                                   p                  in                                   β = 0 or β = 0.25, indicating that degree de-coupling is bene-
    Weighted Graphs                                                                                                                                                                                                                                                                                           ficial also in these cases. Interestingly, (unlike the case with the
   Finally, in Figures 9 through 11, we investigate the relation-                                                                                                                                                                                                                                             unweighted listener-listener graph, where the best correlation was
ship between the value β (which controls whether accounting for                                                                                                                                                                                                                                               obtained when p < 0) for the weighted version of the listener-
the connection strength or degree de-coupling is more critical in a                                                                                                                                                                                                                                           listener graph (where edge weights denote the number of shared
given application) and the degree de-coupling parameter p for dif-                                                                                                                                                                                                                                            friends), when β = 0 through 0.5, p = 0 provides the highest
ferent application types. Here we use the default value, 0.85, for                                                                                                                                                                                                                                            correlation and when β = 0.75, p = 0.5 provides the highest cor-
the parameter α and present the results for weighted graphs:                                                                                                                                                                                                                                                  relation – these indicate that listeners who have large numbers of
   Figure 9 depicts the impact of the value of the parameter β in ap-                                                                                                                                                                                                                                         shared friends with others are good candidates for random walk.
plication group A, where degree penalization helps (p > 0). As we                                                                                                                                                                                                                                                Note that a key observation from the above results is that the
see here, for all three weighted graphs, performing degree penal-                                                                                                                                                                                                                                             conventional PageRank, based on connection strength (i.e., β =
ization (i.e., β < 1.0) provides better rank-significance correlation                                                                                                                                                                                                                                         1.0), is not always the best strategy for the applications considered.
than relying solely on the connection strength (i.e., β = 1.0). Note
that the value of β impacts the optimal value of degree penalization                                                                                                                                                                                                                                          5. CONCLUSIONS
parameter p: the more weight is given to connection strength (i.e.,                                                                                                                                                                                                                                                                In this paper, we noted that in many applications the relation-
                                                  +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)*                                                                                                                                   +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)*8E"4'4,')F*                                                                                                                                                          +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*,'*/4A#%#'7*B*C$-3#)**
                                                8EF;GH*$27,%1$27,%*5#4067#/*0%$"69*I#/0#15#4067J*K*,:*2,&&,'*&,D4#)L*                                                                                                                                    2,&&#'7#%12,&&#'7#%*5#4067#/*0%$"69*G#/0#15#4067H*I*,:*)6$%#/*"%,/327)J*                                                                                                                                                  8D"4'4,')E*"%,/3271"%,/327*5#4067#/*0%$"69*F#/0#15#4067G*H*,:*)6$%#/*2,&&#'7#%)I*
                                                                                         "#)"%                                                                                                                                                                                                                                                                                                                                                                                                                                      "#$&%
                                                                                                                                                                                                                                                                                                             "#")%
      !"#$%&$'()*+,%%#-$.,'*




                                                                                                                                                                                                                    !"#$%&$'()*+,%%#-$.,'*




                                                                                                                                                                                                                                                                                                                                                                                                                                           !"#$%&$'()*+,%%#-$.,'*
                                                                                         "#"(%
                                                                                                                                                                                                                                                                                                             "#"(%                                                                                                                                                                                                                  "#$"%
                                                                                         "#"'%
                                                                                                                                                                                                                                                                                                             "#"'%
                                                                                         "#"$%                                                                                                                                                                                                                                                                                                                                                                                                                                      "#"&%
                                                                                                                                                                                                                                                                                                             "#"$%
                                                                                         "#"&%                                                                                                                                                                                                               "#"&%                                                                                                                                                                                                                   "#""%
                                                                                                                                                                                                                                                                                                                                                                                                                                                                    !'%    !(#&%      !(%     !)#&%       !)%    !$#&%     !$%    !"#&% "%        "#&%   $%   $#&%   )%     )#&%   (%   (#&%   '%
                                                                                        "#""%                                                                                                                                                                                                                 "#""%
                                  !$%   !*#+%    !*%     !&#+%   !&%   !)#+%   !)%   !"#+% "%          "#+%                           )%     )#+%     &%         &#+%     *%      *#+%      $%                                               !(%    !'#)%     !'%      !$#)%      !$%     !&#)%     !&%    !"#)% "%        "#)%                           &%     &#)%      $%         $#)%     '%      '#)%     (%                                                                                                                  !"#"&%
                                                                                       !"#"&%                                                                                                                                                                                                                !"#"&%
                                                                           /#0%##*/#12,3"-4'0*5#4067*8"9*                                                                                                                                                                                          /#0%##*/#12,3"-4'0*5#4067*8"9*                                                                                                                                                                                   /#0%##*/#12,3"-4'0*5#4067*8"9*
                                                                                         !"#"$%                                                                                                                                                                                                              !"#"$%                                                                                                                                                                                                                 !"#$"%
                                                                  CMN*         CMNO 0 is optimal (i.e., node degrees need to be penalized)
                                                                                                                                                    +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)*                                                                                                                                                   +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)*
                                                                                                                                                     8;EF=G*$376,%1$376,%*5#4067#/*0%$"69*H#/0#15#4067I*J*,:*2,"$"#%)K*                                                                                                                                                 8EF;GH*&,D4#1&,D4#*5#4067#/*0%$"69*I#/0#15#4067J*K*,:*2,&&,'*$27,%)L*
                                                                                                                                                                                                   "#$%                                                                                                                                                                                                                 "#"(%




                                                                                                                                                                                                                                                                                                                               !"#$%&$'()*+,%%#-$.,'*
                                                                                                           !"#$%&$'()*+,%%#-$.,'*




                                                                                                                                                                                                                                                                                                                                                                                                                        "#")%
                                                                                                                                                                                                   "#&%
                                                                                                                                                                                                                                                                                                                                                                                                                        "#""%
                                                                                                                                                                                                                                                                                                                                                        !(%    !*#+%     !*%     !)#+%       !)%    !$#+%     !$%    !"#+% "%         "#+%                            $%     $#+%       )%         )#+%     *%     *#+%      (%
                                                                                                                                                                                                                                                                                                                                                                                                                       !"#")%
                                                                                                                                                                                                     "#"%                                                                                                                                                                                             /#0%##*/#12,3"-4'0*5#4067*8"9*
                                                                                                                                    !'%    !(#)%    !(%     !$#)%       !$%     !&#)%     !&%    !"#)% "%       "#)%                           &%     &#)%      $%         $#)%     (%      (#)%      '%                                                                                                                !"#"(%
                                                                                                                                                                                        /#0%##*/#12,3"-4'0*5#4067*8"9*
                                                                                                                                                                                                                                                                                                                                                                                                                        !"#"'%
                                                                                                                                                                                                   !"#&%
                                                                                                                                                                                                                                                                                                                                                                                                                        !"#"&%

                                                                                                                                                                                                   !"#$%                                                                                                                                                                                                                !"#$"%

                                                                                                                                                                         CLM*            CLMN*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)*                                                                                                                                      +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)*                                                                                                                                                   +,%%#-$.,'*,:*;<=>*>$'?)*$'/*@,/#*!40'4A2$'2#*,'*/4B#%#'7*C*D$-3#)**
                                                        8;EF=G*$%.2-#1$%.2-#*5#4067#/*0%$"69**H#/0#15#4067I*J*,:*2,$376,%)K*                                                                                                                                 8E$)7F:&G*-4)7#'#%1-4)7#'#%*5#4067#/*0%$"69*H#/0#15#4067I*J*,:*)6$%#/*:%4#'/)K*                                                                                                                                        8E$)7F:&G*$%.)71$%.)7*5#4067#/*0%$"69*H#/0#15#4067I*J*,:*)6$%#/*-4)7#'#%)K*
                                                                                         "#$&%                                                                                                                                                                                                                 "#'%                                                                                                                                                                                                                   "#$%
                                                                                                                                                                                                                    !"#$%&$'()*+,%%#-$.,'*




                                                                                                                                                                                                                                                                                                                                                                                                                                          !"#$%&$'()*+,%%#-$.,'*
         !"#$%&$'()*+,%%#-$.,'*




                                                                                         "#$"%                                                                                                                                                                                                                 "#$%
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      "#&%

                                                                                         "#"&%                                                                                                                                                                                                                 "#&%
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      "#"%
                                                                                        "#""%                                                                                                                                                                                                                  "#"%                                                                                                                                                 !$%    !'#(%      !'%     !&#(%       !&%    !)#(%     !)%    !"#(% "%        "#(%   )%   )#(%   &%     &#(%   '%   '#(%   $%
                                  !'%   !(#&%    !(%     !)#&%   !)%   !$#&%   !$%   !"#&% "%          "#&%                           $%     $#&%     )%         )#&%     (%      (#&%      '%                                               !(%    !'#)%     !'%     !$#)%       !$%     !&#)%     !&%    !"#)% "%        "#)%                           &%     &#)%      $%         $#)%     '%      '#)%     (%                                                                                                       /#0%##*/#12,3"-4'0*5#4067*8"9*
                                                                         /#0%##*/#12,3"-4'0*5#4067*8"9*                                                                                                                                                                                           /#0%##*/#12,3"-4'0*5#4067*8"9*                                                                                                                                                                                                     !"#&%
                                                                                         !"#"&%                                                                                                                                                                                                               !"#&%


                                                                                         !"#$"%                                                                                                                                                                                                               !"#$%                                                                                                                                                                                                                  !"#$%

                                                                  CLM*         CLMN