Random Walk and Feedback on Scholarly Network

                      Yingying Yu                               Zhuoren Jiang                           Xiaozhong Liu
              College of Transportation                    College of Transportation              School of Informatics and
                    Management                                   Management                              Computing
              Dalian Maritime University                   Dalian Maritime University                Indiana University
               Dalian, China, 116026                        Dalian, China, 116026                       Bloomington
              uee870927@126.com                             jzr1986@gmail.com                   Bloomington, IN, USA, 47405
                                                                                                    liu237@indiana.edu

ABSTRACT                                                                    could significantly improve the scholarly recommendation perfor-
The approach of random walk on heterogeneous bibliographic graph            mance [3,7,9,12]. For instance, Liu et al., [2,3] constructed the het-
has been proven effective in the previous studies. In this study, by        erogeneous scholarly graph and proposed a novel ranking method
using various kinds of positive and negative feedbacks, we propose          based on pseudo relevance feedback (PRF), which can effectively
the novel method to enhance the performance of meta-path-based              recommend candidate citation papers via different kinds of meta-
random walk for scholarly recommendation. We hypothesize that               paths on the graph.
the nodes on the heterogeneous graph should play different roles               In this paper, we intend to further investigate feedback informa-
in terms of different queries or various kinds implicit/explicit feed-      tion and enhance the meta-path-based random walk performance.
backs. Meanwhile, we prove that the node usefulness probabil-               Intuitively, for different information needs, when user feedbacks
ity has significant impact for the path importance. When positive           are available, the nodes on the graph should play different roles
and negative feedback information is available, we can calculate            in the final measure. For example, given two different queries
each node’s proximity to the feedback nodes, and use the prox-              "Content-based Citation Recommendation" and "Heterogeneous In-
imity to infer the usefulness probability of each node via the sig-         formation Network", the same paper "ClusCite: effective citation
moid function. By combining the transition probability and the use-         recommendation by information network-based clustering" may be
fulness probability of nodes on the path instance, we propose the           retrieved by scholarly search engines, e.g., Google Scholar. But the
new random walk function to compute the importance of each path             target paper can be more useful (positive) for the second query than
instance. Experimental results with ACM full-text corpus show               the first one. As another example, for user X, if she prefers to cite
that the proposed method (considering the node usefulness) sig-             influential scholars’ work, the highly cited authors will be useful for
nificantly outperforms the previous approaches.                             her. While for user Y, if she tends to cite the frontiers, she will mark
                                                                            the newest publications and the newly topics as the useful feedback
                                                                            information. Therefore, the same node may perform significantly
Categories and Subject Descriptors                                          different based on different information needs and feedback infor-
H.3.3 [Information Storage and Retrieval]: Information Search               mation. Furthermore, by using (implicit/explicit positive/negative)
and Retrieval                                                               feedbacks, it is possible to infer the usefulness probability of other
                                                                            nodes on the graph. So that, the importance of path instance will
                                                                            vary in terms of the probability of node usefulness.
Keywords                                                                       The main contribution of this paper is threefold. First, in
Meta-path-based Random Walk, Feedback, Heterogeneous Graph                  this paper, the feedback is not limited to documents. In scholarly
                                                                            network, user could provide feedback judgments for authors, key-
                                                                            words and venues, either useful or not useful. If the explicit user
1.    INTRODUCTION                                                          feedback is unavailable, we propose an approach to automatically
   The volume of scientific publications has increased dramatically         generate the feedback nodes based on user queries and the relation-
in the past couple of decades, which challenges existing systems            ships among the entities on the heterogeneous graph. Second, we
and methods to retrieve and access scientific resources. Classi-            infer the usefulness of the nodes in terms of feedback information.
cal text-based information retrieval algorithms can recommend the           For instance, a node is less useful when it is close to the negative
candidate publications for scholars. However, most of them ig-              node(s). We make a conjecture that the usefulness probability of
nored the complex and heterogeneous relations among the schol-              each node depends on its average proximity to the feedback set and
arly objects. Not until recently, some studies proved that adopt-           can be estimated via sigmoid function. Third, we emphasize the
ing the mining approaches on heterogeneous information networks             node usefulness has a great impact on the path importance. Our ap-
                                                                            proach about computing the random walk probability differs from
                                                                            the previous study in that, not only the transition probability, but
                                                                            also the usefulness probability of the node should be taken into ac-
                                                                            count for random walk. To verify these hypotheses, we adopt a
                                                                            number of meta-paths on the graph (Figure 1) and make a com-
                                                                            parison between the classical random walk function and the novel
                                                                            method. Experimental results on ACM corpus show that the pro-
Copyright c 2015 for the individual papers by the papers’ authors. Copy-
ing permitted for private and academic purposes. This volume is published
                                                                            posed method significantly outperforms the original one.
and copyrighted by its editors.                                                The remainder of this paper is structured as follows. We 1) re-
Published on CEUR-WS: http://ceur-ws.org/Vol-1393/.
view relevant methodologies for pseudo relevance feedback, 2) in-          Given a specific scholarly network, there can be many kinds of
                                                                                                           w      w
troduce the preliminaries, 3) propose the improved methods, 4) de-       meta-paths. For example, P ∗ → A ← P ? is a simple meta-path
scribe the experiment setting and evaluation results, and 5) con-        on the scholarly network, denoting all the papers published by the
clude with a discussion and outlook.                                     seed paper’ author. P ∗ is the starting paper node (seed node) in this
                                                                         path. P ? denotes the candidate publication node. More examples
2.    RELATED WORK                                                       can be found in Table 1.
   Pseudo relevance feedback, also known as blind relevance feed-
back, provides a way for automatic local analysis. When the user         4.    RESEARCH METHODS
judgments or interactions are not available, it turns out to be an ef-
fective method to improve the retrieval performance. Traditional         4.1     Generate the Feedback Nodes
pseudo relevance feedback tends to treat the top ranked documents           Generally, given user initial queries, a list of ranking publications
as relevant feedback, and then expand the initial queries. How-          would be found via text retrieval. Based on the top ranked docu-
ever, some of the top retrieved documents may be irrelevant, which       ments, user would probably give explicit judgments on whether the
could result in noisy feedback into the process. So that, there are      related keywords, authors or venues are useful or not. However,
various efforts to improve the traditional pseudo feedback. [11] ex-     explicit feedback is not easy to get. In this study, we propose meth-
ploited the possible utility of Wikipedia for query dependent ex-        ods to infer the implicit feedback nodes on the heterogeneous graph
pansion. From the perspective of each query and each set of feed-        according to the given information.
back documents, [4] proposed how to dynamically predict an op-              The feedback is a collection of multiple nodes marked with use-
timal balance coefficient query expansion rather than using a fixed      ful (positive) or unuseful (negative) on the heterogeneous graph.
value. [1] suggested to use evolutionary techniques along with se-       We represent this collection as N F . N FP and N FN denote the
mantic similarity notion for query expansion. [6] introduced an ap-      positive and negative nodes set respectively. The kinds of feedback
proach to expand the queries for passage retrieval, not based on         nodes in discussion include keyword (K), author (A) and venue (V).
the top ranked documents, but via a new term weighting function,
which gives a score to terms of corpus according to their related-       4.1.1     Generate the Positive Feedback Nodes
ness to the query, and identify the most relevant ones. Instead of          Since we know the initial queries (i.e., author provided paper
using term expansion, graph-based feedback provides a new rank-          keywords) that the users should be most concerned with, it is rea-
ing assumption based on topology expansion. [2] used the pseudo          sonable to take the explicit keywords KP as the positive feedback
relevant papers as the seed nodes, and then explored the potential       nodes. Next, we will infer the positive authors and venues based on
relevant nodes via specific restricted/combined meta-paths on the        KP . We deem that the authors or venues that are highly likely re-
heterogeneous graph. Our study is motivated by this approach and         lated to KP are positive as well. So we rank authors via meta-paths
                                                                               con               r      w
mainly focused on updating the random walk algorithm by inves-           KP −−→ A? and KP ← P −         → A? , and take the top ranked Kpos
tigating both the positive and negative feedbacks. In fact, posi-        authors as the pseudo positive authors AP . Similarly, we locate the
                                                                                                   con                r     p
tive and negative feedback approach has been studied in image re-        positive venues via KP −−→ V ? and KP ← P −       → V ? , and select
trieval [5]. With several steps of positive and negative feedback,       the top ranked Kpos venues as the positive nodes VP .
the retrieval performance could be increasingly enhanced. From
the view of negative feedback, [10] studied and compared different       4.1.2     Generate the Negative Feedback Nodes
kinds of methods, it addressed that negative feedback is important          Intuitively, to generate the negative feedbacks, our basic assump-
especially when the target topic is difficult and initial results are    tion is that the negative nodes should be directly related to the
poor. Besides, using multiple negative feedback methods could be         searched results, but least relevant to the explicit positive keywords.
more effective.                                                          First, based on text retrieval results, we define the top ranked topK
                                                                         papers as Pr , and then we locate the keywords, authors and venues
                                                                                                                                              r
3.    PRELIMINARIES                                                      that are directly connected to Pr via different meta-paths, Pr →
                                                                                   w               p
   Following the work [2,8], an information network can be defined       Kr , Pr → Ar and Pr → Vr .
as follows.                                                                 Next, we filter collections of Kr , Ar and Vr . 1. Rank the key-
                                                                                                                                        con   r
                                                                         words Kr via the transition probability of meta-path KP → P →
   D EFINITION 1. (Information network) An information network           Kr . Use the last ranked Kneg keywords as the pseudo negative
is defined as a directed graph G = (V, E) with an object type            nodes KN . 2. Similar to keywords, rank the authors Ar via the
mapping function τ : V → A and a link type mapping function                                                          con     w
                                                                         transition probability of meta-path KP → P → Ar , and use the
φ : E → R, where each object v ∈ V belongs to one particular
                                                                         last ranked Kneg authors as the pseudo negative nodes AN . 3.
object type τ (v) ∈ A, each link e ∈ E belongs to a particular                                           con      p
relation φ(e) ∈ R, and if two links belong to the same relation          Rank the venues Vr via KP → P → Vr , and use the last ranked
                                                                                                                                        con
type, the two links share the same starting object type as well as       Kneg venues as the negative nodes VN . Here we use KP → P in-
                                                                                         r
the ending object type.                                                  stead of KP ← P because the "contribution" characterizes the im-
                                                                         portance of each paper, given a topic. It does not necessarily means
When there are more than one type of node or link in the infor-          paper is relevant to topic [2]. Even if one paper is not explicit rel-
mation network, it is called heterogeneous information network.          evant to some topic, it might also be important. The "contribute"
In [8], Sun further defined meta-path as follows.                        conveys more information.
                                                                            Thus, we obtain all the positive and negative feedback nodes.
  D EFINITION 2. (Meta-path) A meta-path P is a path defined             N FP includes KP , AP and VP . N FN contains KN , AN and VN .
on the graph of network schema TG = (A, R), and is denoted
in the form of Ȧ1 −→
                     R1        R
                               2
                          Ȧ2 −→
                                         R
                                         l
                                  . . . −→ Ȧl+1 , which defines a       4.2     Infer the Usefulness Probability of Node
composite relation R = R1 ◦ R2 ◦ . . . ◦ Rl between types Ȧ1 and          Unlike previous studies, in this paper, the importance of nodes
Ȧl+1 , where ◦ denotes the composition operator on relations.           on scholarly network is not even. The usefulness probability of
node Ni is determined by the feedback nodes. Intuitively, if node          path to optimize the weight of each sub-meta-path. For this study,
Ni is more closely related to the positive nodes, it could be more         we set β = 0.6.
useful. Conversely, if Ni is much closer to the negative nodes,               Then, the random walk probability will be decided by the tran-
and further away from the positive nodes, it indicates that Ni may         sition probability and the usefulness probability of the node on the
be not very useful. Therefore, the proximity between given node            path instance. In this paper, we use eight meta-paths to investigate
and feedback node set is very crucial. We should note that the             the novel random walk method with node feedback information for
usefulness probability of each node varies from different feedback         citation recommendation. All the meta-paths are listed in Table 1.
node sets.
   To infer the usefulness probability of node Ni , we adopt the sig-
                                    1
                                                                           5.    EXPERIMENT
moid function Pu (Ni ) = 1+e−αD(N        i)
                                             to convert the proximity
into probability, where α controls the convergent rate (default is         5.1     Data Preprocessing
1). In our assumption, if Nj is positive node, Pu (Nj ) = 1, other-           We used 41,370 publications (as candidate citation collection),
wise P (Nj ) = 0. D(Ni ) denotes the proximity between Ni and              published between 1951 and 2011, on computer science for the ex-
the feedback node set N F . It can be derived from the following           periment (mainly from the ACM digital library). As [2] introduced,
formula.                                 P                                 we constructed the heterogeneous graph shown in Figure 1 and Ta-
                                            Nj ∈N FP d(Ni ,Nj ))
                P
                  Nk ∈N FN d(Ni ,Nk )
   D(Ni ) =            |N FN |
                                      −          |N FP |
                                                                 , where   ble 2.
|N FN | and |N FP | represents the size of collection N FN and N FP           For the evaluation part, we used a test collection with 274 papers.
respectively. d(Ni , Nj ) indicates the proximity between node Ni          The selected papers have more than 15 citations from the candidate
and node Nj . In this paper, we will estimate the proximity d(Ni , Nj )    citation collection.
based on the paths Ni       Nj on the graph. There could be lots of        5.2     Generate Feedback Nodes
path instances connected node Ni and Nj . If the length of path is
too long, the influence would be too small to be considered. We as-           Attaining different types of feedback information is the most im-
sume the maximum of path length is 10. Then we select the shortest         portant part in this research. Since it is not available to get the user
path and define its length as the proximity d(Ni , Nj ).                   judgments right away. We used the method introduced in section
   If D(Ni ) is negative, it reflects node Ni is closer to negative        4.1 to create positive and negative feedback nodes. As aforemen-
nodes than positive ones, which means node Ni could be less im-            tioned, the collection KP is the set of user given keywords. It is ex-
portant, and vice versa. Particularly, if D(Nj ) → +∞, it indi-            plicit positive feedbacks. While AP and VP can be derived by their
cates that Nj is far away from negative feedback nodes, so the im-         connectivity to set KP based on the heterogeneous graph. Here we
portance of this node approach to 1; If D(Nj ) = 0, it indicates           set Kpos = 10, and take the top 10 ranked authors/ venues as the
that Nj has the same distance to negative and positive nodes, then         implicit positive feedbacks.
Pu (Nj ) = 0.5 ; If D(Nj ) → −∞, it indicates that Nj is closest              Next, we produced the implicit negative feedback nodes. Through
to negative feedback node, then Pu (Nj ) → 0.                              the text retrieved results, we grabbed the top ranked papers as Pr
                                                                           (topK = 20). Then we located the list of keywords/ authors/
                                                                           venues which have direct correlations to Pr , but the least relevance
4.3     Compute the Random Walk Probability                                to KP . Find the last ranked Kneg = 10 and used them as KN , AN
        Based on Meta-path                                                 and VN respectively.
   Meta-path illustrates how the nodes are connected in the hetero-
geneous graph. Once a meta-path is specified, a meta-path-based            5.3     Experiment Result
ranking function is defined, so that relevant papers determined by            In the evaluation part, we experimented with 8 different meta-
the ranking function can be recommended [3]. It turns out that             paths. For each meta-path, two sets of results were shown on row
meta-path based feedback on heterogeneous graph performs better            ‘N’ and ‘Y’ in Table 3. The ‘N/Y’ column in Table 3 indicates
than other methods (PageRank) based PRF [2]. Random walk on                whether we use the positive and negative feedback nodes or not for
heterogenous network can explore more global information, com-             computing the path importance. ‘N’ indicates that the result was
bining multiple feedback nodes, which might be very important for          from the baseline in [2], while ‘Y’ means multiple feedback nodes
the recommendation tasks.                                                  were employed and the node influence was appended into the final
   In order to quantify the ranking score of candidates relevant to        random walk function. MAP and NDCG are used as the ranking
the seeds following one given meta-path, a random walk based ap-           function training and evaluation metrics. For MAP, binary judg-
proach was proposed in [2]. The relevance between P ∗ and P ?              ment is provided for each candidate cited paper (cited or not cited).
                          (1)  (l+1)      P
can be estimated via s(ai , aj       ) =        (1)   (l+1) RW (t),        NDCG estimates the cumulative relevance gain a user receives by
                                             t=a  i  a    j
                                         (1)      (l+1)
                                                                           examining recommendation results up to a given rank on the list.
where t is a path instance from node ai to aj            following the     We used an importance score, 0-4, as the candidate cited paper im-
specified meta-path, and RW (t) is the random walk probability of          portance to calculate NDCG scores. Apparently, in most cases,
the instance t.                                                            row ‘Y’ significantly outperforms row ‘N’ , which shows that the
                    (1)   (2)        (l+1)
   Suppose t = (ai1 , ai2 , . . . , ail+1 ), the random walk proba-        positive/negative feedbacks enhance the random walk performance
                                          Q        (j)  (j+1)
bility can be computed via RW (t) = j w(aij , ai,j+1 ). While              quite well. We also used t-test to verify this improvement and most
this formula only considers the weight of link on the path instance.       meta-paths are significantly refined.
Based on our hypothesis, the node usefulness probability has a
great effect on the path importance. So in this study, we propose a        6.    CONCLUSION AND LIMITIONS
novel randomQ   walk function as follows.                                     In this study we use multiple kinds of feedback nodes and pro-
                          (j)  (j+1)                   (j+1)
   RW (t) = j (β ·w(aij , ai,j+1 )+(1−β)·Pu (ai,j+1 )), where              pose a new method to enhance the meta-path-based random walk
      (j+1)                                                   (j+1)
Pu (ai,j+1 ) is the usefulness probability of the node ai,j+1 on the       performance. The new random walk function considers both tran-
path (derived from section 4.2), and β determines which factor is          sition probability and node usefulness probability on the path in-
more important. Theoretically, we need to tune β for each meta-            stance. We find that the node influence varies from the set of
feedback nodes, which could be inferred based on the explicit user
queries via a series of steps. Experimental results with ACM data                                              Table 2: Graph statistics
                                                                                              Node/Edge   Number       Description
illustrate that the new approach with positive/negative feedback in-                          P           41,370       Paper
formation helps to improve the performance of meta-path-based                                 A           63,323       Author
recommendation.                                                                               V           369          Venue
   For further study, we will continue this approach based on real                            K           3,911        Keyword
user explicit feedbacks and design the personalized recommenda-                                  c
                                                                                              P →P        168,554      Paper cites another paper
tion model to improve user experience. Not only the node useful-                                w
                                                                                              P →A        105,992      Paper is written by an author
ness is related to the feedback nodes, but also the weight of each re-                           p
                                                                                              P →V        41,013       Paper is published at venue
lation type may be affected by the feedback nodes or retrieval task.                            co
                                                                                              A→A         239,744      Co-author relationship
If the retrieval task is to search the relevant papers based on given                            r
                                                                                              P →K        587,252      Paper is relevant to keyword(topic)
authors, the author feedback nodes will be more useful for "writ-                               con
tenby" relation, "writtenby" and "co-author" relation might be more                           K → P       3,577,111    Keyword (topic) is contributed by paper
                                                                                                con
important. This hypothesis will be discussed in the next step. Be-                            K → A       2,397,205    Keyword (topic) is contributed by author
                                                                                                con
sides, more sophisticated inference models will be adopted which                              K → V       18,450       Keyword (topic) is contributed by venu
may enhance the ranking performance.

7.       FIGURES AND TABLES                                                              Table 3: Meta-path Based Random Walk Performance
                                                                                         Comparison(|P ∗ | = 10)
                                                                                          NO.    N/Y   MAP     MAP@5 MAP@10 NDCG                NDCG@5     NDCG@10
                                                                                                 N     0.0277 0.0085      0.0129      0.1035    0.0306     0.0394
                                                                                          1
                                                                                                 Y     0.0365 0.015       0.0211      0.1149    0.0459     0.0565
                                                                                                       ***     ***        ***         ***       **         ***
                                                                                                 N     0.1315 0.0552      0.0773      0.2193    0.1427     0.1548
                                                                                          2
                                                                                                 Y     0.1459 0.0678      0.0904      0.2307    0.1656     0.1705 **
                                                                                                       ***     ***        ***         **        ***
                                                                                                 N     0.0744 0.0306      0.0404      0.1539    0.0689     0.0766
                                                                                          3
                                                                                                 Y     0.0948 0.0441      0.0582 *    0.1707    0.0945 *   0.1002 **
                                                                                                       ***     ***                    ***
                                                                                                 N     0.027   0.0042     0.0076      0.1378    0.0146     0.025
                                                                                          4
                                                                                                 Y     0.038   0.0109     0.0153      0.1521    0.0318     0.0387
                                                                                                       ***     ***        ***         ***       ***        ***
                                                                                                 N     0.0436 0.0121      0.0187      0.1672    0.0476     0.0585
                                                                                          5
                                                                                                 Y     0.0561 0.0257      0.0328      0.1854    0.0867     0.0885
                                                                                                       ***     ***        ***         ***       ***        ***
                                                                                                 N     0.0327 0.0234      0.03        0.0734    0.0693     0.0748
                                                                                          6
                                                                                                 Y     0.0872 0.0359      0.0471      0.1962    0.0805 *   0.09 *
                                                                                                       ***     ***        ***         ***
           Figure 1: Heterogeneous Bibliographic Graph                                           N     0.0238 0.0083      0.0097      0.1529    0.0216     0.0224
                                                                                          7
                                                                                                 Y     0.0373 0.0133      0.0163      0.1718    0.0317     0.0344 **
                                                                                                       ***     ***        ***         ***       **
                                                                                                 N     0.0092 0.0005      0.0007      0.1397    0.0011     0.0013
                                                                                          8
                                                                                                 Y     0.012   0.0011     0.0017      0.1476    0.0027     0.0045
           Table 1: All the meta-paths used in this study                                              ***     ***        ***         ***       ***        ***
 NO.                 Meta-path             Feedback ranking hypothesis                                     p < 0.05: *, p < 0.01: **, p < 0.001: ***
                     w      w
  1             P ∗ −→ A ←− P ?            Relevant paper’s author’s other papers
                                           can be relevant
                          c
     2               P∗ −
                        → P?               Relevant paper’s cited papers can be rel-         recommendation. In Proceedings of the 23rd ACM International
                                           evant
                      c        c                                                             Conference on Conference on Information and Knowledge
     3          P∗ −  → P?
                   →P −                    Relevant paper’s cited paper’s cited pa-          Management, pages 121–130. ACM, 2014.
                                           per can be relevant
                 c        w        w                                                     [3] X. Liu, Y. Yu, C. Guo, Y. Sun, and L. Gao. Full-text based
     4     P∗ −
              → P −→ A ←− P ?              Relevant paper’s cited papers’ authors’
                                                                                             context-rich heterogeneous network mining approach for citation
                                           papers can be relevant
                  w       co       w                                                         recommendation. In ACM/IEEE Joint Conference on Digital
     5      P∗ → A → A ← P?                Relevant paper’s author’s co-author’s pa-
                                                                                             Libraries, 2014.
                                           pers can be relevant
                 w        w        c                                                     [4] Y. Lv and C. Zhai. Adaptive relevance feedback in information
     6     P ∗ −→ A ←− P −
                         → P?              Relevant paper’s author’s cited papers
                                           can be relevant                                   retrieval. In Proceedings of the 18th ACM conference on Information
                  p       p        c                                                         and knowledge management, pages 255–264. ACM, 2009.
     7     P∗ → V ← P → P?                 Paper can be relevant if it is cited by the
                                           ones published at the same venue as the       [5] H. Muller, W. Muller, S. Marchand-Maillet, T. Pun, and D. M.
                                           relevant paper                                    Squire. Strategies for positive and negative relevance feedback in
     8
            p         p        w
         P ∗ → V ← P −→ A ←− P ?
                                       w
                                           Paper can be relevant if its authors’ pa-         image retrieval. In Pattern Recognition, 2000. Proceedings. 15th
                                           pers are published at the same venue as           International Conference on, volume 1, pages 1043–1046. IEEE,
                                           the relevant paper                                2000.
                                                                                         [6] H. Saneifar, S. Bonniol, P. Poncelet, and M. Roche. Enhancing
                                                                                             passage retrieval in log files by query expansion based on explicit
                                                                                             and pseudo relevance feedback. Computers in Industry,
8.       REFERENCES                                                                          65(6):937–951, 2014.
 [1] P. Bhatnagar and N. Pareek. Improving pseudo relevance feedback                     [7] Y. Sun and J. Han. Meta-path-based search and mining in
     based query expansion using genetic fuzzy approach and semantic                         heterogeneous information networks. Tsinghua Science and
     similarity notion. Journal of Information Science, page                                 Technology, 18(4), 2013.
     0165551514533771, 2014.                                                             [8] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu. PathSim: Meta
 [2] X. Liu, Y. Yu, C. Guo, and Y. Sun. Meta-path-based ranking with                         path-based top-k similarity search in heterogeneous information
     pseudo relevance feedback on heterogeneous graph for citation                           networks. In Proc. 2011 Int. Conf. Very Large Data Bases
     (VLDB’11), Seattle, WA, 2011.
 [9] Y. Sun, B. Norick, J. Han, X. Yan, P. S. Yu, and X. Yu. Integrating
     meta-path selection with user-guided object clustering in
     heterogeneous information networks. In Proc. of 2012 ACM
     SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
     (KDD’12), Beijing, China, 2012.
[10] X. Wang, H. Fang, and C. Zhai. A study of methods for negative
     relevance feedback. In Proceedings of the 31st annual international
     ACM SIGIR conference on Research and development in information
     retrieval, pages 219–226. ACM, 2008.
[11] Y. Xu, G. J. Jones, and B. Wang. Query dependent pseudo-relevance
     feedback based on wikipedia. In Proceedings of the 32nd
     international ACM SIGIR conference on Research and development
     in information retrieval, pages 59–66. ACM, 2009.
[12] X. Yu, X. Ren, Y. Sun, B. Sturt, U. Khandelwal, Q. Gu, B. Norick,
     and J. Han. Recommendation in heterogeneous information networks
     with implicit user feedback. In Proc. of 2013 ACM Int. Conf. Series
     on Recommendation Systems (RecSys’13), pages 347–350, Hong
     Kong, 2013.