=Paper=
{{Paper
|id=None
|storemode=property
|title=On Principles of Egocentric Person Search in Social Networks
|pdfUrl=https://ceur-ws.org/Vol-880/VLDS-p03-Cohen.pdf
|volume=Vol-880
|dblpUrl=https://dblp.org/rec/conf/vlds/CohenKKV11
}}
==On Principles of Egocentric Person Search in Social Networks ==
On Principles of Egocentric Person Search in
Social Networks
Sara Cohen Benny Kimelfeld Georgia Koutrika Jan Vondrák
Dept. of Computer Science and Engineering IBM Research–Almaden
Hebrew University of Jerusalem San Jose, CA 95120, USA
sara@cs.huji.ac.il {kimelfeld,gkoutri,jvondrak}@us.ibm.com
ABSTRACT topics such as social search [3, 9, 10] and link prediction [8].
Person search is the problem of finding, by means of key- The focus of this paper is on egocentric person search.
word search, relevant people in a social network. In egocen- Person search is the problem of finding, by means of keyword
tric person search, the search query is issued by a person s search, relevant people in a social network. Person search
participating in the social network, and the goal is to find is an important type of query over a social network, as it
people that possess two qualities: relevancy to the query, is an aid in finding people of interest. In egocentric person
and relevancy to s herself. This position paper considers search, the search query is issued by a person s participating
the latter quality, and specifically, scoring functions that in the social network, and the goal is to find people that
rank persons by their relevancy to s. In particular, the pa- possess two qualities: relevancy to the query, and relevancy
per proposes general principles (i.e., properties) that should to s herself. This position paper considers the latter quality,
be held by such scoring functions. Several functions, which and specifically, scoring functions that rank persons by their
were proposed in the past for measuring node connectivity, relevancy to a given node s. Results that are highly ranked
are analyzed with respect to the proposed principles. It is by relevancy to the query poser s are people (transitively)
shown that none of these functions sufficiently satisfy the trusted by s. Hence, s can be less wary of entering into a
principles. In contrast, the paper presents two additional real-life relationship (social or otherwise) with these people.
functions that satisfy the principles in a strong sense. Suppose, for example, that the searcher s is node Sally in
the small fragment of a social network in Figure 1, and she
poses the query “oral surgeon” (or “car mechanic”, “immi-
1. INTRODUCTION gration lawyer”, “really nice guy”). Obviously, her goal is to
Online social networks have grown in popularity at an find a person satisfying the query, who is also trusted or rec-
extraordinary pace over the last few years. In fact, social ommended by people who she trusts. Assuming that nodes
networks, such as Facebook, MySpace and Twitter, have Tim, Ted and Tony are relevant to the keywords, our goal is
become so widespread that they currently boast hundreds to measure their relevancy to s by taking into consideration
of millions of users. The graph structure defined by a social the graph structure.
network encodes interesting and useful information about Egocentric person search highly differs from social (web)
the social relations between users. Leveraging this data to search [3,9,10]. The latter generally refers to the problem of
effectively answer different types of queries is an interesting ranking web pages while taking into consideration social rela-
and challenging problem. tions. In contrast, the former problem is that of ranking so-
Abstractly, a social network is simply a graph of people. cial network nodes. The problem studied in this paper bears
Directed edges indicate that one person (node) likes/trust- similarity to expert search [4]. However, the latter has not
s/recommends another. (We use a directed model, as in taken into consideration the egocentric aspect of this prob-
Twitter, to model possibly asymmetric relations.) In ad- lem. Another different yet related problem is efficient search
dition, each node is associated with textual data, such as within a social network [1, 2, 13]; there, the focus is usually
personal information, posts, etc. on efficiently finding nodes with given properties (and not
Social networks have been the focus of extensive research, on sophisticated scoring functions). Link prediction, which
studying metrics like centrality and cohesion, as well as phe- is the problem of predicting which social relations are likely
nomena like the small world property. See [5] for a history of to be added to a social network [8], is also highly related.
the development of social network analysis. More recently, (This relationship is discussed further in Section 4). Intu-
online social networks have been studied in the context of itively, egocentric person search differs from link prediction
in that we must rank nodes t who are relevant to the search
keywords, even if it would a priori seem unlikely for s and t
to form a social relation.
Permission to make digital or hard copies of all or part of this work for As mentioned previously, when ranking results of an ego-
personal or classroom use is granted without fee provided that copies are centric person search, one must measure relevancy to the
not made or distributed for profit or commercial advantage and that copies query, and relevancy to s herself. The former can be quanti-
bear this notice and the full citation on the first page. To copy otherwise, to fied using standard information retrieval ranking functions.
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee. This article was presented at the workshop Very
Thus, the focus of this work is on the latter. As an aid to
Large Data Search (VLDS) 2011. developing and studying node scoring functions, we present
Copyright 2011.
Tim vance of t for s in graph G. Relevance of t for s is important
in a social setting; in particular, it measures how much s
Sally Ted (her friends, friends of friends, and so forth) like/trust/rec-
ommend t. In the following section we present three simple
properties that any score(s, t, G) should satisfy.
Tony
3. SCORING FUNCTION PROPERTIES
Figure 1: Small fragment of a social network In this section, we fix a graph G and three distinct nodes
v
s, t, and v of G. We denote by πs,t (G) the graph that
consists of all the simple paths from s to t through v. We
several properties that seem intuitively appealing, which we are interested in three special types of nodes that are on
expect to hold in any natural node scoring function. These paths from s to t: connectors, mergers, and splitters. Each
properties are given in the form of graph manipulations, and of these plays a special part in G, and hence, manipulating
how they effect scoring of nodes. Intuitively, the underlying such nodes, will yield a new graph, for which we will expect
assumptions of our properties are, all else being equal, (1) that s and t will be even more closely related.
a node closer to s should score higher than one farther from We say that v is a connector if v lies on a path from s to
s, and (2) a node with multiple independent paths from s t, v has a single incoming edge (u, v), a single outgoing edge
should score higher than one with fewer paths. For example, (v, w), and there is no edge from u to w. Intuitively, v is a
in Figure 1, it would seem that Tim is more relevant to Sally connecting link on a path from s to t, and has no additional
than Tony, who in turn is more relevant than Ted. All three interplay with the graph. We say that v is a merger if v
nodes being at distance 2 from Sally, there are more inde- has multiple incoming edges that are on simple paths from
pendent paths from Sally to Tim than there are to Tony s to t, v has a single outgoing edge, and v separates s from
(and more to Tony than to Ted). Note that independent v
t in the graph πs,t (G). Intuitively, merger nodes serve as a
paths translate, in the real world, to independent opinions, merging point of multiple paths from s to t. Finally, v is
and hence, are quite valuable. a splitter if v has a single incoming edge, multiple outgoing
After presenting our general principles (properties), we edges that are on simple paths from s to t, and v separates
consider several node scoring functions that have been mainly s from t in the graph πs,t v
(G). Thus, v can be thought of
used in the context of link prediction. We analyze the degree as splitting an incoming path from s into multiple diverging
to which these functions satisfy our properties, and show paths to t.
that this degree is insufficient: each of these functions satis- Our properties use three types of graph transformations,
fies one or more properties in a trivial manner, or even vio- as depicted pictorially in Figure 2 and formally defined next.
lates them. We then introduce two additional node scoring Let v be a node in G.
functions, expected distance and reliability, and show that
these functions satisfy all properties in a strong sense. 1. If v is a connector, with incoming edge (u, v) and out-
going edge (v, w), then shortenv (G) denotes the graph
that is obtained from G by removing v, and adding an
2. EGOCENTRIC PERSON SEARCH the edge (u, w). (See Figure 2(a).)1
A social network is a directed graph G(V, E), where V is
a set of nodes, called people, and E is a set of edges. We 2. If v is a merger with outgoing edge (v, w) and incoming
use a directed model to take into consideration asymmetric edges (u1 , v), . . . , (un , v), then unmergev (G) denotes
social relations, as in Twitter. Thus, an edge (u, v) indicates the graph that is obtained from G by removing v, and
that u views v in a positive light, that is, u likes/trusts/rec- for all i ∈ {1, . . . , n} adding a new node vi and the
ommends v. In addition, each person v is associated with edges (ui , vi ) and (vi , w). (See Figure 2(b).)
textual content, such as personal information, posts and so
forth. A small fragment of a social network appears in Fig- 3. If v is a splitter with incoming edge (u, v) and outgoing
ure 1. Note that most textual content has been omitted in edges (v, w1 ), . . . , (v, wn ), then unsplitv (G) denotes the
this figure for simplicity in presentation (we only show the graph that is obtained from G by removing v, and for
first names of the nodes). all i ∈ {1, . . . , n} adding a new node vi and the edges
In egocentric person search, a keyword search query is (u, vi ) and (vi , wi ). (See Figure 2(c).)
issued by a person s in the network, and the result is a ranked
list t1 , . . . , tk of nodes. Abstractly, we denote a query as a We now list the properties that we expect a scoring func-
pair (s, c), where s is the node initiating the query, and c is a tion score(s, t, G) to satisfy.
string of keywords. For example, the query “tax consultant”,
• Shortening property: If node v is a connector, then
issued by Sally in Figure 1, will result in a ranked list of
it holds that score(s, t, G) ≤ score(s, t, shortenv (G)).
nodes, relevant to the keywords, from the graph.
The main challenge in egocentric keyword search is to for-
• Unmerge property: If node v is a merger, then it
mulate an effective scoring function for results of a query
holds that score(s, t, G) ≤ score(s, t, unmergev (G)).
(s, c). Clearly, the score of a node t should take into consid-
eration two aspects. First, how relevant are the keywords c • Unsplit property: If node v is a splitter, then it
to t? Second, how relevant is t for s? For the former aspect, holds that score(s, t, G) ≤ score(s, t, unsplitv (G)).
standard information retrieval scoring mechanisms can be
used. Therefore, we focus on the latter aspect. Formally, we 1
Note that when a node v is removed, every edge that is
will consider functions score(s, t, G) that measure the rele- incident to v is removed as well.
of nodes s and t. Due to space limitations, this is not dis-
u v w u w cussed further.
(a) Shortening: shortenv (G)
In the upcoming section we explore several scoring func-
tions, and determine which of the properties they satisfy,
u1 u1 v1 and thus, whether they are appropriate for use in ranking
results of egocentric person search.
u2 v w u2 v2 w
4. SCORING FUNCTIONS
...
...
The goal of a scoring function is to quantify the relation-
un un vn ship between s and t in a graph G, for the purpose of ego-
(b) Unmerging: unmergev (G) centric person search. Attempts to quantify the relationship
between nodes have been made in the past, for different
goals. For example, the link prediction problem is defined
w1 v1 w1 as follows: Given a snapshot of a social network at time T ,
link prediction is to accurately predict the edges that will be
u v w2 u v2 w2 added to the network during the interval from time T to a
given future time T 0 . Intuitively, a link is more likely to be
...
...
...
...
added from s to t during this interval, if they are already
wn vn wn well-related at time T . Hence, scoring functions for link
prediction measure the relatedness of s and t.
(c) Unsplitting: unsplitv (G) Several link prediction measures have been studied in the
past. We consider some of the more prominent functions:
Figure 2: Transformations
• rdist(s, t, G) is the reciprocal of the distance from s to
t in G, i.e., (dist(s, t, G))−1 , where dist(s, t, G) is the
Intuitively, the shortening property should be satisfied, as length of the shortest path from s to t.
shorter paths from s to t obviously reflect a closer relation- P∞ l l
ship between the two. The unmerge and unsplit properties • allPaths(s, t, G) is the sum l=1 β |pathss,t,G | where
l
should be satisfied, as they intuitively state that introducing pathss,t,G is the set of all length-l paths from s to t in
more independent opinions (i.e., disjoint edges), should only G [7, 8]. Thus, allPaths(s, t, G) directly sums over all
improve the ranking of t. paths from s to t, exponentially dampening by length
to count short paths more heavily.
Example 3.1. To demonstrate these properties, let G be
the graph of Figure 1, and let s be the node Sally. The node • rootPR(s, t, G) (i.e., rooted PageRank [8]) is the sta-
on the path from Sally to Ted is a connector. Hence, remov- tionary probability of t in a random walk that returns
ing this node, and directly connecting Sally to Ted, should to s with probability α at each step, moving to a ran-
raise Ted’s score (shortening property). Observe the merger dom neighbor with probability 1 − α.
node on the paths from Sally to Tony. Applying an unmerge
transformation to this node would result in the graph struc- Table 1 shows which properties are satisfied (and to which
ture currently existing between Sally and Tim, and hence, degree) by the three functions.
should raise Tony’s score (unmerge property).
Example 4.1. Consider again the graph in Figure 1. Sup-
Note that all properties use the rather weak “≤” to in- pose that s is the node Sally, and we are interested in ranking
dicate that the change in G should not lower score(s, t, G). nodes Tim, Ted and Tony. The function rdist gives the same
Given a specific scoring function, the shortening property is score to all three nodes, as all are at distance 2 from Sally.
satisfied in the strong sense if for all s, t, G and connector The function allPaths will score Tim and Tony equally (and
nodes v, score(s, t, G) < score(s, t, shortenv (G)). (Note the above Ted), as they both have precisely one path of length 2,
strict inequality.) Similarly, this property is satisfied in the and two paths of length 3. Similarly, rootPR will score Tim
trivial sense if score(s, t, G) = score(s, t, shortenv (G)) always and Tony equally (and above Tim). Rooted PageRank can-
holds. Finally, this property is satisfied in the weak sense not differentiate between the graph structure relating Sally
if the inequality score(s, t, G) ≤ score(s, t, shortenv (G)) is to Tim, and that to Tony. On the other hand, it would seem
sometimes strict. We define satisfaction in the strong, triv- that Tim should be the highest scoring, as its paths are in-
ial and weak sense similarly for the other two properties. dependent (while those to Tony are not). None of the three
scoring functions achieve such a scoring.
Remark 3.2. Due to the small-world property, often ob-
served in social networks [12], there is a high likelihood that
any two given nodes will be connected by a short path. 4.1 Expected Distance and Reliability
Hence, it sometimes may make sense to only take a lim- It is easy to see that with respect to our properties, none of
ited neighborhood of a node into account when computing the functions considered thus far is a good fit for node scor-
the scoring function. We do not directly formulate this re- ing. Therefore, we introduce two new functions (expected
quirement as one of our properties. However, it can usually distance and reliability), which can be appropriate for node
be taken into consideration by taking any scoring function, scoring. (To the best of our knowledge, these functions have
and applying it only to a neighborhood-bounded projection not been considered in the past for related social network
Shortening Unmerge Unsplit ...
t1 s ... t2
rdist(s, t, G) weak trivial trivial
...
...
...
allPaths(s, t, G) strong trivial trivial ...
rootPR(s, t, G) strong trivial fails k−1
Figure 3: Expected distance vs. reliability
Table 1: Table of property satisfaction.
rel (s, t2 , G) > p. Since rel (s, t1 , G) = p, we get that rel ranks
scoring problems.) t1 lower than t2 . However, the expected distance from s to
t1 is p + (1 − p)m, and that from s to t2 is at least k; hence,
Expected Distance. We first consider the expected dis- choosing the parameters to be such that k > p + (1 − p)m
tance function. Intuitively, this function measures the ex- would make expd rank t1 higher than t2 .
pected distance from s to t, when each edge is removed with
probability p. Note that for this value to be well defined,
we will chose a number m, that is returned by the function, 5. CONCLUSION
when no path from s to t exists. This paper presents a first attempt at defining principles
We fix a parameter m ∈ R, and a probability p ∈ (0, 1). that should guide node relevance ranking in egocentric per-
We will implicitly assume that m is larger than the number son search. Three properties, determining how graph manip-
of nodes in the graph G. The m-bounded distance from s to ulations should affect node scoring, were presented. Tradi-
t, denoted δ̂G (s, t), is defined by tional node scoring functions were analyzed with respect to
def these properties, as well as two additional functions, which
δ̂G (s, t) = min{dist(s, t, G), m} . are shown to strongly satisfy the properties.
Thus, if G has no path from s to t, then δ̂G (s, t) = m. Note For future work we intend to experimentally test the ef-
fectiveness of various node scoring schemes and validate the
that if s 6= t, then δ̂G (s, t) is always in the interval [1, m].
given properties. We also intend to study additional models
We denote by Gr a random subgraph of G that is obtained
of social networks, such as the undirected model (e.g., to
by removing each edge of G, independently, with probability
represent social relationships in Facebook), as well as social
1−p. The expected m-bounded distance, denoted by δG (s, t),
networks with weak and strong (or more generally, weighted)
is defined as follows.
h i edges [6].
def
δG (s, t) = E δ̂Gr (s, t) .
Acknowledgements
That is, δG (s, t) is the expected m-bounded distance from
The research of Sara Cohen was partially supported by the
s to t in a random subgraph Gr of G. Finally, our scoring
ISF (Grant 143/09) and the Israeli Ministry of Science and
function is the reciprocal of δG (s, t), namely
Technology (Grant 3-6472).
expd (s, t, G) = (δG (s, t))−1 .
def
6.[1] L.REFERENCES
Reliability. Reliability is another function that strongly Adamic and E. Adar. How to search a social network. Social
Networks, 27(3):187–203, 2005.
satisfies all properties. Intuitively, reliability measures the
[2] L. Adamic, R. Lukose, A. Puniyani, and B. Huberman. Search
likelihood that a random subgraph Gr of G contains a path in power-law networks. Phys. Rev., E 64(046135):187–203,
from s to t. Intuitively, reliability satisfies the shortening 2001.
property, since longer paths are more likely to be discon- [3] P. Briggs and B. Smyth. Harnessing trust in social search. In
ECIR, pages 525–532, Berlin, Heidelberg, 2007.
nected when a random subgraph is chosen. Similarly, re- Springer-Verlag.
liability satisfies the unmerge and unsplit properties, since [4] R. D’Amore. Expertise community detection. In SIGIR, 2004.
they give preference to graphs with disjoint paths, which in [5] L. C. Freeman. The Development of Social Network Analysis:
turn, increase the likelihood of s and t being connected in A Study in the Sociology of Science. Empirical Press, 2004.
Gr . Fixing a probability p ∈ (0, 1), we define [6] S. Granovetter. The strength of weak ties. American Journal
of Sociology, 78:1360–80, 1973.
rel (s, t, G) = Pr [Gr has a path from s to t] . [7] L. Katz. A new status index derived from sociometric analysis.
Psychometrika, 18(1):39–43, 1953.
The following theorem shows that both functions strongly [8] D. Liben-Nowell and J. M. Kleinberg. The link-prediction
problem for social networks. JASIST, 58(7):1019–1031, 2007.
satisfy all three properties. The proof is nontrivial, and is
[9] K. McNally, M. P. O’Mahony, B. Smyth, M. Coyle, and
omitted, due to space restrictions. In the case of expected P. Briggs. Towards a reputation-based model of social web
distance, the proof is based on the notion of stochastic or- search. In IUI, pages 179–188, New York, NY, USA, 2010.
dering [11]. ACM.
[10] R. Schenkel, T. Crecelius, M. Kacimi, S. Michel, T. Neumann,
J. X. Parreira, and G. Weikum. Efficient top-k querying over
Theorem 4.2. The functions expd (s, t, G) and rel (s, t, G) social-tagging networks. In SIGIR, pages 523–530, New York,
strongly satisfy all three properties. NY, USA, 2008. ACM.
[11] M. Shaked and J. G. Shanthikumar. Stochastic orders and
Remark 4.3. Although expd and rel strongly satisfy all their applications. Probability and Mathematical Statistics.
Academic Press Inc., Boston, MA, 1994.
properties, they do not necessarily imply the same relative
[12] J. Travers and S. Milgram. An experimental study of the small
ranking for all nodes. To demonstrate that, we consider world problem. Sociometry, 32(4):425–443, 1969.
the ranking of the nodes t1 and t2 in Figure 3. Assume that [13] S. B. Yang and H. Garcia-Molina. Improving search in
there are sufficiently many disjoint paths from s to t2 so that peer-to-peer networks. In ICDCS, pages 5–14, 2002.