<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Temporal Social Network: Storage, Indexing and Query Processing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xiaoying Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chong Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bin Ge</string-name>
          <email>3gebin1978@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weidong Xiao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Algorithms</institution>
          ,
          <addr-line>Measurement, Performance</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Collaborative Innovation Center of Geospatial Technology</institution>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Science and Technology on Information Systems Engineering Laboratory National University of Defense Technology</institution>
          ,
          <addr-line>Changsha 410073</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>With the increasing of requirements from many aspects, various queries and analyses arise focusing on social network. Queries like nding users, friends or social activities satisfying a certain period gives temporal insights into retrieval or statistics, hence augmenting temporal query capability in such context, temporal social network (TSN), is meaningful. We propose three kinds temporal queries in social network, aiming to explore temporal dimension in users, relationships and social activities. A storage model is designed to logically and physically represent TSN, and then we propose two index structures, TUR-tree and TUA-tree for accelerating query process. Query processing algorithms are designed for the three queries, and we evaluate our idea on a dataset which is synthetically generated from real dataset, and experimental results show that our indexes and query processing are e ective and scalable.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.2.4 [Database Management]: Systems - query processing</p>
    </sec>
    <sec id="sec-2">
      <title>General Terms</title>
    </sec>
    <sec id="sec-3">
      <title>1. INTRODUCTION</title>
      <p>Online social networks have become more and more
popular, numerous sites such as Facebook, Twitter and LinkedIn
This work is supported by NSF of China grant 61303062
and 71331008.
allow users to interact and share content using social links.
Based on vast amount of information contained in social
networks, personalized and expressive search features are
essential for users, vendors or third-parties based on di erent
purposes. In various kinds of data, time is a common and
necessary dimension in social networks. For instance, when
a user logins or logouts the community system, there is often
a time stamp recording such action. When two users become
friends, a time stamp also records this event, and similar for
unfriending case. Similarly, when a user posts text, photo,
video, or comments others' posts, or shares web links, these
social activities are all recorded with time stamps.</p>
      <p>When these temporal information is taken into
consideration in an ordinary query for social networks, it becomes
interesting to discover that some historical insights are
associated with the results, e.g., nd some users who are
active during a certain period, or nd pairs of friends valid
in some year, or nd some activities posted at some time.
Such queries add temporal axes for user requirement, which
is able to distinguish old and new, or active and inactive
users, friends or activities. However, such kind of query
functions are not studied well in academic community, nor
developed in industrial platforms.</p>
      <p>In this paper, our goal is to explore temporal queries in
social networks associated with time dimension, we call it
temporal social networks (TSN). And we focus on the
following three kinds of queries.</p>
      <p>(1) Friends of Interesting Activities (FIA) query.
FIA queries aim to nd a user's friends who are involved in
some given social activities. For instance, George wants to
nd which friends of him post the texts containing co ee and
pasta during last two weeks. This query would help users to
nd interesting events or people along the time axis.
(2) Users of Time Filter (UTF) query. UTF queries
aim to nd certain users who are not only active during a
given time interval, i.e., the period of logon status intersects
with the given time interval, but also whose friends take
part in some given interesting activities. For example, a
fashion advertiser plans to look for some users who are active
in recent month, and the user's friends have taken part in
the social activities containing boot, because the advertiser
wants to persuade the users to buy their products utilizing
in uence from the friendships.</p>
      <p>(3) Group of Users with Relationship Duration (GU
RD) query. GURD queries aim to nd a set of groups,
where the number of users in each group is equal to a given
number, and average intimate degree of it satis es a given
value (here average intimate degree is measured by average
relationship duration, relationship duration is calculated by
di erence between current time and friend-make time), and
all the members of it have taken part in some given
activities. For instance, a pizza restaurant wants to send coupons
to groups of users for promotion, so it aims to nd
fourmember groups (due to the size of its dining table), and
the average relationship duration in each group is not less
than one year, and all four members have participated in
the activities containing pizza.</p>
      <p>Motivated by the important role time may assume in
social query, our goal is to design appropriate storage scheme,
index for typical temporal-social query, which has been mostly
ignored so far, to improve the expressiveness and quality of
social search queries. We rst formally describe the problem,
and use a storage model to store TSN data, then we design
two indexing structure to accelerate the query processing.
We implement our indexes and algorithms and empirically
verify our methods are feasible. Our main contributions are
listed as following:</p>
      <p>Three new query patterns are proposed in social
network, taking time dimension into consideration.</p>
      <p>A storage model for TSN is presented, including logical
view and physical model.</p>
      <p>Querying processing algorithms are designed, along
with two indexing structures to solved the three query
patterns.</p>
      <p>The rest of this paper is organized as follows. Problem
is formally de ned in section 2. In section 3, storage model
for TSN is presented. Two indexing structures are described
in section 4, followed by query processing in section 5. In
section 6, we carry out our experiments, and related works
are surveyed in section 7. Finally, section 8 concludes the
paper with directions for future works.</p>
    </sec>
    <sec id="sec-4">
      <title>2. PROBLEM DEFINITION</title>
      <p>In this paper, we focus on the social network of undirected
graph, however, it is straightforward to extend it to directed
graph. Our model is formally de ned as following:</p>
      <p>Given an undirected graph G=(V , E), each vertex vi in
V represents a user in the community and is associated with
a time interval [vts, vte), meaning the valid period during
which vi exists (or being logon status) in community, and
[vts, ) means vi is still in the community now. Each edge
(vi, vj) in E represents a friend relationship between user
vi and vj, and it is also associated with a time interval [ets,
ete), in which ets means the time when the relationship is
established and ete means the time when the relationship
is removed. Similarly, [ets, ) means the relationship still
exists at current.</p>
      <p>For a given social network graph G, there is also a set A of
activities, in this paper, we use term activity to denote social
event in the social network, e.g., publishing a post, sharing
a link and etc. Each vi in G can connect to an activity ak
in A, which means that vi gets involved in ak, we use term
participation to describe the relationship between user and
activity, e.g., in social network, user may post some texts,
or forward other's post, or share a link of web page, or
comment other's activity, or add some activity into favorite, in
general, we call these actions as participations. Without loss
generality, each activity can be represented as &lt;aid, Wa&gt;,
where aid is the activity identi er, Wa is a keyword set to
describe the activity. Assuming user vi publishes a post ak
in the Facebook at time tp, and then her/his friend vj
comments the post at some time tq, thus the two participating
actions could be formally represented by (vi, ak, tp) and (vj,
ak, tr), respectively.</p>
      <p>Figure 1 illustrates a temporal social network, where users
and activities are plotted as dots and triangles, respectively.
The relationship are divided into two categories:
user-touser (solid line) and user-to-activity (dashed line). And the
label on edge indicates time interval of relationship or time
stamp when a user participate in an activity. For instance,
user v1 logins community at time vt1 and does not logout
at current, and user v1 and v2 become friends at et1 and
unfriend each other at et3, and v4, v7 and v2 participate in
a2 at t4, t5 and t6, respectively.</p>
      <p>(vt10, *) v10
(vt1, *)</p>
      <p>v1 (et3, *)
(et1, et3)
v2
(vt2, *)
(et2, *)</p>
      <p>t6
(et4, *)
v3 (vt3, vt5)</p>
      <p>t2
t1</p>
      <p>v5
(vt5, *)
t3
W2
a2
t4
v4
(vt4, *)
a1 W1
t5
(et5, *)
v9</p>
      <p>(vt9, *)
(et6, *)
v
7 (vt7, *)
v6
(vt6, vt9)</p>
    </sec>
    <sec id="sec-5">
      <title>3. STORAGE MODEL</title>
      <p>Based on the description of temporal social network (TSN),
we give TSN's storage model, in which two perspectives are
presented: logical view and physical model.
3.1</p>
    </sec>
    <sec id="sec-6">
      <title>Logical View</title>
      <p>In logical view, users and activities are partitioned
horizontally, and social events are partitioned vertically along
the time axis. In the upper half of the logical view, along
the time axis, the actions which a speci c user makes are
presented in chronological order. Further, in the lower half,
a list of users which participate in a speci c activity also can
be obtained along the time axis. Figure 2 shows an example
of the logical view, along the time axis, user v1 logins
social network at time t1, makes friends with user v2 at time
t2 and participates in activity a1 at time t3. And activity
a1 is participated in by user v1 and v2 at time t3 and t4,
respectively.
userst1 t2 t3 t4 t5 t6 time
v1 (v1,t1)(login) (v1, v2, t2)(be friends) (v1, a1 ,t3)(post)
v2 (v2, v1, t2)(be friends) (v2, a1, t4)(like)(v2, v3, t5)(unfriend)
v3 (v3, v2, t5)(unfriend)(v3, a2, t6)(share)
activities time
a1
a2</p>
      <p>(v1, a1 ,t3)(v2, a1, t4)
To e ciently access temporal social network, we design a
physical model for storing TSN. The user and activity
information is separately stored, and users are clustered into
pages on disk, which is similar for activities. In particular,
each user page consists of several items, where each item
represents a user, including user id, name and etc, to look
up quickly, it also contains three pointers, the rst one
references to a list of time intervals, each of which represents
a pair of login and logout time stamps of the user, and the
second one points to a list, each item of which contains the
user's friend id, besides friend-making time and unfriending
time, and the third one also points to a list, each item of
which contains activity id and corresponding participating
time related to the user.</p>
      <p>The model for activity is similar to user's, i.e., each item
in activity page contains activity id, the link to keywords
and other objects in the activity, and a pointer references to
a list, where each item represents the user participating in it
as well as the time stamp. Additionally, the activity in user
page also points to the corresponding item in activity page.
Note that, the lists in the physical model do not belong to
user page or activity page, they are sequentially stored on
disk. Figure 3 illustrates a physical model example.
&lt;tin1, tout1&gt;
&lt;uid2, tf1, *&gt;
users page
uid1 ptrt_list ptrf_list ptra_list &lt;a1, ta1&gt;
uid2 ptrt_list ptrf_list ptra_list
... ... ... ...
uidn ptrt_list ptrf_list ptra_list</p>
      <p>&lt;tin2, tout2&gt;
&lt;uid3, tf2, tuf2&gt;
&lt;a2, ta2&gt;</p>
      <p>&lt;tin3, *&gt;
&lt;uidj, tfj, *&gt;
&lt;a3, ta3&gt;
&lt;a1, Wa1, obj&gt; ptru_list &lt;uid1, ta1&gt;
&lt;a2, Wa2, obj&gt; ptru_list
&lt;… … ...&gt; ... activities page
&lt;ak, Wak, obj&gt; ptru_list
Figure 3: Physical Model Example
&lt;uid2, ta4&gt;</p>
    </sec>
    <sec id="sec-7">
      <title>4. INDEXES</title>
      <p>The storage model works for storing temporal social
network, when processing queries, the sequential scanning would
be ine cient under large volume. Thus it is essential to
develop indexing technique for accelerating query processing.
However, it is di cult to build one almighty index structure
to cover users, relationships and activities with temporal
information. We design 2 structures, one is called Temporal
Users and Relationships tree (TUR-tree), for temporally
indexing users and relationships, the other is called Temporal
Users and Activities tree (TUA-tree), for users and their
activities.
4.1</p>
    </sec>
    <sec id="sec-8">
      <title>TUR-tree</title>
      <p>
        Considering that both users and relationships are
associated with time interval, and queries are subject to time
range type, thus, we propose to use MVB-tree[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to index
users and relationships. Our main contribution is to adapt
temporal users and relationship to MVB-tree.
      </p>
      <p>In MVB-tree, the entry of leaf node is formatted as &lt;key,
ts, te&gt;, where key is the value to index, and [ts, te) is the
valid time interval for key. On the other hand, the entry
of non-leaf node is formatted as &lt;key, ts, te, subnodeid&gt;,
where the function of key is for comparing and routing, and
subnodeid points to the root of subtree.</p>
      <p>In our scenario, user id is usually used as key to query, so
we can directly use user id as the key in MVB-tree. However,
relationship is di erent from user id, and can't be applied
to MVB-tree straightforward. Hence, we use encoding
technique to generate keys for MVB-tree. In particular, we use
string concatenating to generate keys both for users and
relationships. To generate key for a user vi identi ed by uidi,
we combine 0 and uidi, resulting 0juidi (j represents
concatenating operator), as the key in MVB-tree. For relationship
of uidi and uidj , we use 1 as pre x, and concatenate uidi
and uidj , resulting 1juidijuidj and 1juidj juidi, note that,
for undirected graph, we generate two keys for one
relationship. Additionally, for fast looking up relationships when
given a user, links are made among user leaf node entry and
its corresponding relationship leaf nodes.</p>
      <p>Figure 4 depicts an example for construction of TUR-tree.
It illustrates the process that three users joins social network
and then make friends. Assuming that the length of user id
is three, and it is only composed of 0-9 and a-z (this
assumption is also used in the rest of the paper). In particular, three
users with identi ers 001, 002 and 003 joins the network at
time 1, 2 and 4, respectively, and 001 and 002 make friends
at time 3, and 002 make friends with 003 at time 5. In this
example, we set the capacity of a node is 3, and Figure 4(a)
presents the result structure of TUR-tree after 001 and 002
join the network, i.e., there is only one node, namely node
A. After 001 and 002 make friends, two entries
representing the relationship are inserted into node A, which causes
over ow of A, and according to the split rule of MVB-tree, a
version split should be carried out, i.e., copying the current
version of entries in node A, and combining them with the
new entries to form a new node, if it is over ow, key split
is carried out, after that for building links between user leaf
node entry and its relation leaf nodes, entry &lt;0j001, 1, *&gt;
points to the node containing &lt;1j001j002, 3, *&gt;, and
similar for other user leaf node entries, thus the above sequential
operations generate a structure in Figure 4(b). In node B,
the two entries are copied from node A, while for node A,
root R1's rst entry refers to it with time interval [1, 3).
At time 4, user 003 joins the network, resulting insertion to
node B, and Figure 4(c) describes the structure. At time
5, user 002 make friends with 003, two new entries are
inserted, similar to the process in Figure 4(b), this causes node
C over ow, and such over ow propagates to root R1, similar
to leaf node split, R1 is split into R1 and R2, and Figure 4
shows the result.
4.2</p>
    </sec>
    <sec id="sec-9">
      <title>TUA-tree</title>
      <p>
        The temporal information associated with activities is
different from that with users and relationships, i.e., when user
participates an activity, the temporal type is time stamp not
interval, hence it is not necessary to use MVB-tree. Another
fact is for querying activities, activity identi ers are not
usually used as searching keys, but user ids, time predicates and
keywords. Thus, TUA-tree is designed to cover user id, time
and keywords, in fact it is a hybrid structure of B+-tree and
Bloom Filter[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We give the detail structure of TUA-tree
as following.
      </p>
      <p>TUA-tree is a B+-tree-like structure, the entry of leaf node
is formatted as &lt;key, ptr&gt;, where ptr points to an activity,
and key is concatenation of user id and time associated with
the activity. Node split, merge and redistribution operations
are carried out according to key. The internal node structure
is similar to that of B+-tree, i.e., entries are distinguished
0|001
[1, *)
0|002
[2, *)
0|001
[1, *)
0|002
[2, *)
by router and pointer, and the number of pointer entries is
larger than that of router entries by one, di erent from
B+tree, pointer entry also contains Bloom Filter of all keywords
in the referred subtree.</p>
      <p>Figure 5 illustrates a process of TUA-tree construction,
and the capacity of tree node is set to 2. First, user 001
participates in activity a1 at time 1, then 002 get involved
in a1 at time 2, resulting a leaf node showed in Figure 5(a).
Then user 003 publishes activity a2 at time 3, which causes
over ow, and node split operation follows, resulting a new
root linking two leaf nodes, we can see from Figure 5(b), the
left pointer entry in the root node contains Bloom Filters
BF1 which is calculated from the keywords from a1,
similarly, BF2 is calculated from the keywords from a2. After
that, user 001 participates a2 at time 4 (Figure 5(c)), and
user 004 posts activity a3 at time 5 (Figure 5(d)). At last,
user 005 and 001 participates in a4 at time 6 and 7,
respectively, resulting the root node split and tree level increased,
and BF6 contains keywords from a1, a2 and a4, while BF7
contains keywords from a1, a2, a3 and a4.</p>
    </sec>
    <sec id="sec-10">
      <title>QUERY PROCESSING</title>
      <p>In this section, we present algorithms for processing FIA
query, UTF query and GURD query.
5.1</p>
    </sec>
    <sec id="sec-11">
      <title>FIA Query</title>
      <p>Given a user vq, a keyword set Wq, and a time interval [ts,
te], a FIA query (vq, Wq, ts, te) aims to nd a set of pairs
f&lt;vk, lak&gt;g, in each of which vk is vq's friend valid during
the period [ts, te], lak is a list of activities participated in
by vk during [ts, te], and each activity in lak overlaps Wq
(i.e., the intersection of activity's keyword set and Wq is not
null).</p>
      <p>For processing a FIA query (vq, Wq, ts, te), an ideal
approach is using join algorithm to search TUR-tree and TUA
tree simultaneously, however such concept can not be
realized by our structures, due to combination of user's and his
friend's id representing relationship key in the index. We
propose a two-phase processing algorithm, i.e., rst
TURtree is traversed to retrieve friends of vq satisfying the
temporal predicate, then these friends as well as (Wq, ts, te) are
used as query input to TUA-tree to look up nal results.</p>
      <p>In particular, for the rst phase, a MVB-tree range query
([1jvqj000, 1jvqjzzz], [ts, te]) is generated, and submitted to
TUR-tree to retrieve vq's friends, set F , valid between ts
and te. The rst phase is a straightforward invocation of
MVB-tree's searching method.</p>
      <p>In the second phase, a query which consists of F and (Wq,
ts, te) is submitted to TUA-tree to retrieve nal results.
Note that, in such query, F is a set containing users, if
TUAtree is traversed at one time for each element in F , it would
be costly. Hence, we propose a batch search algorithm for
query F with (Wq, ts, te) in TUA-tree. Initially, the root
of TUA-tree is loaded into memory, including router entries
(re) and pointer entries (pe), formatted as root=&lt;pe0, re1,
pe1, re2. pe2, ..., ren, pen&gt;. For batch comparison, we
generate an interval intvl=[fminjts, fmaxjte], where fmin
and fmax are minimum and maximum user id in F ,
respectively. After that, for each pei, if intvl intersects with (rei,
rei+1] (note that, for the rst pe, pe0, the interval should
be ( 1, re1], and (ren, +1) for the last one, for
simplicity, we omit the judgment in the algorithm presentation)
as well as Bloom Filter in pei intersects with Wq, the child
node referenced by pei is inspected by query (Fsub, Wq, ts,
te), where Fsub is the intersection of F and the users set in
[rei 1, rei+1]. When leaf node of TUA-tree is reached, each
entry is examined whether it satisfy current query predicate,
and the results are fetched into list.</p>
      <p>Algorithm 1 presents the second phase of FIA query. Tree
root is loaded in line 1, and then from line 3 to line 19,
function F IA2nd-P () recursively traverses the tree, with
continuously dividing F into subset. If the parameter node is
an internal node (line 4), variable intvl is generated (line 5).
From line 6 to 11, each pointer entry of the node is inspected
to detect whether its child node satisfy the predicate, if it
does, only the users who are contained by the interval [rei,
rei+1] are preserved to descend the subtree (line 8 and 9).
Otherwise, i.e., node is a leaf (line 12), each entry of the leaf
is inspected (line 13 and 14), and the result is added into
Rlist (line 15).</p>
      <p>FIA query 2nd-P processing example. We use
Figure 5 (e) as a running example, and assuming query is (001,
002, a2.Wa2 , 3, 5) and the keyword sets of each pair
activities are not overlapped. First, root node &lt;pe0, 001j7,
pe1&gt; is loaded, then for pe0, [001j3, 002j5]\( 1, 001j7]6=
and BF6 contains keywords in a2, hence the subtree
referenced by pe0 is traversed, then similarly for pe1, the subtree
of which is also traversed. Then node A and B is loaded,
the similar processing is carried out until descending the
leaf node entries &lt;001j1, P tra1 &gt;, &lt;001j4, P tra2 &gt;, &lt;002j2,
P tra1 &gt;, &lt;003j3, P tra2 &gt;, and then by function compare(),
&lt;001j4, P tra2 &gt; is the result.
5.2</p>
    </sec>
    <sec id="sec-12">
      <title>UTF Query</title>
      <p>Given a keyword set Wq, and a time interval [ts, te], a
001|1 002|2
Ptr-a1 Ptr-a1
(e)</p>
      <p>B
001|1 002|2
Ptr-a1 Ptr-a1
003|3
Ptr-a2
UTF query (Wq, ts, te) aims to nd a set of pairs f&lt;vk,
F Ak&gt;g, in each of which vk is the user existing in the social
network during [ts, te], and F Ak is a set f&lt;vjk, lajk&gt;g similar
to the results of FIA query, i.e., vjk is vk's friend, and lajk
is a list of activities participated in by vjk during [ts, te] as
well as satisfying overlapping Wq.</p>
      <p>The aim of previous query, FIA query, is to nd a list
of friends of the given user, while for UTF query, the aim
is di erent, which is to nd a set of users active during a
given time period, whose friends take part in some certain
activities. The processing for UTF query is a little similar
to that of FIA query, i.e., there are also two phases, rst is
using TUR-tree, then TUA-tree is traversed.</p>
      <p>In the rst phase, a query formatted as ([0j000, 0jzzz], [ts,
te]) is submitted to TUR-tree, using range query algorithm
of MVB-tree, a set of leaf node entries which contain the
satis ed users are retrieved, then following the links to leaf
node of the corresponding relationships, a set of friends are
fetched, speci cally, for each result user vi, a list friends Fi
of vi is formed.</p>
      <p>In the second phase, a query formatted as (Si(Fi), Wq,
ts, te) is submitted to TUA-tree, similar to 2nd-P in FIA
query, batch query is executed and a set F2nd of friends and
corresponding activities are found. At this point, for each vi
found in the rst phase, we check whether his friend list Fi
fetched in the rst phase intersects with the friend set F2nd
in the second phase, i.e., Finti =Fi \ F2nd, if Finti is not null,
then vi and Finti as well as the corresponding activities are
collected as results.</p>
      <p>Algorithm 2 UTF Query Processing
Input: q=(Wq, ts, te)
Output: Rlist
1: LE=T U RRangeQuery([0jmin, 0jmax], [ts, te])
2: f&lt;vi, Fi&gt;g=f etchRelations(LE)
3: F2nd=2nd-P (Si(Fi), Wq, ts, te)
4: for each Fi do
5: Finti =Fi \ F2nd
6: if Finti 6= then
7: Rlist &lt;vi, Finti &gt;
8: end if
9: end for
10: return Rlist</p>
      <p>Algorithm 2 presents the processing of UTF query. In
line 1, function T U RRangeQuery retrieves the users who
are active between ts and te, and then through links, the
corresponding friends are fetched in line 2. Next, similar to
the second phase in FIA, in line 3, set Si(Fi) is re ned by
condition (Wq, ts, te) using function 2nd-P . From line 4
to the end, for each Fi, intersection examination is made to
re ne and then collect the results.</p>
      <p>
        UTF query example. We use Figure 4 (d) and 5 (e) as
a running example, and assuming query input is (a1.Wa1 ,
1, 2). In the rst phase, through the TUR-tree, we nd leaf
node &lt;0j001, 1, *, P tr001&gt;, &lt;0j002, 2, *, P tr002&gt; satisfying
time condition [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Then using P tr001, the set of friends
F001=f002g of user 001 is found, similarly, F002=f001; 003g
is returned. At the second phase, rstly a total friends set
f001; 002; 003g is formed, then using algorithm 2 with input
(f001; 002; 003g, a1.Wa1 , 1, 2), F2nd=f001; 002g is returned.
Then because F001\F2nd6= and F002\F2nd6= , the result is
f&lt;001, f&lt;002, a1&gt;g&gt;, &lt;002, f&lt;001, a1&gt;g&gt;g.
5.3
      </p>
    </sec>
    <sec id="sec-13">
      <title>GURD Query</title>
      <p>Given a number m, a time length (duration) td, and a
keyword set Wq, a GURD query (m, td, Wq) aims to nd
groups of users, in each group of which users form a
connected graph, and the number of users is m, and average
relationship duration (ARD) is not less than td, and for
each user in the group, at least one activity participated in
overlaps Wq. Here, ARD is de ned as following: assuming
the number of users in a group is n, and let d(i; j) be the
relationship duration between user vi and vj in the group
(current time minus relationship establishing time, note that
this query implies all the connections in a group are valid
at current), if there is no relationship between vi and vj ,
d(i; j)=0, then</p>
      <p>ARD =</p>
      <p>Pn
i6=j d(i; j)
n(n
1)
(1)
Note that, in equation (1), for vi and vj , both d(i; j) and
d(j; i) are taken into consideration.</p>
      <p>The aim of GURD query is to nd groups of users, which
is totally di erent from the previous two queries. In this
query processing, TUA-tree is mainly used to accelerate the
process, this is because calculating user group is costly using
TUR-tree. The basic work ow is rst using TUA-tree to
nd a set Uc of users who participate in the activities
constrained by Wq, then we propose a group forming algorithm
to construct satis ed groups iteratively. The goal of the
algorithm is to terminate the iteration as early as possible.</p>
      <p>In particular, Wq is submitted to TUA-tree, at this time,
only Bloom Filter is used to route the query descending the
candidate subtree, resulting Uc, each user of which
satises the activity condition. Next is our group forming
algorithm. We rst calculate relationship duration between
each pair of users in Uc if they are friends, such step could
be accomplished by searching TUR-tree or directly
looking up the storage model. Then we sort the relationship
duration of each pair of friends in Uc in descending order,
resulting RD= f&lt;(vi, vj ), dik;j &gt; j vi, vj 2Uc ^ vi and vj
are friends ^ 1 k jRDjg, where dik;j is the relationship
duration between vi and vj , and is the kth one in
descending order of jRDj durations. Next, top element is
constantly removed from RD, and we compare the following two
variables: Rumk =Pkn+=(k(m 1)m=2) din;j and Rmrs=td m(m
1)=2, if Rumk Rmrs, it implies possibility for nding groups
satisfying the ARD requirement, and then vi and vj are used
to nd their friends from the rest users of RD, and for each
possible connected graph containing vi and vj with vertex
amount being m, ARD is calculated and only the group with
ARD td is collected into result list. Otherwise, i.e., Rumk
&lt; Rmrs, which means it is impossible for this edge to nd
other connected vertices to form an m member group whose
ARD is not less than td, hence, the rest elements in RD are
discarded, and iteration is terminated.</p>
      <p>Algorithm 3 GURD Query Processing
Input: q=(m, td, Wq)
Output: Rlist
1: Uc=searchT U Atree(Wq)
2: F D=sortF D(Uc)
3: while F D 6= do
4: F D =F D n f&lt;(vi, vj ), dik;j &gt;g
5: if Rumk &lt; Rmrs then
6: break
7: else
8: Gi;j;m=constructGroups(vi, vj , F D, m)
9: for each g 2 Gi;j;m do
10: if g.ARD td then
11: Rlist g
12: end if
13: end for
14: end if
15: end while</p>
      <p>Algorithm 3 presents GURD query processing. In line 1,
Wq is searched in TUA-tree to nd quali ed users, which
are organized as friend pairs sorted by their relationship
duration (line 2). From line 3, top element in F D is
constantly removed (line 4) and Rumk is calculated, if Rumk
&lt; Rmrs, iteration is terminated (line 6), otherwise, function
constructGroups nds a set Gi;j;m of connected graphs, each
of which contains m users including vi and vj (line 8), and
each group in Gi;j;m is inspected (line 9) and the one with
ARD td is added into Rlist (line 10 and 11).</p>
      <p>Group forming example for GURD query. We use
Figure 6 as a running example. Assuming Uc=fv1; v2; :::; v10g,
m=3 and td=5, hence Rmrs=15. After sorting, F D=f&lt;(v9,
v10), 9&gt;, ... ,&lt;(v5, v8), 1&gt;g. In iteration 1, &lt;(v9, v10), 9&gt; is
removed from F D, due to Rum1 =9+8+7=24 &gt; Rmrs,
function constructGroups is invoked, resulting G9;10;3=f(v5, v9,
v10)g, and ARD of (v5, v9, v10) is (3+9)/3&lt;5, so the group
is discarded. Similarly, in iteration 2, due to Rum2 =22 &gt;
Rmrs, so G1;5;3=f(v1, v2, v5), (v1, v3, v5), (v1, v5, v8), (v1,
v5, v10)g, and at this time Rlist=f(v1, v2, v5)g. At iteration
5, due to Rum5 =14 &lt; Rmrs, the iteration is terminated and
nal result Rlist=f(v1, v2, v5)g is returned.
6.</p>
    </sec>
    <sec id="sec-14">
      <title>EXPERIMENTAL EVALUATION</title>
      <p>In this section, we experimentally evaluate our three queries
on a hybrid dataset, where hybrid means that, we
download a real dataset describing YouTube social network from
KONECT1, however, this dataset only contains edges and
their time stamps (friend-making time), hence, base on this
dataset, we synthetically generate user login and logout time,
activities and participation time. In particular, for each
user, 0 to 10 login and logout pairs are randomly
generated, and edges are randomly picked up to be assigned an
unfriending time stamp (this is because there are already
friend-making time stamps in the original dataset). Then
we generate 5 million activities containing di erent keyword
sets, and we assign users to activities in a Zipf distribution,
and randomly assign them participating time stamps. Table
1 describes our dataset, the rst column means the amount
of the items with temporal dimension, the second column
means the amount without temporal dimension.
users
relationships
activities</p>
      <p>We implement TUR-tree and TUA-tree as well as query
processing algorithms in Java. To make comparison, we use
non-index query processing as a baseline, i.e., for the three
kinds queries, the storage model is directly accessed and
sequential scans are carried out. And then we use our two
indexes to process the three queries according to the
approaches in section 5. We vary the query parameters and
at each testing point, 10 queries are issued to collect the
average results. The experiments are conducted on a DELL
server with Intel(R) Xeon(R) 2.40GHz processor, 8GB
memory and 500GB disk. Table 2 describes the query
parameters.
6.1</p>
    </sec>
    <sec id="sec-15">
      <title>FIA Query</title>
      <p>In this experiment, rst, we vary user's degree, and
Figure 7(a) shows the results. Response time increases with
1http://konect.uni-koblenz.de/networks/youtube-u-growth</p>
      <p>Iteration 3</p>
      <p>v8</p>
      <p>v8
v5 1</p>
      <p>3
v4 v10</p>
      <p>2 v6
Rum=14 &lt; Rmrs
Rlist={(v1,v2,v5)}
v8&lt;v1,v2&gt; 6</p>
      <p>&lt;v1,v3&gt; 5
4 &lt;v2,v5&gt; 3
&lt;v5,v10&gt; 3
&lt;v4,v6&gt; 2
&lt;v5,v8&gt; 1
degree for both non-index and index approaches. This is
because a user with a large degree (meaning he has lots
of friends) would involve a wide search in TUR-tree and
then in TUA-tree. However, we can see the performance of
non-index degenerates quickly, while our two indexes
accelerate the processing apparently, above 50% order of
magnitude. Speci cally, response time for non-index increases
linearly, while it is logarithmic for the index one, due to
the fact TUR-tree adapts MVB-tree's architecture which is
asymptotic to the performance of B-tree, and TUA-tree is
similar to B-tree. Next, Figure 7(b) shows the results on
varying the number of querying keywords, we can see the
larger the number of keywords is, the more responding time
it takes, due to more comparisons would happen when
sequential scanning on disk or traversing TUA-tree. At this
time, we can see the change of response time is slower than
that in Figure 7(a), this can be explained that the keywords
have less impact than user's degree to query performance.
Figure 7(c) presents the results on varying time selectivity
(length[ts;te]/temporal extent), still we can see our indexes
accelerate processing logarithmically and have a good
scalability.
6.2</p>
    </sec>
    <sec id="sec-16">
      <title>UTF Query</title>
      <p>Next, similar to FIA query experiment, we vary number of
keywords and time selectivity for testing UTF query
performance. From Figure 8, we can see processing time of UTF
query is larger than that of FIA query, this is because there
is no given user in UTF query, which causes a large
cardinality query for searching storage model or TUR-tree, more
items would be compared. However, we can see TUR-tree
and TUA-tree greatly accelerate query processing.
6.3</p>
    </sec>
    <sec id="sec-17">
      <title>GURD Query</title>
      <p>In the following, we vary the number m of group members,
Figure 9(a) shows that response time increases with m on
both methods, this is because a larger m involves more
combinations to be examined, however, we can see TUA-tree and
group forming algorithm is e ective for the premium
performance they show. Then we vary the number of keywords,
and results are similar to that of previous two queries (see
Figure 9(b)). Figure 9(c) shows the results when increasing
1 2 3 4 5</p>
      <p>n u m b e r o f k e y w o r d s
(b) E ect of keywords</p>
      <p>Figure 7: Results of FIA Queries
inn od ne -xin d e x</p>
      <p>inn od ne -xin d e x
5 0 0 0 0</p>
    </sec>
    <sec id="sec-18">
      <title>7. RELATED WORK</title>
      <p>In recent years, plenty of works have focused on e cient
methods for managing and querying social networks. One
branch of works on social networks focus on e cient
algorithms and data structures for searching users and friends of
interest. However, these studies focus on evolution of graphs
and topological characteristics of social network, ignoring
examining how the social network links are being used by
users to interact. Hence, some researchers propose activity
network, a network that is based on the actual interaction
between users rather than mere social network, and argue
that this is more suitable to social network-based
applications. B. Viswanath et al. study the activity network from
3 2 0 0 0
8 0 0 0
5 0 0 0
nin od ne -xin d e x
ni n o d n e - xi n d e x
4 5 6 7</p>
      <p>m
(a) E ect of m
8
9
5
1 2 3 4 5</p>
      <p>n u m b e r o f k e y w o r d s
(b) E ect of keywords</p>
      <p>
        Figure 9: Results of GURD Queries
Facebook New Orleans network[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and observe that the
activity network is rapidly changing over time, but many of
the global structural properties remain relatively constant.
      </p>
      <p>
        P. Holme et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] present the emergent eld of
temporal networks, and discuss methods for analyzing topological
and temporal structure, and models for elucidating their
relations to the behaviors of dynamical systems. However,
they only discuss some models proposed for temporal
social networks and epidemiological contact networks without
query processing method.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the authors present a model for capturing graph
evolution through time based on snapshots and deltas, and
introduce a general two-phase query plan based on snapshot
reconstruction to evaluate any historical query. However,
this work dose not provide e cient techniques and index
structures for various types of queries.
      </p>
      <p>
        U. Khurana et al.[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] build a graph data management
system that focuses on optimizing snapshot retrieval queries
over historical traces, and on supporting rich temporal
analysis of large networks. Another storage approach was
proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], namely, the historical evolving graph sequence
(EGS). Various snapshots and deltas are explicitly stored,
with temporally close snapshots being clustered together.
Nevertheless, if the query asks for a single time point or a
time interval (i.e., not the whole history), this approach will
be ine ective.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], a storage model maintains the current graph and
deltas to previous time snapshots; as a result, the rst step
of evaluating a historical query is to reconstruct the
corresponding snapshot or snapshots that relate to the query's
temporal predicate. However such a reconstruction phase
is costly, and this is an issue also in other related works[
        <xref ref-type="bibr" rid="ref4 ref5">5,
4</xref>
        ]. Chronos[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a storage and execution engine designed
and optimized speci cally for running in-memory iterative
graph computation on temporal graphs, which takes further
in locality, parallelism, and incremental computation.
      </p>
      <p>
        However, in work[
        <xref ref-type="bibr" rid="ref2 ref4 ref5 ref7">5, 4, 7, 2</xref>
        ], the activity network is not
considered. The work[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] de nes two kinds of entities in
graph: users and objects. In our paper, we combine user
social network and activity network, which consists of two
kinds of entities: users and activities, and various temporal
social queries can be satis ed.
8.
      </p>
    </sec>
    <sec id="sec-19">
      <title>CONCLUSIONS</title>
      <p>In this paper, we argue that time axis in social network
is an important and useful tool to give insight into retrieval
or statistics, and augmenting temporal query capability in
such context is meaningful. We propose three kinds queries,
namely FIA, UTF and GURD query, and model the
temporal social network (TSN) with users, relationship and
activity as well as corresponding temporal labels. A storage
model is designed to logically and physically represent TSN,
and then we propose two index structures, TUR-tree and
TUA-tree for accelerating query process. We propose
algorithms of query processing for the three queries, and evaluate
our idea on a dataset which is synthetically generated from
real dataset, and experiment results show that our indexes
and query processing are e ective and scalable. In the
future, we plan to extend this work to geo-location application,
and solve spatio-temporal queries in social networks.
9.</p>
    </sec>
    <sec id="sec-20">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work is supported by NSF of China grant 61303062
and 71331008. We would like to thank Prof. Dai and Dr.
Hu for helping with the proof.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gschwind</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ohler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Seeger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Widmayer</surname>
          </string-name>
          .
          <article-title>An asymptotically optimal multiversion b-tree</article-title>
          .
          <source>The VLDB Journal</source>
          ,
          <volume>5</volume>
          (
          <issue>4</issue>
          ):
          <volume>264</volume>
          {
          <fpage>275</fpage>
          ,
          <string-name>
            <surname>Dec</surname>
          </string-name>
          .
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Prabhakaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Chronos: A graph engine for temporal graph analysis</article-title>
          .
          <source>In Proceedings of the Ninth European Conference on Computer Systems</source>
          , EuroSys '
          <volume>14</volume>
          , pages
          <issue>1:1</issue>
          {1:
          <fpage>14</fpage>
          , New York, NY, USA,
          <year>2014</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Holme</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sarama</surname>
          </string-name>
          <article-title>ki. Temporal networks</article-title>
          .
          <source>Physics Reports</source>
          ,
          <volume>519</volume>
          (
          <issue>3</issue>
          ):
          <volume>97</volume>
          {
          <fpage>125</fpage>
          ,
          <year>2012</year>
          . Temporal Networks.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>U.</given-names>
            <surname>Khurana</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          .
          <article-title>E cient snapshot retrieval over historical graph data</article-title>
          .
          <source>In Data Engineering (ICDE)</source>
          ,
          <year>2013</year>
          IEEE 29th International Conference on, pages
          <volume>997</volume>
          {
          <fpage>1008</fpage>
          ,
          <string-name>
            <surname>April</surname>
          </string-name>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Koloniari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Souravlias</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Pitoura</surname>
          </string-name>
          .
          <article-title>On graph deltas for historical queries</article-title>
          .
          <source>CoRR, abs/1302.5549</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Moon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Faloutsos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Saltz</surname>
          </string-name>
          .
          <article-title>Analysis of the clustering properties of the hilbert space- lling curve. Knowledge and Data Engineering</article-title>
          , IEEE Transactions on,
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <volume>124</volume>
          {
          <fpage>141</fpage>
          ,
          <string-name>
            <surname>Jan</surname>
          </string-name>
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , and R. Cheng.
          <article-title>On querying historical evolving graph sequences</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>4</volume>
          (
          <issue>11</issue>
          ):
          <volume>726</volume>
          {
          <fpage>737</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Stefanidis</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Koloniari</surname>
          </string-name>
          .
          <article-title>Enabling social search in time through graphs</article-title>
          .
          <source>In Proceedings of the 5th International Workshop on Web-scale Knowledge Representation Retrieval &amp; Reasoning</source>
          , Web-KR '
          <volume>14</volume>
          , pages
          <fpage>59</fpage>
          {
          <fpage>62</fpage>
          , New York, NY, USA,
          <year>2014</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Viswanath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mislove</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cha</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Gummadi</surname>
          </string-name>
          .
          <article-title>On the evolution of user interaction in facebook</article-title>
          .
          <source>In Proceedings of the 2Nd ACM Workshop on Online Social Networks, WOSN '09</source>
          , pages
          <fpage>37</fpage>
          {
          <fpage>42</fpage>
          , New York, NY, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>