Categories and Subject Descriptors

Temporal Social Network: Storage, Indexing and Query Processing

Xiaoying Chen

0 2

Chong Zhang

0 2

Bin Ge

3gebin1978@gmail.com 0 2

Weidong Xiao

0 2 0 Algorithms , Measurement, Performance 1 Collaborative Innovation Center of Geospatial Technology , China 2 Science and Technology on Information Systems Engineering Laboratory National University of Defense Technology , Changsha 410073 , China

With the increasing of requirements from many aspects, various queries and analyses arise focusing on social network. Queries like nding users, friends or social activities satisfying a certain period gives temporal insights into retrieval or statistics, hence augmenting temporal query capability in such context, temporal social network (TSN), is meaningful. We propose three kinds temporal queries in social network, aiming to explore temporal dimension in users, relationships and social activities. A storage model is designed to logically and physically represent TSN, and then we propose two index structures, TUR-tree and TUA-tree for accelerating query process. Query processing algorithms are designed for the three queries, and we evaluate our idea on a dataset which is synthetically generated from real dataset, and experimental results show that our indexes and query processing are e ective and scalable.

Categories and Subject Descriptors

H.2.4 [Database Management]: Systems - query processing

General Terms 1. INTRODUCTION

Online social networks have become more and more popular, numerous sites such as Facebook, Twitter and LinkedIn This work is supported by NSF of China grant 61303062 and 71331008. allow users to interact and share content using social links. Based on vast amount of information contained in social networks, personalized and expressive search features are essential for users, vendors or third-parties based on di erent purposes. In various kinds of data, time is a common and necessary dimension in social networks. For instance, when a user logins or logouts the community system, there is often a time stamp recording such action. When two users become friends, a time stamp also records this event, and similar for unfriending case. Similarly, when a user posts text, photo, video, or comments others' posts, or shares web links, these social activities are all recorded with time stamps.

When these temporal information is taken into consideration in an ordinary query for social networks, it becomes interesting to discover that some historical insights are associated with the results, e.g., nd some users who are active during a certain period, or nd pairs of friends valid in some year, or nd some activities posted at some time. Such queries add temporal axes for user requirement, which is able to distinguish old and new, or active and inactive users, friends or activities. However, such kind of query functions are not studied well in academic community, nor developed in industrial platforms.

In this paper, our goal is to explore temporal queries in social networks associated with time dimension, we call it temporal social networks (TSN). And we focus on the following three kinds of queries.

(1) Friends of Interesting Activities (FIA) query. FIA queries aim to nd a user's friends who are involved in some given social activities. For instance, George wants to nd which friends of him post the texts containing co ee and pasta during last two weeks. This query would help users to nd interesting events or people along the time axis. (2) Users of Time Filter (UTF) query. UTF queries aim to nd certain users who are not only active during a given time interval, i.e., the period of logon status intersects with the given time interval, but also whose friends take part in some given interesting activities. For example, a fashion advertiser plans to look for some users who are active in recent month, and the user's friends have taken part in the social activities containing boot, because the advertiser wants to persuade the users to buy their products utilizing in uence from the friendships.

(3) Group of Users with Relationship Duration (GU RD) query. GURD queries aim to nd a set of groups, where the number of users in each group is equal to a given number, and average intimate degree of it satis es a given value (here average intimate degree is measured by average relationship duration, relationship duration is calculated by di erence between current time and friend-make time), and all the members of it have taken part in some given activities. For instance, a pizza restaurant wants to send coupons to groups of users for promotion, so it aims to nd fourmember groups (due to the size of its dining table), and the average relationship duration in each group is not less than one year, and all four members have participated in the activities containing pizza.

Motivated by the important role time may assume in social query, our goal is to design appropriate storage scheme, index for typical temporal-social query, which has been mostly ignored so far, to improve the expressiveness and quality of social search queries. We rst formally describe the problem, and use a storage model to store TSN data, then we design two indexing structure to accelerate the query processing. We implement our indexes and algorithms and empirically verify our methods are feasible. Our main contributions are listed as following:

Three new query patterns are proposed in social network, taking time dimension into consideration.

A storage model for TSN is presented, including logical view and physical model.

Querying processing algorithms are designed, along with two indexing structures to solved the three query patterns.

The rest of this paper is organized as follows. Problem is formally de ned in section 2. In section 3, storage model for TSN is presented. Two indexing structures are described in section 4, followed by query processing in section 5. In section 6, we carry out our experiments, and related works are surveyed in section 7. Finally, section 8 concludes the paper with directions for future works.

2. PROBLEM DEFINITION

In this paper, we focus on the social network of undirected graph, however, it is straightforward to extend it to directed graph. Our model is formally de ned as following:

Given an undirected graph G=(V , E), each vertex vi in V represents a user in the community and is associated with a time interval [vts, vte), meaning the valid period during which vi exists (or being logon status) in community, and [vts, ) means vi is still in the community now. Each edge (vi, vj) in E represents a friend relationship between user vi and vj, and it is also associated with a time interval [ets, ete), in which ets means the time when the relationship is established and ete means the time when the relationship is removed. Similarly, [ets, ) means the relationship still exists at current.

For a given social network graph G, there is also a set A of activities, in this paper, we use term activity to denote social event in the social network, e.g., publishing a post, sharing a link and etc. Each vi in G can connect to an activity ak in A, which means that vi gets involved in ak, we use term participation to describe the relationship between user and activity, e.g., in social network, user may post some texts, or forward other's post, or share a link of web page, or comment other's activity, or add some activity into favorite, in general, we call these actions as participations. Without loss generality, each activity can be represented as <aid, Wa>, where aid is the activity identi er, Wa is a keyword set to describe the activity. Assuming user vi publishes a post ak in the Facebook at time tp, and then her/his friend vj comments the post at some time tq, thus the two participating actions could be formally represented by (vi, ak, tp) and (vj, ak, tr), respectively.

Figure 1 illustrates a temporal social network, where users and activities are plotted as dots and triangles, respectively. The relationship are divided into two categories: user-touser (solid line) and user-to-activity (dashed line). And the label on edge indicates time interval of relationship or time stamp when a user participate in an activity. For instance, user v1 logins community at time vt1 and does not logout at current, and user v1 and v2 become friends at et1 and unfriend each other at et3, and v4, v7 and v2 participate in a2 at t4, t5 and t6, respectively.

(vt10, *) v10 (vt1, *)

v1 (et3, *) (et1, et3) v2 (vt2, *) (et2, *)

t6 (et4, *) v3 (vt3, vt5)

t2 t1

v5 (vt5, *) t3 W2 a2 t4 v4 (vt4, *) a1 W1 t5 (et5, *) v9

(vt9, *) (et6, *) v 7 (vt7, *) v6 (vt6, vt9)

3. STORAGE MODEL

Based on the description of temporal social network (TSN), we give TSN's storage model, in which two perspectives are presented: logical view and physical model. 3.1

Logical View

In logical view, users and activities are partitioned horizontally, and social events are partitioned vertically along the time axis. In the upper half of the logical view, along the time axis, the actions which a speci c user makes are presented in chronological order. Further, in the lower half, a list of users which participate in a speci c activity also can be obtained along the time axis. Figure 2 shows an example of the logical view, along the time axis, user v1 logins social network at time t1, makes friends with user v2 at time t2 and participates in activity a1 at time t3. And activity a1 is participated in by user v1 and v2 at time t3 and t4, respectively. userst1 t2 t3 t4 t5 t6 time v1 (v1,t1)(login) (v1, v2, t2)(be friends) (v1, a1 ,t3)(post) v2 (v2, v1, t2)(be friends) (v2, a1, t4)(like)(v2, v3, t5)(unfriend) v3 (v3, v2, t5)(unfriend)(v3, a2, t6)(share) activities time a1 a2

(v1, a1 ,t3)(v2, a1, t4) To e ciently access temporal social network, we design a physical model for storing TSN. The user and activity information is separately stored, and users are clustered into pages on disk, which is similar for activities. In particular, each user page consists of several items, where each item represents a user, including user id, name and etc, to look up quickly, it also contains three pointers, the rst one references to a list of time intervals, each of which represents a pair of login and logout time stamps of the user, and the second one points to a list, each item of which contains the user's friend id, besides friend-making time and unfriending time, and the third one also points to a list, each item of which contains activity id and corresponding participating time related to the user.

The model for activity is similar to user's, i.e., each item in activity page contains activity id, the link to keywords and other objects in the activity, and a pointer references to a list, where each item represents the user participating in it as well as the time stamp. Additionally, the activity in user page also points to the corresponding item in activity page. Note that, the lists in the physical model do not belong to user page or activity page, they are sequentially stored on disk. Figure 3 illustrates a physical model example. <tin1, tout1> <uid2, tf1, *> users page uid1 ptrt_list ptrf_list ptra_list <a1, ta1> uid2 ptrt_list ptrf_list ptra_list ... ... ... ... uidn ptrt_list ptrf_list ptra_list

<tin2, tout2> <uid3, tf2, tuf2> <a2, ta2>

<tin3, *> <uidj, tfj, *> <a3, ta3> <a1, Wa1, obj> ptru_list <uid1, ta1> <a2, Wa2, obj> ptru_list <… … ...> ... activities page <ak, Wak, obj> ptru_list Figure 3: Physical Model Example <uid2, ta4>

4. INDEXES

The storage model works for storing temporal social network, when processing queries, the sequential scanning would be ine cient under large volume. Thus it is essential to develop indexing technique for accelerating query processing. However, it is di cult to build one almighty index structure to cover users, relationships and activities with temporal information. We design 2 structures, one is called Temporal Users and Relationships tree (TUR-tree), for temporally indexing users and relationships, the other is called Temporal Users and Activities tree (TUA-tree), for users and their activities. 4.1

TUR-tree

Considering that both users and relationships are associated with time interval, and queries are subject to time range type, thus, we propose to use MVB-tree[ 1 ] to index users and relationships. Our main contribution is to adapt temporal users and relationship to MVB-tree.

In MVB-tree, the entry of leaf node is formatted as <key, ts, te>, where key is the value to index, and [ts, te) is the valid time interval for key. On the other hand, the entry of non-leaf node is formatted as <key, ts, te, subnodeid>, where the function of key is for comparing and routing, and subnodeid points to the root of subtree.

In our scenario, user id is usually used as key to query, so we can directly use user id as the key in MVB-tree. However, relationship is di erent from user id, and can't be applied to MVB-tree straightforward. Hence, we use encoding technique to generate keys for MVB-tree. In particular, we use string concatenating to generate keys both for users and relationships. To generate key for a user vi identi ed by uidi, we combine 0 and uidi, resulting 0juidi (j represents concatenating operator), as the key in MVB-tree. For relationship of uidi and uidj , we use 1 as pre x, and concatenate uidi and uidj , resulting 1juidijuidj and 1juidj juidi, note that, for undirected graph, we generate two keys for one relationship. Additionally, for fast looking up relationships when given a user, links are made among user leaf node entry and its corresponding relationship leaf nodes.

Figure 4 depicts an example for construction of TUR-tree. It illustrates the process that three users joins social network and then make friends. Assuming that the length of user id is three, and it is only composed of 0-9 and a-z (this assumption is also used in the rest of the paper). In particular, three users with identi ers 001, 002 and 003 joins the network at time 1, 2 and 4, respectively, and 001 and 002 make friends at time 3, and 002 make friends with 003 at time 5. In this example, we set the capacity of a node is 3, and Figure 4(a) presents the result structure of TUR-tree after 001 and 002 join the network, i.e., there is only one node, namely node A. After 001 and 002 make friends, two entries representing the relationship are inserted into node A, which causes over ow of A, and according to the split rule of MVB-tree, a version split should be carried out, i.e., copying the current version of entries in node A, and combining them with the new entries to form a new node, if it is over ow, key split is carried out, after that for building links between user leaf node entry and its relation leaf nodes, entry <0j001, 1, *> points to the node containing <1j001j002, 3, *>, and similar for other user leaf node entries, thus the above sequential operations generate a structure in Figure 4(b). In node B, the two entries are copied from node A, while for node A, root R1's rst entry refers to it with time interval [1, 3). At time 4, user 003 joins the network, resulting insertion to node B, and Figure 4(c) describes the structure. At time 5, user 002 make friends with 003, two new entries are inserted, similar to the process in Figure 4(b), this causes node C over ow, and such over ow propagates to root R1, similar to leaf node split, R1 is split into R1 and R2, and Figure 4 shows the result. 4.2

TUA-tree

The temporal information associated with activities is different from that with users and relationships, i.e., when user participates an activity, the temporal type is time stamp not interval, hence it is not necessary to use MVB-tree. Another fact is for querying activities, activity identi ers are not usually used as searching keys, but user ids, time predicates and keywords. Thus, TUA-tree is designed to cover user id, time and keywords, in fact it is a hybrid structure of B+-tree and Bloom Filter[ 6 ]. We give the detail structure of TUA-tree as following.

TUA-tree is a B+-tree-like structure, the entry of leaf node is formatted as <key, ptr>, where ptr points to an activity, and key is concatenation of user id and time associated with the activity. Node split, merge and redistribution operations are carried out according to key. The internal node structure is similar to that of B+-tree, i.e., entries are distinguished 0|001 [1, *) 0|002 [2, *) 0|001 [1, *) 0|002 [2, *) by router and pointer, and the number of pointer entries is larger than that of router entries by one, di erent from B+tree, pointer entry also contains Bloom Filter of all keywords in the referred subtree.

Figure 5 illustrates a process of TUA-tree construction, and the capacity of tree node is set to 2. First, user 001 participates in activity a1 at time 1, then 002 get involved in a1 at time 2, resulting a leaf node showed in Figure 5(a). Then user 003 publishes activity a2 at time 3, which causes over ow, and node split operation follows, resulting a new root linking two leaf nodes, we can see from Figure 5(b), the left pointer entry in the root node contains Bloom Filters BF1 which is calculated from the keywords from a1, similarly, BF2 is calculated from the keywords from a2. After that, user 001 participates a2 at time 4 (Figure 5(c)), and user 004 posts activity a3 at time 5 (Figure 5(d)). At last, user 005 and 001 participates in a4 at time 6 and 7, respectively, resulting the root node split and tree level increased, and BF6 contains keywords from a1, a2 and a4, while BF7 contains keywords from a1, a2, a3 and a4.

QUERY PROCESSING

In this section, we present algorithms for processing FIA query, UTF query and GURD query. 5.1

FIA Query

Given a user vq, a keyword set Wq, and a time interval [ts, te], a FIA query (vq, Wq, ts, te) aims to nd a set of pairs f<vk, lak>g, in each of which vk is vq's friend valid during the period [ts, te], lak is a list of activities participated in by vk during [ts, te], and each activity in lak overlaps Wq (i.e., the intersection of activity's keyword set and Wq is not null).

For processing a FIA query (vq, Wq, ts, te), an ideal approach is using join algorithm to search TUR-tree and TUA tree simultaneously, however such concept can not be realized by our structures, due to combination of user's and his friend's id representing relationship key in the index. We propose a two-phase processing algorithm, i.e., rst TURtree is traversed to retrieve friends of vq satisfying the temporal predicate, then these friends as well as (Wq, ts, te) are used as query input to TUA-tree to look up nal results.

In particular, for the rst phase, a MVB-tree range query ([1jvqj000, 1jvqjzzz], [ts, te]) is generated, and submitted to TUR-tree to retrieve vq's friends, set F , valid between ts and te. The rst phase is a straightforward invocation of MVB-tree's searching method.

In the second phase, a query which consists of F and (Wq, ts, te) is submitted to TUA-tree to retrieve nal results. Note that, in such query, F is a set containing users, if TUAtree is traversed at one time for each element in F , it would be costly. Hence, we propose a batch search algorithm for query F with (Wq, ts, te) in TUA-tree. Initially, the root of TUA-tree is loaded into memory, including router entries (re) and pointer entries (pe), formatted as root=<pe0, re1, pe1, re2. pe2, ..., ren, pen>. For batch comparison, we generate an interval intvl=[fminjts, fmaxjte], where fmin and fmax are minimum and maximum user id in F , respectively. After that, for each pei, if intvl intersects with (rei, rei+1] (note that, for the rst pe, pe0, the interval should be ( 1, re1], and (ren, +1) for the last one, for simplicity, we omit the judgment in the algorithm presentation) as well as Bloom Filter in pei intersects with Wq, the child node referenced by pei is inspected by query (Fsub, Wq, ts, te), where Fsub is the intersection of F and the users set in [rei 1, rei+1]. When leaf node of TUA-tree is reached, each entry is examined whether it satisfy current query predicate, and the results are fetched into list.

Algorithm 1 presents the second phase of FIA query. Tree root is loaded in line 1, and then from line 3 to line 19, function F IA2nd-P () recursively traverses the tree, with continuously dividing F into subset. If the parameter node is an internal node (line 4), variable intvl is generated (line 5). From line 6 to 11, each pointer entry of the node is inspected to detect whether its child node satisfy the predicate, if it does, only the users who are contained by the interval [rei, rei+1] are preserved to descend the subtree (line 8 and 9). Otherwise, i.e., node is a leaf (line 12), each entry of the leaf is inspected (line 13 and 14), and the result is added into Rlist (line 15).

FIA query 2nd-P processing example. We use Figure 5 (e) as a running example, and assuming query is (001, 002, a2.Wa2 , 3, 5) and the keyword sets of each pair activities are not overlapped. First, root node <pe0, 001j7, pe1> is loaded, then for pe0, [001j3, 002j5]\( 1, 001j7]6= and BF6 contains keywords in a2, hence the subtree referenced by pe0 is traversed, then similarly for pe1, the subtree of which is also traversed. Then node A and B is loaded, the similar processing is carried out until descending the leaf node entries <001j1, P tra1 >, <001j4, P tra2 >, <002j2, P tra1 >, <003j3, P tra2 >, and then by function compare(), <001j4, P tra2 > is the result. 5.2

UTF Query

Given a keyword set Wq, and a time interval [ts, te], a 001|1 002|2 Ptr-a1 Ptr-a1 (e)

B 001|1 002|2 Ptr-a1 Ptr-a1 003|3 Ptr-a2 UTF query (Wq, ts, te) aims to nd a set of pairs f<vk, F Ak>g, in each of which vk is the user existing in the social network during [ts, te], and F Ak is a set f<vjk, lajk>g similar to the results of FIA query, i.e., vjk is vk's friend, and lajk is a list of activities participated in by vjk during [ts, te] as well as satisfying overlapping Wq.

The aim of previous query, FIA query, is to nd a list of friends of the given user, while for UTF query, the aim is di erent, which is to nd a set of users active during a given time period, whose friends take part in some certain activities. The processing for UTF query is a little similar to that of FIA query, i.e., there are also two phases, rst is using TUR-tree, then TUA-tree is traversed.

In the rst phase, a query formatted as ([0j000, 0jzzz], [ts, te]) is submitted to TUR-tree, using range query algorithm of MVB-tree, a set of leaf node entries which contain the satis ed users are retrieved, then following the links to leaf node of the corresponding relationships, a set of friends are fetched, speci cally, for each result user vi, a list friends Fi of vi is formed.

In the second phase, a query formatted as (Si(Fi), Wq, ts, te) is submitted to TUA-tree, similar to 2nd-P in FIA query, batch query is executed and a set F2nd of friends and corresponding activities are found. At this point, for each vi found in the rst phase, we check whether his friend list Fi fetched in the rst phase intersects with the friend set F2nd in the second phase, i.e., Finti =Fi \ F2nd, if Finti is not null, then vi and Finti as well as the corresponding activities are collected as results.

Algorithm 2 UTF Query Processing Input: q=(Wq, ts, te) Output: Rlist 1: LE=T U RRangeQuery([0jmin, 0jmax], [ts, te]) 2: f<vi, Fi>g=f etchRelations(LE) 3: F2nd=2nd-P (Si(Fi), Wq, ts, te) 4: for each Fi do 5: Finti =Fi \ F2nd 6: if Finti 6= then 7: Rlist <vi, Finti > 8: end if 9: end for 10: return Rlist

Algorithm 2 presents the processing of UTF query. In line 1, function T U RRangeQuery retrieves the users who are active between ts and te, and then through links, the corresponding friends are fetched in line 2. Next, similar to the second phase in FIA, in line 3, set Si(Fi) is re ned by condition (Wq, ts, te) using function 2nd-P . From line 4 to the end, for each Fi, intersection examination is made to re ne and then collect the results.

UTF query example. We use Figure 4 (d) and 5 (e) as a running example, and assuming query input is (a1.Wa1 , 1, 2). In the rst phase, through the TUR-tree, we nd leaf node <0j001, 1, *, P tr001>, <0j002, 2, *, P tr002> satisfying time condition [ 1, 2 ]. Then using P tr001, the set of friends F001=f002g of user 001 is found, similarly, F002=f001; 003g is returned. At the second phase, rstly a total friends set f001; 002; 003g is formed, then using algorithm 2 with input (f001; 002; 003g, a1.Wa1 , 1, 2), F2nd=f001; 002g is returned. Then because F001\F2nd6= and F002\F2nd6= , the result is f<001, f<002, a1>g>, <002, f<001, a1>g>g. 5.3

GURD Query

Given a number m, a time length (duration) td, and a keyword set Wq, a GURD query (m, td, Wq) aims to nd groups of users, in each group of which users form a connected graph, and the number of users is m, and average relationship duration (ARD) is not less than td, and for each user in the group, at least one activity participated in overlaps Wq. Here, ARD is de ned as following: assuming the number of users in a group is n, and let d(i; j) be the relationship duration between user vi and vj in the group (current time minus relationship establishing time, note that this query implies all the connections in a group are valid at current), if there is no relationship between vi and vj , d(i; j)=0, then

ARD =

Pn i6=j d(i; j) n(n 1) (1) Note that, in equation (1), for vi and vj , both d(i; j) and d(j; i) are taken into consideration.

The aim of GURD query is to nd groups of users, which is totally di erent from the previous two queries. In this query processing, TUA-tree is mainly used to accelerate the process, this is because calculating user group is costly using TUR-tree. The basic work ow is rst using TUA-tree to nd a set Uc of users who participate in the activities constrained by Wq, then we propose a group forming algorithm to construct satis ed groups iteratively. The goal of the algorithm is to terminate the iteration as early as possible.

In particular, Wq is submitted to TUA-tree, at this time, only Bloom Filter is used to route the query descending the candidate subtree, resulting Uc, each user of which satises the activity condition. Next is our group forming algorithm. We rst calculate relationship duration between each pair of users in Uc if they are friends, such step could be accomplished by searching TUR-tree or directly looking up the storage model. Then we sort the relationship duration of each pair of friends in Uc in descending order, resulting RD= f<(vi, vj ), dik;j > j vi, vj 2Uc ^ vi and vj are friends ^ 1 k jRDjg, where dik;j is the relationship duration between vi and vj , and is the kth one in descending order of jRDj durations. Next, top element is constantly removed from RD, and we compare the following two variables: Rumk =Pkn+=(k(m 1)m=2) din;j and Rmrs=td m(m 1)=2, if Rumk Rmrs, it implies possibility for nding groups satisfying the ARD requirement, and then vi and vj are used to nd their friends from the rest users of RD, and for each possible connected graph containing vi and vj with vertex amount being m, ARD is calculated and only the group with ARD td is collected into result list. Otherwise, i.e., Rumk < Rmrs, which means it is impossible for this edge to nd other connected vertices to form an m member group whose ARD is not less than td, hence, the rest elements in RD are discarded, and iteration is terminated.

Algorithm 3 GURD Query Processing Input: q=(m, td, Wq) Output: Rlist 1: Uc=searchT U Atree(Wq) 2: F D=sortF D(Uc) 3: while F D 6= do 4: F D =F D n f<(vi, vj ), dik;j >g 5: if Rumk < Rmrs then 6: break 7: else 8: Gi;j;m=constructGroups(vi, vj , F D, m) 9: for each g 2 Gi;j;m do 10: if g.ARD td then 11: Rlist g 12: end if 13: end for 14: end if 15: end while

Algorithm 3 presents GURD query processing. In line 1, Wq is searched in TUA-tree to nd quali ed users, which are organized as friend pairs sorted by their relationship duration (line 2). From line 3, top element in F D is constantly removed (line 4) and Rumk is calculated, if Rumk < Rmrs, iteration is terminated (line 6), otherwise, function constructGroups nds a set Gi;j;m of connected graphs, each of which contains m users including vi and vj (line 8), and each group in Gi;j;m is inspected (line 9) and the one with ARD td is added into Rlist (line 10 and 11).

Group forming example for GURD query. We use Figure 6 as a running example. Assuming Uc=fv1; v2; :::; v10g, m=3 and td=5, hence Rmrs=15. After sorting, F D=f<(v9, v10), 9>, ... ,<(v5, v8), 1>g. In iteration 1, <(v9, v10), 9> is removed from F D, due to Rum1 =9+8+7=24 > Rmrs, function constructGroups is invoked, resulting G9;10;3=f(v5, v9, v10)g, and ARD of (v5, v9, v10) is (3+9)/3<5, so the group is discarded. Similarly, in iteration 2, due to Rum2 =22 > Rmrs, so G1;5;3=f(v1, v2, v5), (v1, v3, v5), (v1, v5, v8), (v1, v5, v10)g, and at this time Rlist=f(v1, v2, v5)g. At iteration 5, due to Rum5 =14 < Rmrs, the iteration is terminated and nal result Rlist=f(v1, v2, v5)g is returned. 6.

EXPERIMENTAL EVALUATION

In this section, we experimentally evaluate our three queries on a hybrid dataset, where hybrid means that, we download a real dataset describing YouTube social network from KONECT1, however, this dataset only contains edges and their time stamps (friend-making time), hence, base on this dataset, we synthetically generate user login and logout time, activities and participation time. In particular, for each user, 0 to 10 login and logout pairs are randomly generated, and edges are randomly picked up to be assigned an unfriending time stamp (this is because there are already friend-making time stamps in the original dataset). Then we generate 5 million activities containing di erent keyword sets, and we assign users to activities in a Zipf distribution, and randomly assign them participating time stamps. Table 1 describes our dataset, the rst column means the amount of the items with temporal dimension, the second column means the amount without temporal dimension. users relationships activities

We implement TUR-tree and TUA-tree as well as query processing algorithms in Java. To make comparison, we use non-index query processing as a baseline, i.e., for the three kinds queries, the storage model is directly accessed and sequential scans are carried out. And then we use our two indexes to process the three queries according to the approaches in section 5. We vary the query parameters and at each testing point, 10 queries are issued to collect the average results. The experiments are conducted on a DELL server with Intel(R) Xeon(R) 2.40GHz processor, 8GB memory and 500GB disk. Table 2 describes the query parameters. 6.1

FIA Query

In this experiment, rst, we vary user's degree, and Figure 7(a) shows the results. Response time increases with 1http://konect.uni-koblenz.de/networks/youtube-u-growth

Iteration 3

v8 v5 1

3 v4 v10

2 v6 Rum=14 < Rmrs Rlist={(v1,v2,v5)} v8<v1,v2> 6

<v1,v3> 5 4 <v2,v5> 3 <v5,v10> 3 <v4,v6> 2 <v5,v8> 1 degree for both non-index and index approaches. This is because a user with a large degree (meaning he has lots of friends) would involve a wide search in TUR-tree and then in TUA-tree. However, we can see the performance of non-index degenerates quickly, while our two indexes accelerate the processing apparently, above 50% order of magnitude. Speci cally, response time for non-index increases linearly, while it is logarithmic for the index one, due to the fact TUR-tree adapts MVB-tree's architecture which is asymptotic to the performance of B-tree, and TUA-tree is similar to B-tree. Next, Figure 7(b) shows the results on varying the number of querying keywords, we can see the larger the number of keywords is, the more responding time it takes, due to more comparisons would happen when sequential scanning on disk or traversing TUA-tree. At this time, we can see the change of response time is slower than that in Figure 7(a), this can be explained that the keywords have less impact than user's degree to query performance. Figure 7(c) presents the results on varying time selectivity (length[ts;te]/temporal extent), still we can see our indexes accelerate processing logarithmically and have a good scalability. 6.2

UTF Query

Next, similar to FIA query experiment, we vary number of keywords and time selectivity for testing UTF query performance. From Figure 8, we can see processing time of UTF query is larger than that of FIA query, this is because there is no given user in UTF query, which causes a large cardinality query for searching storage model or TUR-tree, more items would be compared. However, we can see TUR-tree and TUA-tree greatly accelerate query processing. 6.3

GURD Query

In the following, we vary the number m of group members, Figure 9(a) shows that response time increases with m on both methods, this is because a larger m involves more combinations to be examined, however, we can see TUA-tree and group forming algorithm is e ective for the premium performance they show. Then we vary the number of keywords, and results are similar to that of previous two queries (see Figure 9(b)). Figure 9(c) shows the results when increasing 1 2 3 4 5

n u m b e r o f k e y w o r d s (b) E ect of keywords

Figure 7: Results of FIA Queries inn od ne -xin d e x

inn od ne -xin d e x 5 0 0 0 0

7. RELATED WORK

In recent years, plenty of works have focused on e cient methods for managing and querying social networks. One branch of works on social networks focus on e cient algorithms and data structures for searching users and friends of interest. However, these studies focus on evolution of graphs and topological characteristics of social network, ignoring examining how the social network links are being used by users to interact. Hence, some researchers propose activity network, a network that is based on the actual interaction between users rather than mere social network, and argue that this is more suitable to social network-based applications. B. Viswanath et al. study the activity network from 3 2 0 0 0 8 0 0 0 5 0 0 0 nin od ne -xin d e x ni n o d n e - xi n d e x 4 5 6 7

m (a) E ect of m 8 9 5 1 2 3 4 5

n u m b e r o f k e y w o r d s (b) E ect of keywords

Figure 9: Results of GURD Queries Facebook New Orleans network[ 9 ], and observe that the activity network is rapidly changing over time, but many of the global structural properties remain relatively constant.

P. Holme et al. [ 3 ] present the emergent eld of temporal networks, and discuss methods for analyzing topological and temporal structure, and models for elucidating their relations to the behaviors of dynamical systems. However, they only discuss some models proposed for temporal social networks and epidemiological contact networks without query processing method.

In [ 5 ], the authors present a model for capturing graph evolution through time based on snapshots and deltas, and introduce a general two-phase query plan based on snapshot reconstruction to evaluate any historical query. However, this work dose not provide e cient techniques and index structures for various types of queries.

U. Khurana et al.[ 4 ] build a graph data management system that focuses on optimizing snapshot retrieval queries over historical traces, and on supporting rich temporal analysis of large networks. Another storage approach was proposed in [ 7 ], namely, the historical evolving graph sequence (EGS). Various snapshots and deltas are explicitly stored, with temporally close snapshots being clustered together. Nevertheless, if the query asks for a single time point or a time interval (i.e., not the whole history), this approach will be ine ective.

In [ 8 ], a storage model maintains the current graph and deltas to previous time snapshots; as a result, the rst step of evaluating a historical query is to reconstruct the corresponding snapshot or snapshots that relate to the query's temporal predicate. However such a reconstruction phase is costly, and this is an issue also in other related works[ 5, 4 ]. Chronos[ 2 ] is a storage and execution engine designed and optimized speci cally for running in-memory iterative graph computation on temporal graphs, which takes further in locality, parallelism, and incremental computation.

However, in work[ 5, 4, 7, 2 ], the activity network is not considered. The work[ 8 ] de nes two kinds of entities in graph: users and objects. In our paper, we combine user social network and activity network, which consists of two kinds of entities: users and activities, and various temporal social queries can be satis ed. 8.

CONCLUSIONS

In this paper, we argue that time axis in social network is an important and useful tool to give insight into retrieval or statistics, and augmenting temporal query capability in such context is meaningful. We propose three kinds queries, namely FIA, UTF and GURD query, and model the temporal social network (TSN) with users, relationship and activity as well as corresponding temporal labels. A storage model is designed to logically and physically represent TSN, and then we propose two index structures, TUR-tree and TUA-tree for accelerating query process. We propose algorithms of query processing for the three queries, and evaluate our idea on a dataset which is synthetically generated from real dataset, and experiment results show that our indexes and query processing are e ective and scalable. In the future, we plan to extend this work to geo-location application, and solve spatio-temporal queries in social networks. 9.

ACKNOWLEDGMENTS

This work is supported by NSF of China grant 61303062 and 71331008. We would like to thank Prof. Dai and Dr. Hu for helping with the proof.

[1]

Becker ,

Gschwind ,

Ohler ,

Seeger , and

Widmayer . An asymptotically optimal multiversion b-tree . The VLDB Journal , 5 ( 4 ): 264 { 275 , Dec . 1996 .

[2]

Han ,

Miao ,

Li ,

Wu ,

Yang ,

Zhou ,

Prabhakaran ,

Chen , and

Chen . Chronos: A graph engine for temporal graph analysis . In Proceedings of the Ninth European Conference on Computer Systems , EuroSys ' 14 , pages 1:1 {1: 14 , New York, NY, USA, 2014 . ACM.

[3]

Holme and

Sarama ki. Temporal networks . Physics Reports , 519 ( 3 ): 97 { 125 , 2012 . Temporal Networks.

[4]

Khurana and

Deshpande . E cient snapshot retrieval over historical graph data . In Data Engineering (ICDE) , 2013 IEEE 29th International Conference on, pages 997 { 1008 , April 2013 .

[5]

Koloniari ,

Souravlias , and

Pitoura . On graph deltas for historical queries . CoRR, abs/1302.5549 , 2013 .

[6]

Moon ,

Jagadish ,

Faloutsos , and

Saltz . Analysis of the clustering properties of the hilbert space- lling curve. Knowledge and Data Engineering , IEEE Transactions on, 13 ( 1 ): 124 { 141 , Jan 2001 .

[7]

Ren ,

Lo ,

Kao ,

Zhu , and R. Cheng. On querying historical evolving graph sequences . PVLDB , 4 ( 11 ): 726 { 737 , 2011 .

[8]

Stefanidis and

Koloniari . Enabling social search in time through graphs . In Proceedings of the 5th International Workshop on Web-scale Knowledge Representation Retrieval & Reasoning , Web-KR ' 14 , pages 59 { 62 , New York, NY, USA, 2014 . ACM.

[9]

Viswanath ,

Mislove ,

Cha , and

K. P.

Gummadi . On the evolution of user interaction in facebook . In Proceedings of the 2Nd ACM Workshop on Online Social Networks, WOSN '09 , pages 37 { 42 , New York, NY, USA, 2009 . ACM.