-

Discovering Inter-Dimensional Rules in Dynamic Graphs

Kim-Ngan T. Nguyen

Lo¨ıc Cerf

Marc Plantevit

Jean-Franc¸ois Boulicaut

0 0 Universit ́e de Lyon , CNRS, INRIA INSA-Lyon, LIRIS Combining, UMR5205, F-69621 , France 1 Universit ́e de Lyon , CNRS, INRIA Universit ́e Lyon 1, LIRIS Combining, UMR5205, F-69622 , France

00 12

Data mining methods that exploit graph/network have become quite popular and a timely challenge is to consider the discovery of dynamic properties in evolving graphs or networks. In this paper, we consider the dynamic oriented graphs that can be encoded as n-ary relations with n ≥ 3 such that we have a least 3 dimensions: the dimensions of departure (tail) and arrival (head) vertices plus the time dimension. In other terms, it encodes the sequence of adjacency matrices of the graph. In such datasets, we propose a new semantics for inter-dimensional rules in dynamic graphs. We define rules that may involve subsets of any dimensions in their antecedents and their consequents and we propose the new objective interestingness measure called the exclusive confidence. We introduce a first algorithm for computing such inter-dimensional rules and we illustrate the added-value of exclusive confidence for supporting the discovery of relevant rules from a real-life dynamic graph.

Graph mining is a popular topic. Many researchers have considered pattern discovery from large collections of graphs while others focus the analysis of one large graph or network. In the latter case, we observe two complementary directions of research. On one hand, global properties of graphs are studied (e. g., power-law distribution of node degrees or diameters). On the other hand, it is possible to use data mining algorithms to identify local patterns in the graphs (e. g., frequent subgraphs, clique patterns). Such local techniques can indeed benefit from the huge research effort on 0/1 data analysis, i. e., a graphs is seen as particular 0/1 table (the two involved domains being identical): its adjacency matrix.

In this paper, we investigate local pattern discovery from dynamic directed graphs, i. e., from of collection of static directed graphs that all share the same set of uniquely identified vertices. For instance, Fig. 1 depicts a dynamic directed graph involving four nodes. Four snapshots of this graph are available. The dynamic graph can be represented as the sequence of its adjacency matrices underneath. It describes the relationship between the tail vertices in D1 = {d1, d2, d3, d4} and the head vertices in D2 = {a1, a2, a3, a4} at the timestamps 2 4 3 1 3 1 3

1 2 4 2 4 3 1 in D3 = {t1, t2, t3, t4}. Every ’1’, in the adjacency matrices is at the intersection of three elements (di, aj , tk) ∈ D1 ×D2 ×D3, which indicate a directed edge from di to aj at time tk. Therefore at least three dimensions are necessary to encode a dynamic graph, which can be seen as a ternary relation (the one depicted in Fig. 1 is called RE ). However, more dimensions may be used, for instance to encode information on edges and/or time aspects with different granularity. 2 4 t1

Studying descriptive rule mining from dynamic graphs is a rather new research topic and most of previous work impose severe restrictions on the form of the rules. The key contribution of this paper is the proposal of a quite general form of rules. These rules may involve any subset of dimensions in both the left-hand side and the right-hand side. In particular, the temporal dimensions can either explicitly appear in the rules or be used to measure the importance of the rules (i. e., the number of timestamps where the rule holds). Taking into account these different ways is complementary. It provides relevant patterns describing the evolution of a dynamic graph at a local level. Two examples of inter-dimensional rules that we want to extract are given in Fig. 2. Fig. 2a depicts a rule that is preserved at several timestamps. It intuitively means that if, at a time, the edges from vertices 2, 3 and 4 have the same heads then these heads are exclusively vertex 3. Rule in Fig. 2b means that if there are pairs of edges whose tails are nodes 3 and 4 and whose heads are the same vertex then it mainly occurs at times t2 and t3. To express the a priori relevancy of such rule, we use a straightforward extension of the classical frequency measure and an original extension of the confidence measure, the so-called exclusive confidence. The second contribution of this paper deals with the design of an algorithm that computes the a priori interesting rules. It exploits the principles (typically the enumeration strategy) of [ 7 ], i. e., the state-of-the-art algorithm for exploring the search space of multi-dimensional associations.

In Sect. 2, we provide the needed definitions to build the new pattern domain of inter-dimensional rules. Then, in Sect. 3, we define such rules and the exclusive 3 confidence semantics. Sect. 4 introduces the first algorithm that computes a priori interesting rules from a dynamic graph. Sect. 5 deals with the empirical validation and various experiments on a real-life dynamic graph. Sect. 6 discusses related work and, finally, Sect. 7 briefly concludes. 2

Preliminary Definitions

Given n finite domains D = {D1, . . . , Dn} and an n-ary relation R ⊆ ×i=1..nDi, the patterns of interest only involve some of the domains D′ ⊆ D. E. g., the analyst may want to focus on subgraph patterns (D′ = {D1, D2} in RE ). She may, instead, want to discover pattern involving temporal dimensions. Without loss of generality, the dimensions are assumed ordered such that D′ = {D1, . . . , D|D′|}. We now formally define an association on D′.

Definition 1 (Association). ∀D′ = {D1, . . . , D|D′|} ⊆ D, ×i=1..|D′|Xi is an association on D′ iff ∀i = 1..|D′|, Xi 6= ∅ ∧ Xi ⊆ Di. ×Di∈D\D′ Di is called support domain. The support of an association generalizes that of an itemset in a binary relation (n = 2 and |D′| = 1) [ 1 ]. Its formal definition uses the concatenation operator, denoted ’·’. E. g., (d2, a3) · (t1) = (d2, a3, t1).

Definition 2 (Support s). ∀D′ ⊆ D, let X an association on D′. Its support, denoted s(X), is s(X) = {u ∈ ×Di∈D\D′ Di | ∀x ∈ X, x · u ∈ R}.

The following definitions will ease the exposition of this paper.

Definition 3 (Projection π). ∀D′ = {D1, . . . , D|D′|} ⊆ D, let X = X1 × · · · × X|D′| an association on D′. ∀Di ∈ D, πDi (X) is Xi if Di ∈ D′, ∅ otherwise. Definition 4 (Union ⊔). ∀DX ⊆ D and ∀DY ⊆ D, let X an association on DX and Y an association on DY . X ⊔ Y is the association on DX ∪ DY for which ∀Di ∈ D, πDi (X ⊔ Y ) = πDi (X) ∪ πDi (Y ).

Definition 5 (Complement \). ∀DX ⊆ D and ∀DY ⊆ D, let X an association on DX and Y an association on DY . Y \ X is the association on {Di ∈ DY | πDi (Y ) 6⊆ πDi (X)} for which ∀Di ∈ D, πDi (Y \ X) = πDi (Y ) \ πDi (X).

In RE , depicted in Fig. 1, {d1, d2} × {a1, a2} is an association on {D1, D2}, whereas {a1, a2} is not because πD1 ({a1, a2}) = ∅. It is an association on {D2}. Their respective supports are s({d1, d2} × {a1, a2}) = {t1, t2} and s({a1, a2}) = {(d1, t1), (d1, t2), (d1, t3), (d2, t1), (d2, t2), (d3, t1), (d4, t4)}.

Inter-Dimensional Rules in Dynamic Graph

Given a dynamic graph, encoded as an n-ary relation R on D, the analyst chooses the domains DX ⊆ D and DY ⊆ D at, respectively, the left-hand side and the right-hand side of the rules to discover. E. g., to list rules involving tail vertices at their antecedents and timestamps at their consequents, DX only contains one dimension of the relation (the tail vertices) and so does DY (the timestamps). Notice that DX ∩ DY must be empty. An inter-dimensional rule on (DX , DY ) is a couple of associations3. The first one on DX , the second one on DY . Definition 6 (Inter-dimensional rule). ∀DX ⊆ D, ∀DY ⊆ D, X → Y is an inter-dimensional rule on (DX , DY ) iff X is an association on DX , Y is an association on DY and DX ∩ DY = ∅.

In RE , {d3} → {a3, a4} is an inter-dimensional rule on ({D1}, {D2}). The rule {d3} × {a3, a4} → {d4} is not an inter-dimensional rule because elements in D1 appear both at its left-hand side and at its right-hand side.

A rule is frequent if many “objects” verifies it. These objects are elements of a support domain for the rule, which is, in fact, ×Di∈D\(DX ∪DY )Di, i. e., that of the association (on DX ∪ DY ) union of its antecedent and its consequent. The rule can be trusted, i. e., has a large enough confidence, if there is a high conditional probability to observe the consequent when the antecedent holds. In the context of inter-dimensional rules in dynamic graphs, a natural definition of the frequency exists. On the contrary, it is hard to define a confidence measure.

The (relative) frequency of an inter-dimensional rule in a dynamic graph is, in the support domain, the proportion of elements in the support of the union of its antecedent and its consequent.

Definition 7 (Frequency). ∀DX ⊆ D, ∀DY ⊆ D, the frequency of an inter|s(X⊔Y )| dimensional rule X → Y on (DX , DY ) is f (X → Y ) = |×Di∈D\(DX ∪DY )Di| . In RE , recursively applying Definitions 7, 4 and 2 gives f ({d3} → {a3, a4}) = |s({d3}×{a3,a4})| = |{t1,t2,t3,t4}| = 34 .

|{t2,t3,t4}| |D3|

Is it sensible to directly generalize the confidence measure of association rules in binary relations to n-ary relations? Doing so, the confidence of a rule X → Y would be |s|(sX(X⊔Y)|)| . Unfortunately, this semantics is not satisfactory. Indeed, s(X ⊔ Y ) and s(X) are disjoint sets and the ratio of their cardinalities does not make any sense. For instance, in RE , consider the rule {d3} → {a3, a4}. We have s({d3}×{a3, a4}) = {t2, t3, t4} (i. e., a set of timestamps) while s({d3}) = {(a1, t1), (a2, t1), (a3, t1), (a3, t2), . . . } (i. e., a set of couples (head vertices,timestamps)). However, it is possible to introduce a factor such that |s(X)| and |s(X ⊔ Y )| become comparable. The idea is to multiply |s(X ⊔ Y )| by the cardinalities of its projections in the domains in DY . 3 The term “inter-dimensional association rule” often means, in the literature, a rule with one element per dimension. Our definition is more general.

Definition 8 (Confidence). ∀DX ⊆ D, ∀DY ⊆ D, the (exclusive) confidence of an inter-dimensional rule X → Y on (DX , DY ) is c(X → Y ) = |s(X⊔Y )|×||×s(DXi)∈|DY πDi (Y )| .

Roughly speaking, the remedial factor, applied to |s(X ⊔ Y )|, allows to count the elements in s(X ⊔ Y ) “in the same way at the numerator and at the denominator of the fraction”. For example, consider the rule {d3} → {a3, a4} in RE, its exclusive confidence is c({d3} → {a3, a4}) = |s({d3}×{a3,a4})|×|{a3,a4}| = 160 . |s({d3})| Fig. 3 depicts, at every timestamp, the dynamic graph in Fig. 1 but it only keeps the ten edges with the vertex 3 as a tail. This number, “10”, is found at the denominator of the fraction to compute the confidence. At its numerator, “6” actually is the count of those, among these 10 edges, that go to the vertices 3 and 4 at the same timestamp. They are thick in Fig. 3. At time t1, there is an edge from d3 to a3 but there is no edge from d3 to a4 at this time. This “lowers” the confidence of the rule because a4 is at its consequent too. At time t4, there is an edge from d3 to a2. This also “lowers” the confidence in the fact that if d3 is the tail of an edge then its head is either a3 or a4 (and not another vertex). That is why, this semantics of the confidence is said “exclusive”. If c({d3} → {a3, a4}) was 1, i. e., the maximal possible value, then, in every snapshot of the graph where the vertex 3 has a non-null output degree, it would always have two outgoing edges that would bind it with the vertex 3 and 4. Any other edge, with the vertex 3 as its tail, “lowers” the confidence.

2 4 t1 3 1 3 2 4 t2 2 4 t3 2 4 t4 1 3 1 3 1

Notice that the same speech applies to inter-dimensional rules involving the temporal dimension. E. g., Fig. 3 could illustrate, as is, the computation of c({d3} → {t2, t3, t4}), hence the same result 160 . This time however, the tick edges must be understood as those shared by the snapshots of the dynamic graph at t2, t3 and t4 (“edgewise and” operation between the three graphs). 4

Computing Rules

Given an n-ary relation R ⊆ ×Di∈DDi and the parameters (DX , DY ) (subsets of D and such that DX ∩ DY = ∅), μ ∈ [ 0, 1 ] and β ∈ [ 0, 1 ], the a priori interesting inter-dimensional rules X → Y are such that (i) X is an association on DX , (ii) Y is an association on DY , (iii) f (X → Y ) ≥ μ and (iv) c(X → Y ) ≥ β.

Our method, namely Gear, first rewrites the relation by combining the components which are neither in DX nor in DY . In other terms, it builds the support domain Dsupp = ×Di∈D\(DX ∪DY )Di. The resulting relation, RA is defined on the dimensions DA = DX ∪ DY ∪ {Dsupport}. Then, Gear extracts, in RA, every association A on DX ∪ DY satisfying |D|s(suAp)p|| ≥ μ. It entails that ⊔Di∈DX πDi (A) → ⊔Di∈DY πDi (A) is a frequent inter-dimensional rule (and reciprocally, hence the completeness). Its exclusive confidence is finally computed. If it exceeds β, the rule is output.

The actual extraction of every frequent association A (associated with its support Asupp ⊆ Dsupp), in RA, is now briefly detailed. A constraint-based approach is adopted, i. e., the problem is rewritten in terms of constraints and the patterns satisfying them all are the frequent associations. Here are the constraints: – Con-(DX ∪DY )(A ⊔ Asupp) ≡ ∀Di ∈ DX ∪ DY , πDi (A) 6= ∅; – Cconnected(A ⊔ Asupp) ≡ A ⊔ Asupp ⊆ RA; – Centire-supp(A ⊔ Asupp) ≡ Asupp = s(A); – C⌈μ×|Dsupp|⌉-freq(A ⊔ Asupp) ≡ |Asupp| ≥ ⌈μ × |Dsupp|⌉.

Thanks to the last constraint, the frequency of the rule ⊔Di∈DX πDi (A) → ⊔Di∈DY πDi (A) must reach or exceed μ. Indeed, |D|s(sAup)p|| ≥ μ is equivalent to |s(A)| ≥ ⌈μ × |Dsupp|⌉ and, because the third constraint (Asupp = s(A)) must be satisfied as well, it is equivalent to |Asupp| ≥ ⌈μ × |Dsupp|⌉. The third constraint, Centire-supp, forces a “closed” support. Indeed, by definition of the support (Definition 2), adding an element to Asupp (= s(A)) necessarily violates Cconnected. Thus, Centire-supp(A ⊔ Asupp) is equivalent to ∀t ∈ Dsupp \ Asupp, A ⊔ {t} 6⊆ RA.

The algorithm traverses the search space by recursively partitioning it into two complementary parts (“divide and conquer”). In this way, a binary tree represents the performed enumeration. At every node of this tree, two associations, namely U and V , are updated. U is the smallest association that may be discovered in the enumeration sub-tree rooted by the node, whereas U ⊔ V is the largest. That is why Gear is initially called with U = ∅ and V = ×Di∈DA Di. At every non-terminal node, an element e is chosen in ∪Di∈DA πDi (V ). In the enumeration sub-tree that derives from the first child, e is present in every U association (i. e., e is “moved” from V to U ). In the enumeration sub-tree that derives from the second child, e is absent from every U association (i. e., e is “removed” from V ). There are two reasons for an enumeration node to be a leaf of the enumeration tree. The first reason is that at least one of the four constraints is guaranteed to be violated by every U association in the sub-tree that would derive from the node. It happens when: – ∃Di ∈ DX ∪ DY such that πDi (U ⊔ V ) = ∅ (Con-(DX ∪DY ) is violated); – ∀Di ∈ DA, πDi (U ) 6= ∅ ∧ U 6⊆ RA (Cconnected is violated); – ∃t ∈ Dsupp \ πDsupp (U ⊔ V ) such that (U ⊔ V ) \ πDsupp (U ⊔ V ) ⊔ {t} ⊆ RA (Centire-supp is violated); – |πDsupp (U ⊔ V )| < ⌈μ × |Dsupp|⌉ (C⌈μ×|Dsupp|⌉-freq is violated). The proofs of these pruning properties are based on generalizations of monotone or anti-monotone properties that the four constraints have. The constraint Cconnected is monotone, i. e., if an association X violates the constraints then every larger association violates it as well. Since U is the smallest association in the sub-tree, ¬Cconnected(U ) is a safe pruning criterion. Dually, the three other constraints are anti-monotone, i. e., if an association X violates one of them then every smaller association violates it as well. That is why, to potentially prune the sub-tree rooted by the current enumeration node, their variables are replaced by the largest association in it: U ⊔ V . The second reason for an enumeration node to be a leaf is the actual discovery of a frequent association U . It happens when there is no more element to enumerate, i. e., when V = ∅.

An improved enumeration strategy avoids the generation of the nodes that violate Cconnected. To do so, in every first child (where an element e is “moved” to U ), every element in ∪Di∈DA πDi (V ) that would violate Cconnected if added to U ⊔ {e} is “removed” from V . Algorithm 1 sums up the extraction of every frequent inter-dimensional association rules with high enough confidences. Other performance improvements (e. g., pertaining to the enforcement of Centire-supp) were implemented. They actually are analog to what is done in [ 7 ] for the extraction of closed patterns in n-ary relations. The Choose function is that of [ 7 ] too. Another useful feature, inherited from [ 7 ], is the ability to additionally and efficiently enforce any piecewise (anti)-monotone constraint the associations must satisfy. In some of the following experiments, the constraint C(α1,...,α|DX ∪DY |)-min-sizes (where (α1, . . . , α|DX ∪DY |) ∈ N|DX ∪DY |) will be used:

C(α1,...,α|DX ∪DY |)-min-sizes(A ⊔ Asupp) ≡ ∀Di ∈ DX ∪ DY , |πDi (A)| ≥ αi . Algorithm 1: Algorithm Gear.

Input: (U, V ) Output: Every a priori interesting association rule involving every element in ∪Di∈DX ∪DY πDi (U ) and possibly some elements in ∪Di∈DX ∪DY πDi (V ) if Con-(DX ∪DY )(U ⊔ V ) ∧ Centire-supp(U ⊔ V ) ∧ C⌈μ×|Dsupp|⌉-freq(U ⊔ V ) then if V = ∅ then if c(⊔Di∈DX πDi (U ) → ⊔Di∈DY πDi (U )) ≥ β then

Output ⊔Di∈DX πDi (U ) → ⊔Di∈DY πDi (U ); else

Choose e ∈ ∪Di∈DA πDi (V ); Gear(U ⊔ {e},(V \ {e}) \ (f ∈ πDi (V ) | ¬Cconnected(U ⊔ {e} ⊔ {f }))Di∈DA );

Gear(U , V \ {e});

Experimental Study

Gear was implemented in C++ and compiled with GCC 4.2.4. This section reports experiments, which were performed on a GNU/LinuxTM system equipped with an Intel R Pentium R 4 processor cadenced at 3 GHz and 1 GB of RAM.

Ve´lo’v is a bicycle rental service run by the urban community of Lyon, France. 327 Ve´lo’v stations are spread over Lyon and its surrounding area. At any of these stations, the users can take a bicycle and bring it to any other station. Whenever a bicycle is rented or returned, this event is logged. Logs represent more than 13.1 million rides along 30 months. This dataset is seen as a dynamic directed graph evolving into two temporal dimension: the 7 days of the week and the 24 one-hour periods in a day. A significant amount of bicycles (local test inspired by the computation of a p-value), that are rented at the (departure) station ds on day d (e. g., Monday) at hour h (e. g., from 1pm to 2pm) and returned at the (arrival) station as, translates to an edge from ds to as in the graph timestamped with (d, h). In other terms, the (ds, as, d, h) in the related relation RVe´lov’v ⊆ Departure × Arrival × Day × Hour. In the end, 117,411 this contains 117, 411 4-tuples, hence a 7×24×327×327 = 0.7% density.

We analyze the results of the experiments with regard to the following questions: (a) Do the discovered graph rules make sense? (b) How to handle time in these rules? (c) What does the exclusive confidence definition capture?, and (d) How does Gear behave with respect to the parameter settings?

We first searched for rules with time periods and departure stations (tail vertices) at their antecedents; day information at their consequents. In this way, stations that, at some time, “emit” bicycles towards many other stations, but exclusively for some days, are discovered. With the minimum thresholds μ = 0.08 and β = 0.6, 35 rules are extracted. Fig. 4 reports three of them. The rule in Fig. 4a means that most of the departures from Station 6002 and between 11am and 12am occur on Sundays (c = 0.71). This makes sense: this station is at the main entrance of the most popular park, where people like to ride on Sundays. The rule in Fig. 4b means that there rarely are departures from Station 1002 between 1am and 3am except on Sundays (c = 0.62). This makes sense: this station is located in a district with many pubs and the favored evenings to party are on Saturdays. Furthermore the public transportation services stop at midnight and the Ve´lo’v is a good alternative to come back home. The rule in 4c describes another known behavior. Many people living outside Lyon arrive by train between 8am and 9am and use Ve´lo’v to finish their trips towards their working place. Indeed, Station 3001 is at the train station inside the main working district. This behavior is specific to the working days (c = 0.66).

To answer the question “which are the stations that often exchange bicycles? ”, we searched for rules whose antecedents are departure stations (i. e., tail vertices) and consequents are arrival stations (i. e., head vertices). The support domain of these rules are the Cartesian product of the seven days and the 24 hours. The constraint C2,2-min-sizes (see Sect. 4) is additionally enforced, i. e., every rule must involve at least two departure stations and two arrival stations. With μ = 0.03 and β = 0.8, Gear returns 27 rules. Fig. 5 reports some of them. 6002 {Sun.} 1002 {Sun.} 3001 {6002} × {11-12am} → {Sun.} (s = 0.09, c = 0.71) (a) {1002} × {1-2am,2-3am} → {Sun.} {3001} × {8-9am} → {working days} (s = 0.09, c = 0.62) (s = 0.13, c = 0.66)

(b) (c)

Do some stations exchange many bicycles at favored hours every day? To answer to this question, we search for rules whose antecedents consist of time periods and departure stations (i. e., tail vertices); their consequents are arrival stations (i. e., head vertices). To discover rules that hold every day, the minimal frequency threshold is set to 1. With β = 0.8, Gear returns 40 rules which contain at least one time period, two departure stations and two arrival stations. These rules mean that there are some known time periods in which set of stations maintain some privileged bicycle exchanges. Some of them are given in Fig. 6. This kind of knowledge is valuable for the data owner. For instance, if there is no available bicycle at a Ve´lo’v station then other Ve´lo’v stations that maintain strong exchanges with it may be impacted as well.

1002 2026 1002 2026 3001 10048 3001 10048 4:00pm to 8:00pm 4:00pm to 8:00pm 5:00pm to 6:00pm 5:00pm to 6:00pm {4-5pm, 5-6pm, 7-8(psm=} ×1, {c1=0002.,8220)26} → {1002, 2026} {5-6pm} × {30(0s1=,110,04c8=} →1) {3001, 10048} (a) (b) Fig. 6: Example of rules of the form Hours × Departures → Arrivals.

We now report a performance study of Gear discovering, in RVe´lo’v, every frequent inter-dimensional rule of the form Departures × Hours → Days. When the minimal frequency threshold increases, the number of frequent associations and the running time decrease (Fig. 7a obtained with β = 0). Indeed, Gear prunes large areas of the search space where every association violates the constraint C⌈μ×|Dsupp|⌉-freq. When the minimum confidence threshold increases, the number of rules decreases too (Fig. 7b obtained with μ = 0.08). Gear’s scalability was tested on the extraction of these rules (still with a frequency exceeding 0.08). To do so the nodes of the graphs were replicated, up to ten times, with their incoming edges only. It turns out that the algorithm scales linearly. More precisely a linear regression of R 7→ TR (where R is how many times the arrival

T1 stations are replicated; TR the running time on this replicated dataset) gives y = 0.88x + 0.08 with 0.05 as a standard error. Since 0.88 < 1, it can be written that Gear conforms to the proportions of the relation for faster extractions.

Number of rules 0.2 0.4 0.6 0.8

Minimum exclusive confidence (b) Number of rules w.r.t. β.

1 () s e m i tgn 180 i n n u R 190

NuRmubnenrinofgrtuimlees Mining graphs has recently received a lot of attention in the data mining community. Many different techniques (e. g., densification laws, shrinking diameter, factorization, clustering, evolution of communities, etc.) [ 2, 15, 9, 17, 13, 11, 3, 16, 5 ]. In this section, we focus on methods that mine local patterns. [ 6 ] extracts such patterns in labeled dynamic graphs. Frequent subgraph mining algorithms are adapted to time series of graphs. The approach aims at finding subgraphs that are topologically frequent and show an identical dynamic behavior over time, i. e., insertions and deletions of edges occur in the same order of time. Due to the complexity of the task, their algorithm is not complete. Computing the overlap-based support measure means solving a maximal independent set problem and this approach uses a greedy algorithm. [ 10 ] proposes a fast algorithm to mine frequent transformation subsequences from a set of dynamic labeled graphs (the labels on vertices and edges can change over time). Starting with the hypothesis that the changes in a dynamic graph are gradual, they propose to succinctly represent the dynamics with a graph grammar: each change between two observed successive graph states is interpolated by axiomatic transformation rules. [ 18 ] studies how a graph is structurally transformed through time. The proposed method computes graph rewriting rules that describe the evolution of two consecutive graphs. These rules are then abstracted into patterns representing the dynamics of a sequence of graphs. [ 12 ] introduces the periodic subgraph mining problem, i. e., identifying every frequent closed periodic subgraph. They empirically demonstrate the efficiency and the interest of their proposal on several real-world dynamic social networks. By showing that dynamic graphs can be represented as ternary relations, [ 8 ] describes a constraint-based mining approach to discover maximal cliques that are preserved over almost-contiguous timestamps. The constraints are pushed into a closed n-ary pattern mining algorithm. [ 14 ] proposes a constraint-based approach too. It the evolution of dense and isolated subgraphs defined by two user-parameterized constraints. Associating a temporal event type with each pattern captures the temporal evolution of the identified subgraph, i. e., the formation, dissolution, growth, diminution and stability of subgraphs between two consecutive timestamps. The algorithm incrementally processes the time series of graphs. [ 4 ] introduces the problem of extracting graph evolution rules satisfying minimal support and confidence constraints. It finds isomorphic subgraphs that match the timestamps associated with each edge, and, if present, the properties of the vertices and edges of the dynamic graph. Graph evolution rules are then derived with two different confidence measures. This approach is the closest to ours: it aims at describing a time-evolving graph with rules. Nevertheless, this work focuses on the dynamic changes in the graph whereas we provide a generic framework to discover interdimensional rules where the time is either in the rule or in its support. 7

Conclusion

We tackled the problem of describing dynamic graphs via rules that can involve subsets of any dimension (including temporal dimensions) at its antecedent or consequent. We proposed a new semantics for inter-dimensional rules in dynamic graphs. It relies on a relevant objective interestingness measure called the exclusive confidence. We introduced and implemented Gear, an effective solution for computing such rules. Experiments on a real-world dynamic graph demonstrated the interest of our proposal. A timely challenge is to look for primitive constraints that can support more sophisticated knowledge discovery processes in dynamic graphs. Some of these constraints would deal with the temporal dimension(s) (e. g., time contiguity [ 8 ]). Other constraints would deal with the “form” of the patterns to discover (e. g., cliques, dense subgraphs, etc.). Another challenge is to revisit, in our setting, important techniques developed for classical association rules, for instance, non redundancy aspects (see, e. g., [ 19 ]).

Acknowledgements. This work was partly funded by the ANR project Bingo2 (MDCO 2007) and by a grant from the Vietnamese government.

1. Agrawal , R. , Mannila , H. , Srikant , R. , Toivonen , H. , Verkamo , A.I. : Fast discovery of association rules . In: Advances in Knowledge Discovery and Data Mining , pp. 307 - 328 . AAAI/MIT Press ( 1996 )

2. Backstrom , L. , Huttenlocher , D. , Kleinberg , J. , Lan , X. : Group formation in large social networks: membership, growth, and evolution . In: KDD. pp. 44 - 54 . ACM Press ( 2006 )

3. Berger-Wolf , T.Y., Saia , J.: A framework for analysis of dynamic social networks . In: KDD . pp. 523 - 528 . ACM Press ( 2006 )

4. Berlingerio , M. , Bonchi , F. , Bringmann , B. , Gionis , A. : Mining graph evolution rules . In: ECML/PKDD . pp. 115 - 130 . Springer ( 2009 )

5. Berlingerio , M. , Coscia , M. , Giannotti , F. , Monreale , A. , Pedreschi , D. : As time goes by: Discovering eras in evolving social networks . In: PAKDD (1) . pp. 81 - 90 . Springer ( 2010 )

6. Borgwardt , K.M. , Kriegel , H.P. , Wackersreuther , P. : Pattern mining in frequent dynamic subgraphs . In: ICDM . pp. 818 - 822 . IEEE Computer Society ( 2006 )

7. Cerf , L. , Besson , J. , Robardet , C. , Boulicaut , J.F. : Closed patterns meet n-ary relations . ACM Trans. on Knowledge Discovery from Data 3 ( 1 ), 1 - 36 ( 2009 )

8. Cerf , L. , Nguyen , T.B.N. , Boulicaut , J.F. : Discovering relevant cross-graph cliques in dynamic networks . In: ISMIS . pp. 513 - 522 . Springer ( 2009 )

9. Chi , Y. , Zhu , S. , Song , X. , Tatemura , J. , Tseng , B.L. : Structural and temporal analysis of the blogosphere through community factorization . In: KDD . pp. 163 - 172 . ACM Press ( 2007 )

10. Inokuchi , A. , Washio , T. : A fast method to mine frequent subsequences from graph sequence data . In: ICDM . pp. 303 - 312 . IEEE Computer Society ( 2008 )

11. Kumar , R. , Novak , J. , Tomkins , A. : Structure and evolution of online social networks . In: KDD . pp. 611 - 617 . ACM Press ( 2006 )

12. Lahiri , M. , Berger-Wolf , T.Y.: Mining periodic behavior in dynamic social networks . In: ICDM . pp. 373 - 382 . IEEE Computer Society ( 2008 )

13. Leskovec , J. , Kleinberg , J. , Faloutsos , C. : Graphs over time: densification laws, shrinking diameters and possible explanations . In: KDD . pp. 177 - 187 . ACM Press ( 2005 )

14. Robardet , C. : Constraint-based pattern mining in dynamic graphs . In: ICDM . pp. 950 - 955 . IEEE Computer Society ( 2009 )

15. Sun , J. , Papadimitriou , S. , Yu , P.S. , Faloutsos , C. : Graphscope: Parameter-free mining of large time-evolving graphs . In: KDD . pp. 687 - 696 . ACM Press ( 2007 )

16. Tantipathananandh , C. , Berger-Wolf , T.Y., Kempe , D.: A framework for community identification in dynamic social networks . In: KDD . pp. 717 - 726 . ACM Press ( 2007 )

17. Tong , H. , Papadimitriou , S. , Sun , J. , Yu , P.S. , Faloutsos , C. : Colibri: fast mining of large static and dynamic graphs . In: KDD . pp. 686 - 694 . ACM Press ( 2008 )

18. You , C.H. , Holder , L.B. , Cook , D.J. : Learning patterns in the dynamics of biological networks . In: KDD . pp. 977 - 986 . ACM Press ( 2009 )

19. Zaki , M.J.: Mining non-redundant association rules . Data Min. Knowl. Discov . 9 ( 3 ), 223 - 248 ( 2004 )