<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Discovering Inter-Dimensional Rules in Dynamic Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kim-Ngan T. Nguyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lo¨ıc Cerf</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marc Plantevit</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jean-Franc¸ois Boulicaut</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universit ́e de Lyon</institution>
          ,
          <addr-line>CNRS, INRIA INSA-Lyon, LIRIS Combining, UMR5205, F-69621</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universit ́e de Lyon</institution>
          ,
          <addr-line>CNRS, INRIA Universit ́e Lyon 1, LIRIS Combining, UMR5205, F-69622</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>00</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>Data mining methods that exploit graph/network have become quite popular and a timely challenge is to consider the discovery of dynamic properties in evolving graphs or networks. In this paper, we consider the dynamic oriented graphs that can be encoded as n-ary relations with n ≥ 3 such that we have a least 3 dimensions: the dimensions of departure (tail) and arrival (head) vertices plus the time dimension. In other terms, it encodes the sequence of adjacency matrices of the graph. In such datasets, we propose a new semantics for inter-dimensional rules in dynamic graphs. We define rules that may involve subsets of any dimensions in their antecedents and their consequents and we propose the new objective interestingness measure called the exclusive confidence. We introduce a first algorithm for computing such inter-dimensional rules and we illustrate the added-value of exclusive confidence for supporting the discovery of relevant rules from a real-life dynamic graph.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Graph mining is a popular topic. Many researchers have considered pattern
discovery from large collections of graphs while others focus the analysis of one large
graph or network. In the latter case, we observe two complementary directions of
research. On one hand, global properties of graphs are studied (e. g., power-law
distribution of node degrees or diameters). On the other hand, it is possible to
use data mining algorithms to identify local patterns in the graphs (e. g.,
frequent subgraphs, clique patterns). Such local techniques can indeed benefit from
the huge research effort on 0/1 data analysis, i. e., a graphs is seen as particular
0/1 table (the two involved domains being identical): its adjacency matrix.</p>
      <p>In this paper, we investigate local pattern discovery from dynamic directed
graphs, i. e., from of collection of static directed graphs that all share the same
set of uniquely identified vertices. For instance, Fig. 1 depicts a dynamic
directed graph involving four nodes. Four snapshots of this graph are available.
The dynamic graph can be represented as the sequence of its adjacency
matrices underneath. It describes the relationship between the tail vertices in D1 =
{d1, d2, d3, d4} and the head vertices in D2 = {a1, a2, a3, a4} at the timestamps
2
4
3
1
3
1
3</p>
      <p>1
2
4
2
4
3
1
in D3 = {t1, t2, t3, t4}. Every ’1’, in the adjacency matrices is at the intersection
of three elements (di, aj , tk) ∈ D1 ×D2 ×D3, which indicate a directed edge from
di to aj at time tk. Therefore at least three dimensions are necessary to encode
a dynamic graph, which can be seen as a ternary relation (the one depicted in
Fig. 1 is called RE ). However, more dimensions may be used, for instance to
encode information on edges and/or time aspects with different granularity.
2
4
t1</p>
      <p>
        Studying descriptive rule mining from dynamic graphs is a rather new
research topic and most of previous work impose severe restrictions on the form of
the rules. The key contribution of this paper is the proposal of a quite general
form of rules. These rules may involve any subset of dimensions in both the
left-hand side and the right-hand side. In particular, the temporal dimensions
can either explicitly appear in the rules or be used to measure the importance
of the rules (i. e., the number of timestamps where the rule holds). Taking into
account these different ways is complementary. It provides relevant patterns
describing the evolution of a dynamic graph at a local level. Two examples of
inter-dimensional rules that we want to extract are given in Fig. 2. Fig. 2a
depicts a rule that is preserved at several timestamps. It intuitively means that if,
at a time, the edges from vertices 2, 3 and 4 have the same heads then these
heads are exclusively vertex 3. Rule in Fig. 2b means that if there are pairs of
edges whose tails are nodes 3 and 4 and whose heads are the same vertex then it
mainly occurs at times t2 and t3. To express the a priori relevancy of such rule,
we use a straightforward extension of the classical frequency measure and an
original extension of the confidence measure, the so-called exclusive confidence.
The second contribution of this paper deals with the design of an algorithm that
computes the a priori interesting rules. It exploits the principles (typically the
enumeration strategy) of [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], i. e., the state-of-the-art algorithm for exploring the
search space of multi-dimensional associations.
      </p>
      <p>In Sect. 2, we provide the needed definitions to build the new pattern domain
of inter-dimensional rules. Then, in Sect. 3, we define such rules and the exclusive
3
confidence semantics. Sect. 4 introduces the first algorithm that computes a
priori interesting rules from a dynamic graph. Sect. 5 deals with the empirical
validation and various experiments on a real-life dynamic graph. Sect. 6 discusses
related work and, finally, Sect. 7 briefly concludes.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminary Definitions</title>
      <p>Given n finite domains D = {D1, . . . , Dn} and an n-ary relation R ⊆ ×i=1..nDi,
the patterns of interest only involve some of the domains D′ ⊆ D. E. g., the
analyst may want to focus on subgraph patterns (D′ = {D1, D2} in RE ). She may,
instead, want to discover pattern involving temporal dimensions. Without loss of
generality, the dimensions are assumed ordered such that D′ = {D1, . . . , D|D′|}.
We now formally define an association on D′.</p>
      <p>
        Definition 1 (Association). ∀D′ = {D1, . . . , D|D′|} ⊆ D, ×i=1..|D′|Xi is an
association on D′ iff ∀i = 1..|D′|, Xi 6= ∅ ∧ Xi ⊆ Di.
×Di∈D\D′ Di is called support domain. The support of an association generalizes
that of an itemset in a binary relation (n = 2 and |D′| = 1) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Its formal
definition uses the concatenation operator, denoted ’·’. E. g., (d2, a3) · (t1) =
(d2, a3, t1).
      </p>
      <p>Definition 2 (Support s). ∀D′ ⊆ D, let X an association on D′. Its support,
denoted s(X), is s(X) = {u ∈ ×Di∈D\D′ Di | ∀x ∈ X, x · u ∈ R}.</p>
      <p>The following definitions will ease the exposition of this paper.</p>
      <p>Definition 3 (Projection π). ∀D′ = {D1, . . . , D|D′|} ⊆ D, let X = X1 × · · · ×
X|D′| an association on D′. ∀Di ∈ D, πDi (X) is Xi if Di ∈ D′, ∅ otherwise.
Definition 4 (Union ⊔). ∀DX ⊆ D and ∀DY ⊆ D, let X an association on
DX and Y an association on DY . X ⊔ Y is the association on DX ∪ DY for
which ∀Di ∈ D, πDi (X ⊔ Y ) = πDi (X) ∪ πDi (Y ).</p>
      <p>Definition 5 (Complement \). ∀DX ⊆ D and ∀DY ⊆ D, let X an
association on DX and Y an association on DY . Y \ X is the association on {Di ∈
DY | πDi (Y ) 6⊆ πDi (X)} for which ∀Di ∈ D, πDi (Y \ X) = πDi (Y ) \ πDi (X).</p>
      <p>In RE , depicted in Fig. 1, {d1, d2} × {a1, a2} is an association on {D1, D2},
whereas {a1, a2} is not because πD1 ({a1, a2}) = ∅. It is an association on {D2}.
Their respective supports are s({d1, d2} × {a1, a2}) = {t1, t2} and s({a1, a2}) =
{(d1, t1), (d1, t2), (d1, t3), (d2, t1), (d2, t2), (d3, t1), (d4, t4)}.</p>
    </sec>
    <sec id="sec-3">
      <title>Inter-Dimensional Rules in Dynamic Graph</title>
      <p>Given a dynamic graph, encoded as an n-ary relation R on D, the analyst chooses
the domains DX ⊆ D and DY ⊆ D at, respectively, the left-hand side and the
right-hand side of the rules to discover. E. g., to list rules involving tail vertices
at their antecedents and timestamps at their consequents, DX only contains one
dimension of the relation (the tail vertices) and so does DY (the timestamps).
Notice that DX ∩ DY must be empty. An inter-dimensional rule on (DX , DY ) is
a couple of associations3. The first one on DX , the second one on DY .
Definition 6 (Inter-dimensional rule). ∀DX ⊆ D, ∀DY ⊆ D, X → Y is
an inter-dimensional rule on (DX , DY ) iff X is an association on DX , Y is an
association on DY and DX ∩ DY = ∅.</p>
      <p>In RE , {d3} → {a3, a4} is an inter-dimensional rule on ({D1}, {D2}). The
rule {d3} × {a3, a4} → {d4} is not an inter-dimensional rule because elements in
D1 appear both at its left-hand side and at its right-hand side.</p>
      <p>A rule is frequent if many “objects” verifies it. These objects are elements
of a support domain for the rule, which is, in fact, ×Di∈D\(DX ∪DY )Di, i. e., that
of the association (on DX ∪ DY ) union of its antecedent and its consequent.
The rule can be trusted, i. e., has a large enough confidence, if there is a high
conditional probability to observe the consequent when the antecedent holds. In
the context of inter-dimensional rules in dynamic graphs, a natural definition of
the frequency exists. On the contrary, it is hard to define a confidence measure.</p>
      <p>The (relative) frequency of an inter-dimensional rule in a dynamic graph is,
in the support domain, the proportion of elements in the support of the union
of its antecedent and its consequent.</p>
      <p>Definition 7 (Frequency). ∀DX ⊆ D, ∀DY ⊆ D, the frequency of an
inter|s(X⊔Y )|
dimensional rule X → Y on (DX , DY ) is f (X → Y ) = |×Di∈D\(DX ∪DY )Di| .
In RE , recursively applying Definitions 7, 4 and 2 gives f ({d3} → {a3, a4}) =
|s({d3}×{a3,a4})| = |{t1,t2,t3,t4}| = 34 .</p>
      <p>|{t2,t3,t4}|
|D3|</p>
      <p>Is it sensible to directly generalize the confidence measure of association
rules in binary relations to n-ary relations? Doing so, the confidence of a rule
X → Y would be |s|(sX(X⊔Y)|)| . Unfortunately, this semantics is not satisfactory.
Indeed, s(X ⊔ Y ) and s(X) are disjoint sets and the ratio of their
cardinalities does not make any sense. For instance, in RE , consider the rule {d3} →
{a3, a4}. We have s({d3}×{a3, a4}) = {t2, t3, t4} (i. e., a set of timestamps) while
s({d3}) = {(a1, t1), (a2, t1), (a3, t1), (a3, t2), . . . } (i. e., a set of couples (head
vertices,timestamps)). However, it is possible to introduce a factor such that
|s(X)| and |s(X ⊔ Y )| become comparable. The idea is to multiply |s(X ⊔ Y )|
by the cardinalities of its projections in the domains in DY .
3 The term “inter-dimensional association rule” often means, in the literature, a rule
with one element per dimension. Our definition is more general.</p>
      <p>Definition 8 (Confidence). ∀DX ⊆ D, ∀DY ⊆ D, the (exclusive)
confidence of an inter-dimensional rule X → Y on (DX , DY ) is c(X → Y ) =
|s(X⊔Y )|×||×s(DXi)∈|DY πDi (Y )| .</p>
      <p>Roughly speaking, the remedial factor, applied to |s(X ⊔ Y )|, allows to count the
elements in s(X ⊔ Y ) “in the same way at the numerator and at the denominator
of the fraction”. For example, consider the rule {d3} → {a3, a4} in RE, its
exclusive confidence is c({d3} → {a3, a4}) = |s({d3}×{a3,a4})|×|{a3,a4}| = 160 .
|s({d3})|
Fig. 3 depicts, at every timestamp, the dynamic graph in Fig. 1 but it only
keeps the ten edges with the vertex 3 as a tail. This number, “10”, is found at
the denominator of the fraction to compute the confidence. At its numerator,
“6” actually is the count of those, among these 10 edges, that go to the vertices
3 and 4 at the same timestamp. They are thick in Fig. 3. At time t1, there is an
edge from d3 to a3 but there is no edge from d3 to a4 at this time. This “lowers”
the confidence of the rule because a4 is at its consequent too. At time t4, there is
an edge from d3 to a2. This also “lowers” the confidence in the fact that if d3 is
the tail of an edge then its head is either a3 or a4 (and not another vertex). That
is why, this semantics of the confidence is said “exclusive”. If c({d3} → {a3, a4})
was 1, i. e., the maximal possible value, then, in every snapshot of the graph
where the vertex 3 has a non-null output degree, it would always have two
outgoing edges that would bind it with the vertex 3 and 4. Any other edge, with
the vertex 3 as its tail, “lowers” the confidence.</p>
      <p>2
4
t1
3
1
3
2
4
t2
2
4
t3
2
4
t4
1
3
1
3
1</p>
      <p>Notice that the same speech applies to inter-dimensional rules involving the
temporal dimension. E. g., Fig. 3 could illustrate, as is, the computation of
c({d3} → {t2, t3, t4}), hence the same result 160 . This time however, the tick
edges must be understood as those shared by the snapshots of the dynamic
graph at t2, t3 and t4 (“edgewise and” operation between the three graphs).
4</p>
    </sec>
    <sec id="sec-4">
      <title>Computing Rules</title>
      <p>
        Given an n-ary relation R ⊆ ×Di∈DDi and the parameters (DX , DY ) (subsets of
D and such that DX ∩ DY = ∅), μ ∈ [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ] and β ∈ [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ], the a priori interesting
inter-dimensional rules X → Y are such that (i) X is an association on DX , (ii)
Y is an association on DY , (iii) f (X → Y ) ≥ μ and (iv) c(X → Y ) ≥ β.
      </p>
      <p>Our method, namely Gear, first rewrites the relation by combining the
components which are neither in DX nor in DY . In other terms, it builds the
support domain Dsupp = ×Di∈D\(DX ∪DY )Di. The resulting relation, RA is
defined on the dimensions DA = DX ∪ DY ∪ {Dsupport}. Then, Gear extracts,
in RA, every association A on DX ∪ DY satisfying |D|s(suAp)p|| ≥ μ. It entails that
⊔Di∈DX πDi (A) → ⊔Di∈DY πDi (A) is a frequent inter-dimensional rule (and
reciprocally, hence the completeness). Its exclusive confidence is finally computed.
If it exceeds β, the rule is output.</p>
      <p>The actual extraction of every frequent association A (associated with its
support Asupp ⊆ Dsupp), in RA, is now briefly detailed. A constraint-based
approach is adopted, i. e., the problem is rewritten in terms of constraints and
the patterns satisfying them all are the frequent associations. Here are the
constraints:
– Con-(DX ∪DY )(A ⊔ Asupp) ≡ ∀Di ∈ DX ∪ DY , πDi (A) 6= ∅;
– Cconnected(A ⊔ Asupp) ≡ A ⊔ Asupp ⊆ RA;
– Centire-supp(A ⊔ Asupp) ≡ Asupp = s(A);
– C⌈μ×|Dsupp|⌉-freq(A ⊔ Asupp) ≡ |Asupp| ≥ ⌈μ × |Dsupp|⌉.</p>
      <p>Thanks to the last constraint, the frequency of the rule ⊔Di∈DX πDi (A) →
⊔Di∈DY πDi (A) must reach or exceed μ. Indeed, |D|s(sAup)p|| ≥ μ is equivalent to
|s(A)| ≥ ⌈μ × |Dsupp|⌉ and, because the third constraint (Asupp = s(A)) must be
satisfied as well, it is equivalent to |Asupp| ≥ ⌈μ × |Dsupp|⌉. The third constraint,
Centire-supp, forces a “closed” support. Indeed, by definition of the support
(Definition 2), adding an element to Asupp (= s(A)) necessarily violates Cconnected.
Thus, Centire-supp(A ⊔ Asupp) is equivalent to ∀t ∈ Dsupp \ Asupp, A ⊔ {t} 6⊆ RA.</p>
      <p>The algorithm traverses the search space by recursively partitioning it into
two complementary parts (“divide and conquer”). In this way, a binary tree
represents the performed enumeration. At every node of this tree, two associations,
namely U and V , are updated. U is the smallest association that may be
discovered in the enumeration sub-tree rooted by the node, whereas U ⊔ V is the
largest. That is why Gear is initially called with U = ∅ and V = ×Di∈DA Di.
At every non-terminal node, an element e is chosen in ∪Di∈DA πDi (V ). In the
enumeration sub-tree that derives from the first child, e is present in every U
association (i. e., e is “moved” from V to U ). In the enumeration sub-tree that
derives from the second child, e is absent from every U association (i. e., e is
“removed” from V ). There are two reasons for an enumeration node to be a leaf of
the enumeration tree. The first reason is that at least one of the four constraints
is guaranteed to be violated by every U association in the sub-tree that would
derive from the node. It happens when:
– ∃Di ∈ DX ∪ DY such that πDi (U ⊔ V ) = ∅ (Con-(DX ∪DY ) is violated);
– ∀Di ∈ DA, πDi (U ) 6= ∅ ∧ U 6⊆ RA (Cconnected is violated);
– ∃t ∈ Dsupp \ πDsupp (U ⊔ V ) such that (U ⊔ V ) \ πDsupp (U ⊔ V ) ⊔ {t} ⊆ RA
(Centire-supp is violated);
– |πDsupp (U ⊔ V )| &lt; ⌈μ × |Dsupp|⌉ (C⌈μ×|Dsupp|⌉-freq is violated).
The proofs of these pruning properties are based on generalizations of
monotone or anti-monotone properties that the four constraints have. The constraint
Cconnected is monotone, i. e., if an association X violates the constraints then
every larger association violates it as well. Since U is the smallest association in
the sub-tree, ¬Cconnected(U ) is a safe pruning criterion. Dually, the three other
constraints are anti-monotone, i. e., if an association X violates one of them then
every smaller association violates it as well. That is why, to potentially prune the
sub-tree rooted by the current enumeration node, their variables are replaced by
the largest association in it: U ⊔ V . The second reason for an enumeration node
to be a leaf is the actual discovery of a frequent association U . It happens when
there is no more element to enumerate, i. e., when V = ∅.</p>
      <p>
        An improved enumeration strategy avoids the generation of the nodes that
violate Cconnected. To do so, in every first child (where an element e is “moved”
to U ), every element in ∪Di∈DA πDi (V ) that would violate Cconnected if added
to U ⊔ {e} is “removed” from V . Algorithm 1 sums up the extraction of every
frequent inter-dimensional association rules with high enough confidences. Other
performance improvements (e. g., pertaining to the enforcement of Centire-supp)
were implemented. They actually are analog to what is done in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for the
extraction of closed patterns in n-ary relations. The Choose function is that of
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] too. Another useful feature, inherited from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], is the ability to
additionally and efficiently enforce any piecewise (anti)-monotone constraint the
associations must satisfy. In some of the following experiments, the constraint
C(α1,...,α|DX ∪DY |)-min-sizes (where (α1, . . . , α|DX ∪DY |) ∈ N|DX ∪DY |) will be used:
      </p>
      <p>C(α1,...,α|DX ∪DY |)-min-sizes(A ⊔ Asupp) ≡ ∀Di ∈ DX ∪ DY , |πDi (A)| ≥ αi .
Algorithm 1: Algorithm Gear.</p>
      <p>Input: (U, V )
Output: Every a priori interesting association rule involving every
element in ∪Di∈DX ∪DY πDi (U ) and possibly some elements in
∪Di∈DX ∪DY πDi (V )
if Con-(DX ∪DY )(U ⊔ V ) ∧ Centire-supp(U ⊔ V ) ∧ C⌈μ×|Dsupp|⌉-freq(U ⊔ V ) then
if V = ∅ then
if c(⊔Di∈DX πDi (U ) → ⊔Di∈DY πDi (U )) ≥ β then</p>
      <p>Output ⊔Di∈DX πDi (U ) → ⊔Di∈DY πDi (U );
else</p>
      <p>Choose e ∈ ∪Di∈DA πDi (V );
Gear(U ⊔ {e},(V \ {e}) \ (f ∈
πDi (V ) | ¬Cconnected(U ⊔ {e} ⊔ {f }))Di∈DA );</p>
      <p>Gear(U , V \ {e});</p>
    </sec>
    <sec id="sec-5">
      <title>Experimental Study</title>
      <p>Gear was implemented in C++ and compiled with GCC 4.2.4. This section
reports experiments, which were performed on a GNU/LinuxTM system equipped
with an Intel R Pentium R 4 processor cadenced at 3 GHz and 1 GB of RAM.</p>
      <p>Ve´lo’v is a bicycle rental service run by the urban community of Lyon,
France. 327 Ve´lo’v stations are spread over Lyon and its surrounding area. At
any of these stations, the users can take a bicycle and bring it to any other
station. Whenever a bicycle is rented or returned, this event is logged. Logs
represent more than 13.1 million rides along 30 months. This dataset is seen as
a dynamic directed graph evolving into two temporal dimension: the 7 days of
the week and the 24 one-hour periods in a day. A significant amount of bicycles
(local test inspired by the computation of a p-value), that are rented at the
(departure) station ds on day d (e. g., Monday) at hour h (e. g., from 1pm to
2pm) and returned at the (arrival) station as, translates to an edge from ds to
as in the graph timestamped with (d, h). In other terms, the (ds, as, d, h) in the
related relation RVe´lov’v ⊆ Departure × Arrival × Day × Hour. In the end,
117,411
this contains 117, 411 4-tuples, hence a 7×24×327×327 = 0.7% density.</p>
      <p>We analyze the results of the experiments with regard to the following
questions: (a) Do the discovered graph rules make sense? (b) How to handle time in
these rules? (c) What does the exclusive confidence definition capture?, and (d)
How does Gear behave with respect to the parameter settings?</p>
      <p>We first searched for rules with time periods and departure stations (tail
vertices) at their antecedents; day information at their consequents. In this way,
stations that, at some time, “emit” bicycles towards many other stations, but
exclusively for some days, are discovered. With the minimum thresholds μ = 0.08
and β = 0.6, 35 rules are extracted. Fig. 4 reports three of them. The rule in
Fig. 4a means that most of the departures from Station 6002 and between 11am
and 12am occur on Sundays (c = 0.71). This makes sense: this station is at the
main entrance of the most popular park, where people like to ride on Sundays.
The rule in Fig. 4b means that there rarely are departures from Station 1002
between 1am and 3am except on Sundays (c = 0.62). This makes sense: this
station is located in a district with many pubs and the favored evenings to
party are on Saturdays. Furthermore the public transportation services stop at
midnight and the Ve´lo’v is a good alternative to come back home. The rule in
4c describes another known behavior. Many people living outside Lyon arrive
by train between 8am and 9am and use Ve´lo’v to finish their trips towards
their working place. Indeed, Station 3001 is at the train station inside the main
working district. This behavior is specific to the working days (c = 0.66).</p>
      <p>To answer the question “which are the stations that often exchange
bicycles? ”, we searched for rules whose antecedents are departure stations (i. e., tail
vertices) and consequents are arrival stations (i. e., head vertices). The support
domain of these rules are the Cartesian product of the seven days and the 24
hours. The constraint C2,2-min-sizes (see Sect. 4) is additionally enforced, i. e.,
every rule must involve at least two departure stations and two arrival stations.
With μ = 0.03 and β = 0.8, Gear returns 27 rules. Fig. 5 reports some of them.
6002
{Sun.}
1002
{Sun.}
3001
{6002} × {11-12am} → {Sun.}
(s = 0.09, c = 0.71)
(a)
{1002} × {1-2am,2-3am} → {Sun.} {3001} × {8-9am} → {working days}
(s = 0.09, c = 0.62) (s = 0.13, c = 0.66)</p>
      <p>(b) (c)</p>
      <p>Do some stations exchange many bicycles at favored hours every day? To
answer to this question, we search for rules whose antecedents consist of time
periods and departure stations (i. e., tail vertices); their consequents are arrival
stations (i. e., head vertices). To discover rules that hold every day, the minimal
frequency threshold is set to 1. With β = 0.8, Gear returns 40 rules which
contain at least one time period, two departure stations and two arrival stations.
These rules mean that there are some known time periods in which set of stations
maintain some privileged bicycle exchanges. Some of them are given in Fig. 6.
This kind of knowledge is valuable for the data owner. For instance, if there is no
available bicycle at a Ve´lo’v station then other Ve´lo’v stations that maintain
strong exchanges with it may be impacted as well.</p>
      <p>1002
2026
1002
2026
3001
10048
3001
10048
4:00pm to 8:00pm 4:00pm to 8:00pm 5:00pm to 6:00pm 5:00pm to 6:00pm
{4-5pm, 5-6pm, 7-8(psm=} ×1, {c1=0002.,8220)26} → {1002, 2026} {5-6pm} × {30(0s1=,110,04c8=} →1) {3001, 10048}
(a) (b)
Fig. 6: Example of rules of the form Hours × Departures → Arrivals.</p>
      <p>We now report a performance study of Gear discovering, in RVe´lo’v, every
frequent inter-dimensional rule of the form Departures × Hours → Days. When
the minimal frequency threshold increases, the number of frequent associations
and the running time decrease (Fig. 7a obtained with β = 0). Indeed, Gear
prunes large areas of the search space where every association violates the
constraint C⌈μ×|Dsupp|⌉-freq. When the minimum confidence threshold increases, the
number of rules decreases too (Fig. 7b obtained with μ = 0.08). Gear’s
scalability was tested on the extraction of these rules (still with a frequency exceeding
0.08). To do so the nodes of the graphs were replicated, up to ten times, with
their incoming edges only. It turns out that the algorithm scales linearly. More
precisely a linear regression of R 7→ TR (where R is how many times the arrival</p>
      <p>T1
stations are replicated; TR the running time on this replicated dataset) gives
y = 0.88x + 0.08 with 0.05 as a standard error. Since 0.88 &lt; 1, it can be written
that Gear conforms to the proportions of the relation for faster extractions.</p>
      <p>Number of rules
0.2 0.4 0.6 0.8</p>
      <p>Minimum exclusive confidence
(b) Number of rules w.r.t. β.</p>
      <p>1
()
s
e
m
i
tgn 180
i
n
n
u
R
190</p>
      <p>
        NuRmubnenrinofgrtuimlees
Mining graphs has recently received a lot of attention in the data mining
community. Many different techniques (e. g., densification laws, shrinking diameter,
factorization, clustering, evolution of communities, etc.) [
        <xref ref-type="bibr" rid="ref11 ref13 ref15 ref16 ref17 ref2 ref3 ref5 ref9">2, 15, 9, 17, 13, 11, 3, 16,
5</xref>
        ]. In this section, we focus on methods that mine local patterns. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] extracts
such patterns in labeled dynamic graphs. Frequent subgraph mining algorithms
are adapted to time series of graphs. The approach aims at finding subgraphs
that are topologically frequent and show an identical dynamic behavior over
time, i. e., insertions and deletions of edges occur in the same order of time. Due
to the complexity of the task, their algorithm is not complete. Computing the
overlap-based support measure means solving a maximal independent set
problem and this approach uses a greedy algorithm. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] proposes a fast algorithm
to mine frequent transformation subsequences from a set of dynamic labeled
graphs (the labels on vertices and edges can change over time). Starting with
the hypothesis that the changes in a dynamic graph are gradual, they propose to
succinctly represent the dynamics with a graph grammar: each change between
two observed successive graph states is interpolated by axiomatic
transformation rules. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] studies how a graph is structurally transformed through time. The
proposed method computes graph rewriting rules that describe the evolution of
two consecutive graphs. These rules are then abstracted into patterns
representing the dynamics of a sequence of graphs. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] introduces the periodic subgraph
mining problem, i. e., identifying every frequent closed periodic subgraph. They
empirically demonstrate the efficiency and the interest of their proposal on
several real-world dynamic social networks. By showing that dynamic graphs can
be represented as ternary relations, [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] describes a constraint-based mining
approach to discover maximal cliques that are preserved over almost-contiguous
timestamps. The constraints are pushed into a closed n-ary pattern mining
algorithm. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] proposes a constraint-based approach too. It the evolution of dense
and isolated subgraphs defined by two user-parameterized constraints.
Associating a temporal event type with each pattern captures the temporal evolution
of the identified subgraph, i. e., the formation, dissolution, growth, diminution
and stability of subgraphs between two consecutive timestamps. The algorithm
incrementally processes the time series of graphs. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] introduces the problem of
extracting graph evolution rules satisfying minimal support and confidence
constraints. It finds isomorphic subgraphs that match the timestamps associated
with each edge, and, if present, the properties of the vertices and edges of the
dynamic graph. Graph evolution rules are then derived with two different
confidence measures. This approach is the closest to ours: it aims at describing a
time-evolving graph with rules. Nevertheless, this work focuses on the dynamic
changes in the graph whereas we provide a generic framework to discover
interdimensional rules where the time is either in the rule or in its support.
7
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>
        We tackled the problem of describing dynamic graphs via rules that can involve
subsets of any dimension (including temporal dimensions) at its antecedent or
consequent. We proposed a new semantics for inter-dimensional rules in dynamic
graphs. It relies on a relevant objective interestingness measure called the
exclusive confidence. We introduced and implemented Gear, an effective solution for
computing such rules. Experiments on a real-world dynamic graph demonstrated
the interest of our proposal. A timely challenge is to look for primitive constraints
that can support more sophisticated knowledge discovery processes in dynamic
graphs. Some of these constraints would deal with the temporal dimension(s)
(e. g., time contiguity [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]). Other constraints would deal with the “form” of the
patterns to discover (e. g., cliques, dense subgraphs, etc.). Another challenge is
to revisit, in our setting, important techniques developed for classical association
rules, for instance, non redundancy aspects (see, e. g., [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]).
      </p>
      <p>Acknowledgements. This work was partly funded by the ANR project Bingo2
(MDCO 2007) and by a grant from the Vietnamese government.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannila</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srikant</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toivonen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verkamo</surname>
            ,
            <given-names>A.I.</given-names>
          </string-name>
          :
          <article-title>Fast discovery of association rules</article-title>
          .
          <source>In: Advances in Knowledge Discovery and Data Mining</source>
          , pp.
          <fpage>307</fpage>
          -
          <lpage>328</lpage>
          . AAAI/MIT Press (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Backstrom</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huttenlocher</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kleinberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Group formation in large social networks: membership, growth, and evolution</article-title>
          . In: KDD. pp.
          <fpage>44</fpage>
          -
          <lpage>54</lpage>
          . ACM Press (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Berger-Wolf</surname>
          </string-name>
          , T.Y.,
          <string-name>
            <surname>Saia</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>A framework for analysis of dynamic social networks</article-title>
          .
          <source>In: KDD</source>
          . pp.
          <fpage>523</fpage>
          -
          <lpage>528</lpage>
          . ACM Press (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Berlingerio</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonchi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bringmann</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gionis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Mining graph evolution rules</article-title>
          .
          <source>In: ECML/PKDD</source>
          . pp.
          <fpage>115</fpage>
          -
          <lpage>130</lpage>
          . Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Berlingerio</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coscia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giannotti</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monreale</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedreschi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>As time goes by: Discovering eras in evolving social networks</article-title>
          .
          <source>In: PAKDD (1)</source>
          . pp.
          <fpage>81</fpage>
          -
          <lpage>90</lpage>
          . Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Borgwardt</surname>
            ,
            <given-names>K.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wackersreuther</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Pattern mining in frequent dynamic subgraphs</article-title>
          .
          <source>In: ICDM</source>
          . pp.
          <fpage>818</fpage>
          -
          <lpage>822</lpage>
          . IEEE Computer Society (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cerf</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Besson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robardet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boulicaut</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>Closed patterns meet n-ary relations</article-title>
          .
          <source>ACM Trans. on Knowledge Discovery from Data</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Cerf</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T.B.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boulicaut</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>Discovering relevant cross-graph cliques in dynamic networks</article-title>
          .
          <source>In: ISMIS</source>
          . pp.
          <fpage>513</fpage>
          -
          <lpage>522</lpage>
          . Springer (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tatemura</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tseng</surname>
            ,
            <given-names>B.L.</given-names>
          </string-name>
          :
          <article-title>Structural and temporal analysis of the blogosphere through community factorization</article-title>
          .
          <source>In: KDD</source>
          . pp.
          <fpage>163</fpage>
          -
          <lpage>172</lpage>
          . ACM Press (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Inokuchi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Washio</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A fast method to mine frequent subsequences from graph sequence data</article-title>
          .
          <source>In: ICDM</source>
          . pp.
          <fpage>303</fpage>
          -
          <lpage>312</lpage>
          . IEEE Computer Society (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Novak</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomkins</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Structure and evolution of online social networks</article-title>
          .
          <source>In: KDD</source>
          . pp.
          <fpage>611</fpage>
          -
          <lpage>617</lpage>
          . ACM Press (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lahiri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berger-Wolf</surname>
          </string-name>
          , T.Y.:
          <article-title>Mining periodic behavior in dynamic social networks</article-title>
          .
          <source>In: ICDM</source>
          . pp.
          <fpage>373</fpage>
          -
          <lpage>382</lpage>
          . IEEE Computer Society (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Leskovec</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kleinberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faloutsos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Graphs over time: densification laws, shrinking diameters and possible explanations</article-title>
          .
          <source>In: KDD</source>
          . pp.
          <fpage>177</fpage>
          -
          <lpage>187</lpage>
          . ACM Press (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Robardet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Constraint-based pattern mining in dynamic graphs</article-title>
          .
          <source>In: ICDM</source>
          . pp.
          <fpage>950</fpage>
          -
          <lpage>955</lpage>
          . IEEE Computer Society (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Papadimitriou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>P.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faloutsos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          : Graphscope:
          <article-title>Parameter-free mining of large time-evolving graphs</article-title>
          .
          <source>In: KDD</source>
          . pp.
          <fpage>687</fpage>
          -
          <lpage>696</lpage>
          . ACM Press (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Tantipathananandh</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berger-Wolf</surname>
          </string-name>
          , T.Y.,
          <string-name>
            <surname>Kempe</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>A framework for community identification in dynamic social networks</article-title>
          .
          <source>In: KDD</source>
          . pp.
          <fpage>717</fpage>
          -
          <lpage>726</lpage>
          . ACM Press (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Tong</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Papadimitriou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>P.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faloutsos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Colibri: fast mining of large static and dynamic graphs</article-title>
          .
          <source>In: KDD</source>
          . pp.
          <fpage>686</fpage>
          -
          <lpage>694</lpage>
          . ACM Press (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>You</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holder</surname>
            ,
            <given-names>L.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cook</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          :
          <article-title>Learning patterns in the dynamics of biological networks</article-title>
          .
          <source>In: KDD</source>
          . pp.
          <fpage>977</fpage>
          -
          <lpage>986</lpage>
          . ACM Press (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Zaki</surname>
            ,
            <given-names>M.J.:</given-names>
          </string-name>
          <article-title>Mining non-redundant association rules</article-title>
          .
          <source>Data Min. Knowl. Discov</source>
          .
          <volume>9</volume>
          (
          <issue>3</issue>
          ),
          <fpage>223</fpage>
          -
          <lpage>248</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>