<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Time Traveling in Graphs using a Graph Database</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Konstantinos Semertzidis</string-name>
          <email>ksemer@cs.uoi.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evaggelia Pitoura</string-name>
          <email>pitoura@cs.uoi.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science &amp; Engineering Department, University of Ioannina</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Most graph structured data, such as data created from the web, social, citation and computer networks, evolve over time. In this paper, we assume that we are given the history of a graph in the form of a sequence of graph snapshots. Our goal is to de ne the different types of queries that one can ask regarding the graph history and present an initial approach to storing graph snapshots and processing historical queries in a native graph database. We de ne three general types of historical queries, namely, historical graph queries, historical time queries and historical top-k queries. We present two representations of graph snapshots that use either a single or a multi-edge approach. We evaluate the two approaches experimentally for various types of historical reachability queries.</p>
      </abstract>
      <kwd-group>
        <kwd>Graph Database</kwd>
        <kwd>Historical Queries</kwd>
        <kwd>Reachability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>In recent years, increasing amounts of graph structured
data are made available from a variety of sources, such as
social, citation, computer and hyperlink networks. Almost
all such real-world networks evolve over time. In this paper,
we focus on structural evolution, that includes node and
edge additions and removals.</p>
      <p>
        There has been a lot of interest on analytical processing
and mining of evolving graphs, including among others
developing models [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], discovering communities [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and
computing measures such as PageRank [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. There has been also
research on building graph engines tailored to supporting
analytical processing in dynamic graphs, such as Kineograph
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Chronos [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Instead in this paper, we focus on
online query-based processing of dynamic graphs.
      </p>
      <p>We assume that we are given an evolving graph, in the
form of a sequence of graph snapshots corresponding to the
state of the graph at different time points. Our goal is to
study the types of queries that one can ask on these sequence
of snapshots. We call such queries historical queries.
Although graph query processing has received a lot of interest,
querying an evolving graph has received limited attention.</p>
      <p>First, we provide a taxonomy of historical queries. We
introduce three categories of historical queries, namely,
historical graph queries, historical time queries and historical
topk queries. Historical graph queries are graph queries applied
at past snapshots. For example, a historical graph
reachability query may ask whether two nodes were reachable at
some time interval in the past. Historical time queries are
graph queries that ask about the time point or the time
interval that a query had a speci c result. For example, a
historical time reachability query may ask for the time point
at which two nodes become reachable for the rst time.
Finally, historical top-k queries ask for the nodes that had a
property for the longest period of time. For example, a
historical top-k reachability query may ask for the k pairs of
nodes that remained connected for the longest interval.</p>
      <p>We then address the problem of processing historical
reachability queries in a native graph database. We study two
approaches of storing graph snapshots, using either a single
edge or multiple edges to represent connections that appear
at different time points. We present algorithms for
processing all different types of historical reachability queries using
these approaches and experimentally compare their
performance. Due to the native support of an edge-type traversal
by the graph database, the multiple-edge approach
outperforms the single-edge approach. Indexing boosts the
performance of both approaches.</p>
      <p>
        Previous work on historical queries is limited and includes
only historical graph queries [
        <xref ref-type="bibr" rid="ref1 ref10 ref11 ref12 ref15 ref16 ref4 ref9">1, 4, 9, 10, 11, 12, 15, 16</xref>
        ] with
the exception of [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] that studied one instance of a historical
time query and the recent work in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] that studies historical
top-k graph patterns. Also, none of the previous approaches,
with the exception of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], is built on top of a native graph
database. The work in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] proposes a general approach for
storing an evolving graph in a native graph database,
specifically Neo4j, and experimentally tests a number of general
graph queries that include past time instances. Our focus
here is on structural updates and reachability queries. We
include an experimental comparison with their model.
      </p>
      <p>The rest of the paper is structured as follows. In Section
2, we formally de ne historical reachability queries. In
Section 3, we introduce two approaches to storing the evolution
of graphs in graph databases and algorithms for processing
historical reachability queries. In Section 4, we evaluate our
approaches by presenting experimental results. In Section
5, we present related work. Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>HISTORICAL QUERIES</title>
      <p>Most real world graphs evolve over time. In this paper, we
focus on structural evolution, that includes node and edge
additions and removals. We assume that time is discrete and
use successive integers to denote successive points in time.
Let G = (V; E) be a graph where V is the set of nodes and
E the set of edges. We use Gt = (Vt; Et) to denote the graph
snapshot at time point t, that is, the set of nodes and edges
that exist at time point t.</p>
      <p>De nition 1 (evolving graph).</p>
      <p>An evolving graph G[ti;tj] in time interval [ti; tj] is a sequence
fGti , Gti+1, : : : ; Gtj g of graph snapshots.</p>
      <p>One can think of two interpretations of time points. One
interpretation is that of actual time, for example time point
t may correspond to say February 9, 2015, 5:00am PDT.
Another view is operational. In this case, a new time point
is created, after a speci c number of inserts and deletes of
nodes and edges has occurred. We use the term time
granularity to refer to how often a new time point and the
corresponding graph snapshot are created. In the case of actual
time, granularity may range for example from milliseconds
to years, whereas, in the case of operational time,
granularity may be at the level of one or more operations.</p>
      <p>For a node u (or, edge e), its lifespan denotes the set of
time intervals during which u (resp. e) existed in an evolving
graph. More formally, given an evolving graph GI = fGti ,
Gti+1, : : : , Gtj g, the lifespan, L(u), (resp. L(e)) of a node
u (resp. edge e) is a set of intervals such that an interval
[ti; tj] I belongs to L(u), (resp. L(e)), if and only if, for
all ti tm tj, u 2 Vtm (resp. e 2 Etm ).</p>
      <p>Let us now discuss the type of queries that one can pose
on evolving graphs. We call such queries historical to
distinguish them from queries that consider only the current graph
snapshot, Gcurr. In the following, we rst present various
types of general historical queries and then formally de ne
these types for the case of historical reachability queries.
Historical Graph Queries. The rst category of
historical queries include queries that are similar to current graph
queries but refer to past snapshots. Let Q be any type of
graph query, e.g., a reachability, shortest distance, or graph
pattern query. The corresponding historical query QH on
an evolving graph G[ti;tj] is a pair (Q; IQ), where IQ is an
interval [tl; tm], ti tl tm tj. Query QH is executed
by applying query Q at all graph snapshots Gt, tl t tm
of G[ti;tj], and returns as result an appropriately de ned
aggregation of these results. For example, let Q be a query
that asks for the shortest path distance between nodes u
and v and let QH = (Q; [tl; tm]). The shortest path distance
between nodes u and v is computed at all graph snapshots
Gtl , Gtl+1, : : : Gtm and these tm - tl + 1 distances are
appropriately combined to produce the result of QH .</p>
      <p>There are three general ways of combining the results:
(a) use all snapshots, (b) use only one of the snapshots, and
(c) an intermediate case, in which we use r of the involved
snapshots. For example, in the case of shortest path queries,
we may combine the results by returning in case (a) the
average shortest path distance, in case (b) the minimum
shortest path distance and in case (c) the minimum shortest
path distance in at least r of the snapshots. Let us now
de ne formally the three ways of combining results in the
case of reachability queries.</p>
      <p>De nition 2 (historical reachability query).
Given an evolving graph G[ti;tj], a time interval IQ = [tl; tm],
ti tl tm tj and a pair of nodes v, u:
(a) a conjunctive historical reachability query (Conj)
returns true, if there exists a path from u to v in all
graph snapshots Gt, tl t
tm of G[ti;tj].
(b) a disjunctive historical reachability query (Disj)
returns true, if there exists a path from u to v in at
least one graph snapshot Gt, tl t
tm, of G[ti;tj],
(c) an at least r historical reachability query (Least)
returns true, if there exists a path from u to v in at least
r graph snapshots Gt, tl t
tm, of G[ti;tj].</p>
      <p>In the special case in which tl = tm, we just apply Q on
the single past snapshot Gtl of G[ti;tj]. We call such queries
stab (Stab) queries.</p>
      <p>Historical Time Queries. Another type of queries
pertinent to evolving graphs are queries that focus on the timing
aspect. Such queries ask when an event happened. For
example, depending on the type of query, we may ask when
a speci c graph pattern occurred, when two nodes become
reachable, or when their shortest path distance was equal to
a given value.</p>
      <p>We make a distinction between queries that ask (a) when
is the rst time that an event happened, (b) what is the
longest continuous interval that the event lasted, or (c) what
is the total time that the event occurred. For
reachability queries, we have the following types of historical time
queries.</p>
      <p>De nition 3 (historical time reachability query).
Given an evolving graph G[ti;tj], a time interval IQ = [tl; tm],
ti tl tm tj and a pair of nodes v, u:
(a) a rst time reachability query (First) returns the
smallest time point t, tl t tm, such that, there exists a
path from u to v in graph snapshot Gt and there is no
path from u to v in any Gt′ , tl t′ &lt; t,
(b) a longest continuous time reachability query (Intv)
returns an interval [tk1 ; tk2 ], tl tk1 tk2 tm, such
that, there exists a path from u to v in all graph
snapshots Gt, tk1 t tk2 of G[ti;tj] and there is no longer
interval that this holds,
(c) a longest total time reachability query (Total) returns
the time points t, tl t tm, such that there exists a
path from u to v in Gt.</p>
      <p>Historical Top-k Queries. The last type of historical
queries are queries that ask for the top-k nodes that satisfy
a condition for the longest time period. Depending on the
query, this may mean what is the graph pattern that appears
in the majority of graph snapshots, what are the pairs of
nodes that remained reachable the longest, or what are the
nodes whose distance was below some given value for most
of the time points. Again, we make a distinction between
queries that ask for nodes that satisfy the property for the
longest duration, either (a) continuously or (b) in total. For
reachability queries, this gives us the following two types of
top-k queries.</p>
      <p>De nition 4 (historical top-k reachability query).
Given an evolving graph G[ti;tj], a time interval IQ = [tl; tm],
ti tl tm tj and a pair of nodes v, u and an integer k,
k &gt; 0:
(a) a top-k continuous reachability query (Topk i) returns
a set of k pairs (u; v) of nodes u and v such that there
exists a path from u to v in all graphs Gt in an interval
of size d and there is no pair of nodes u′, v′, for which
a path from u′ to v′ exists in all graphs in an interval
of size larger than d,
(b) a top-k total reachability query (topk t) returns the
k pairs (u; v) of nodes u and v such that there exists
a path from u to v in the largest number of graph
snapshots Gt, tl t tm.</p>
    </sec>
    <sec id="sec-3">
      <title>HISTORICAL QUERIES ON A GRAPH</title>
    </sec>
    <sec id="sec-4">
      <title>DATABASE</title>
      <p>In this section, we rst present two different approaches of
representing an evolving graph within a graph database and
then algorithms for processing historical reachability queries
using these representations.</p>
      <p>As a running example, we use a bibliographical database
where nodes correspond to AU T HORS and there is a
PUBLISH edge between two AUTHORS if they are co-authors.
Time granularity corresponds to years.
3.1</p>
    </sec>
    <sec id="sec-5">
      <title>Single and Multi Edge Representation</title>
      <p>In the single-edge approach, the lifespan of a node or edge
is modeled using a label (i.e., an attribute or property) of
the corresponding node or edge. Figure 1 shows an
example of edge lifespans represented by a single label of type
String. For example, the co-authorship between authors A,
C in 2010, and 2012, is represented by the edge with label
"2010,2012". Graph evolution can be tracked by processing
the node and edge lifespan labels. Thus, to obtain the graph
snapshot Gt for a time point t, we get all edges and nodes
and then keep only those edges and nodes whose lifespan
label contains t.</p>
      <p>While in the single-edge approach there is at most one
edge between two nodes, the multi-edge approach uses a
different edge type between two nodes for each time point of
the lifespan of the edge. For instance, in order to represent
co-authorship between authors A, B, C in 2010, 2012, and
2014, we use three different labeled types of edges to connect
A, B, and C. An example is shown in Figure 2 which depicts
co-authorship between A, B and C in 2010, 2012, and 2014
using multiple type of edges. Multiple type of edges provide
an efficient way of retrieving graph snapshots, since a graph
database supports an efficient traversal of edges of a given
type. Note that when a node u is connected by an edge of a
type that corresponds to a time point t, it is obvious that u
existed in t. Thus to get information about the lifespan of
u, we access its edges, which is a fast process, since graph
databases provide efficient indexing for this process.
Author: A
Author: B</p>
      <p>PUBLISH
Lifespan: 2010, 2012</p>
      <p>PUBLISH</p>
      <p>Lifespan: 2010, 2012</p>
      <p>For both the single-edge and the multi-edge approaches,
we explore whether an index on the lifetime property of each
node improves the performance of retrieving a snapshot of
the graph. We build an index within the graph database
(shown in Figure 3) by creating a new node type TIME
where each node of the given type has a unique value that
corresponds to a speci c time point. A TIME node that
denotes a time point t has relations to all AUTHOR nodes
that existed at time t. To obtain all nodes that exist in
an interval, we get the neighbors of the TIME nodes that
correspond to this interval.
3.2</p>
    </sec>
    <sec id="sec-6">
      <title>Historical Reachability Query Processing</title>
      <p>All graph databases provide a BF Straversal method that
visits all edges of a given type. This method starts from a
source node, visits all its neighbors at distance 1 by
traversing edges of the given type, then all its neighbors at distance
2, and so on.</p>
      <p>TIME
Year: 2010</p>
      <p>TIME
Year: 2011</p>
      <p>TIME
Year: 2012</p>
      <p>TIME
Year: 2013</p>
      <p>TIME
Year: 2014
Author: A
PUBLISH
Type: 2010</p>
      <p>PUBLISH
Type: 2012</p>
      <p>PUBLISH
Type: 2014
Author: C</p>
      <p>Author: B
PUBLISH</p>
      <p>Type: 2010
PUBLISH
Type: 2012
For answering if two nodes are reachable in the
multiedge approach, we use this method. For example, to nd if
two nodes are connected in a given interval I, we call the
BF Straversal method for each time instant ti in I passing
as an argument the edge type that corresponds to ti.</p>
      <p>However, we cannot use the BF Straversal method for
the single-edge approach, since we need to post-process the
lifespan label of each edge to nd the time instants where the
nodes were reachable. The BF Straversal method provided
by the deployed graph database does not return the path
visited but just the reachable nodes. Thus, we implemented
our own BFStraversal algorithm which processes the edge
lifespans.</p>
      <p>The time index can be used similarly in both single-edge
and multi-edge approaches to prune some computations. For
example, for the Least query that asks whether nodes u and
v are reachable at least r time points in I, we can rst check
using the index whether both nodes were active at the same
time points at least r times. If they were not active, we do
not need to traverse the graph.</p>
      <p>To summarize, (a) in the single-edge approach, we use
our own implementation of the BF Straversal method and
(b) in the multiple-edge approach, we invoke the provided
BF Straversal method once for each edge type
corresponding to the time points in the query interval.</p>
      <p>Let us now discuss in more detail how to process the
different types of historical reachability queries that ask for a
path from u to v in interval I. A Conj query returns true,
when there is no ti in I where v is not reachable from u. A
Disj query returns true, when the rst ti in I is found in
which v is reachable. Finally, for a Least query, we keep
a counter c of the time points in I that v is reachable. If
the counter reaches the given r, the query returns a positive
answer, otherwise it stops when the sum of the counter and
the remaining time instants is less than r.</p>
      <p>Historical time reachability queries return time points or
time intervals. For a First query, we return the rst ti when
v is reachable. For a Intv query, we keep a counter c for the
consecutive times that v is reachable and a max variable of
the current maximum c. For each ti, that v is not reachable,
c is reset and max is updated if c &gt; max. The query stops
and returns the interval corresponding to the max value,
when c and the remaining time instants are less than max.
Finally, for a Total query, we return all time points in I
where v is reachable.</p>
      <p>For the top-k historical reachability queries, we use the
time index to obtain the top-k (active) nodes utop, that is
the k nodes that exist for the longest period in I. In
particular, for each node u we found in a time instant of I
we increase a counter that denotes the number of instants
that u is active. We also use a Min Heap structure to keep
the top-k pairs of nodes that were connected for the longest
interval (Topk i query) or for the largest number of time
points (topk t query). The Min Heap stores each pair as
a triple (u; v; value) where value is a counter that keeps the
longest interval or the largest number of times that u and v
were connected.</p>
      <p>For a Topk i query, we start traversing from each top
active node utop for each time instant in I. Intuitively, top
active nodes have more active paths to other nodes. For
each node v reachable from utop, we increase a counter C(v),
each time v is reachable. We also keep a max(v) variable
for the maximum C(v). Each time v is not a reachable from
utop, C(v) is reset and max(v) is updated if C(v) &gt; max(v).
In the end of the traversal from utop, we insert in the Min
Heap the triple (utop, v, max(v)) if max(v) is larger than the
smallest element of the Min Heap. The query stops when the
Min Heap has size k and its minimum element is larger or
equal to the lifetime in I of the remaining top active nodes.
Processing of a topk t query is similar. The only difference
is that in the Min Heap, we store the number of time points
instead of the duration of the interval.
4.</p>
    </sec>
    <sec id="sec-7">
      <title>EVALUATION</title>
      <p>
        As our graph database, we use Sparksee [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] that supports
fast loading of the graph data and efficient operations that
scan all edges in the graph. Sparksee is based on a compact
representation that uses bitmaps and highly compressible
data structures [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. As our dataset, we used the whole DBLP
dataset 1 for the interval [1958, 2015]. Each graph snapshot
corresponds to a year in this interval.
      </p>
      <p>We ran all queries on a system with a quad-core Intel Core
i7-4770 3.4 GHz processor and 32 GB memory. We only use
one core for all experiments.</p>
      <p>Storage. We created two different graph database instances
(GDBs). The rst graph database instance stores our dataset
following the single-edge approach, whereas the second
follows the multi-edge approach.</p>
      <p>Table 1 shows the characteristics of each graph database
instance. Single-edge differs from multi-edge in the number
of edge types, since the second one uses a different edge type
for each time point, which leads to a larger size. The index
nodes size states the number of time points which in our
case is 58 years. The edge types for single-edge are two, one
type represents the PUBLISH edge and the other one the
index edge. Multi-edge has 59 types, one type for the index
edge and 58 types for each year in [1958, 2015].</p>
      <p>Historical Query Processing. To evaluate the
performance of both true and false queries, we generated for each
query type 250 true and 250 false queries.</p>
      <p>
        For Conj, Disj, Least, First, Topk i, and topk t, the
query interval is I = [2005; 2014], for Stab, the time point
is a random year within I, for Intv, Topk i the interval is
I = [1958; 2015]. For Least, r was randomly chosen from
[
        <xref ref-type="bibr" rid="ref2 ref9">2, 9</xref>
        ], while in Topk i, and topk t, k is equal to 10.
      </p>
      <p>Table 2 reports the average time of true and false queries,
on both graph instances. Also, we report the query time
when the time index is used, as well the average execution
time of Topk i, and topk t on the multi-edge GDB
instance.</p>
      <p>Queries that ask for events that have the longest
duration, either continuously or in total require the most time
to be processed, (since we need to check long time
intervals) followed by queries that seek for the largest number of
time points that something holds or the longest continuous
interval that something holds.</p>
      <p>Comparing the GDB instances, we notice that queries are
faster when using multiple types of edges to represent the
time points. This can be explained by the fact that using a
single edge type requires the processing of the edge to nd
if a time point is contained in the lifespan label.</p>
      <p>A general remark is that false conjunctive queries are
faster than true conjunctive queries, since processing stops
as soon as a time point is found at which the two nodes
1http://dblp.uni-trier.de/
are not reachable. Analogously, true disjunctive queries are
faster than false disjunctive queries, since processing stops
as soon as a time point is found at which the two nodes are
reachable. Also an observation that holds independently of
the graph GDB used to evaluate queries is that the time
index boosts query processing. We gain more speed in false
queries than true ones, since we can prune traversals from
nodes that are not active in a given time point or in the
whole interval. For example, if we seek to nd the longest
interval during which A and B were connected and there is
not any time point that both authors were active, then a
false answer is returned without executing any traversal.</p>
      <p>Table 3 shows the top 10 authors pairs returned from
Topk i, and topk t. Both queries return the same pairs
because the authors of each pair were reachable in the whole
interval (10 years). Thus, the authors that were connected
for the longest interval are the same with the authors that
are connected the most time in the past. We clarify that
the top-k processing steps that were followed in Topk i,
and topk t stop when they nd the rst k pairs that meet
the requirement. Hence, ties, i.e., pairs that have the same
property may not be reported.</p>
      <p>
        Comparison with the Time-Varying Approach.
Finally, we implemented the data model introduced in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and
tested its performance for the Stab query. The approach
100000
10000
)s 1000
m
(
e
iTm 100
10
1
      </p>
      <p>M+I</p>
      <p>M</p>
      <p>S</p>
      <p>TMV M+I</p>
      <p>M</p>
      <p>S</p>
      <p>TMV
S+I
TRUE</p>
      <p>
        S+I
FALSE
in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] introduces a speci c node to model the interaction
between two nodes at a speci c time point. They also use
a hierarchical index to support different time granularities,
which is an issue that we do not address here, thus, we do
not implement such an index.
      </p>
      <p>We created a new type of node PAPER which denotes
the interaction of publishing a paper and an AUTHOR node
type for the authors. We connect with each PAPER its
authors using an edge type PUBLISH. For the time index, we
connect each AUTHOR and PAPER node to the time index
nodes to which they belong. For example, if authors A and
B wrote a paper P together in t then from the time index
node that corresponds to t, we create edges that connect
the time index node with the A, B and P nodes. To nd
if two authors A and B are reachable, we have to obtain
the authors from each PAPER node that A is connected
and from them to obtain their co-authors. We repeat this
process until we nd B. This process is costly for nding
PAPER nodes that were active at a speci c time point, since
we check the time index for each PAPER node to see if it
was active in that time.</p>
      <p>Running a Stab query using this approach requires 67.8
seconds for true queries and 58.5 seconds for false queries
(shown in Figure 4), while our best approach requires only
0.43 and 8 milliseconds for answering true and false queries
respectively. This can be explained by the fact that their
model has not been designed for answering historical
reachability queries but for querying the presence of objects in a
number of given time points.</p>
    </sec>
    <sec id="sec-8">
      <title>RELATED WORK</title>
      <p>Although, graph data management has been the focus of
much current research, work in processing historical queries
is rather limited. The main focus of research on query
processing in evolving graphs has been on efficiently storing
and retrieving graph snapshots. In this paper, our focus is
on de ning different types of historical queries and on
processing them in a graph database.</p>
      <p>
        Historical query processing in these approaches requires
as a rst step reconstructing the relevant snapshots. Then,
queries are processed through an online traversal on each of
the snapshots. Various optimizations for reducing the
storage and snapshot reconstruction overheads have been
proposed. Optimizations include the reduction of the number of
snapshots that need to be reconstructed by minimizing the
number of deltas applied [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], using a hierarchical index of
deltas and a memory pool [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], avoiding the reconstruction
of all snapshots [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], and improving performance by parallel
query execution and proper snapshot placement and
distribution [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Recent work also addresses indexing for historical
reachability queries through an index that contains information
about strongly connected components membership at
various time points [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], indexing for historical shortest path
distance queries [
        <xref ref-type="bibr" rid="ref1 ref9">1, 9</xref>
        ] and indexing for durable pattern queries
on historical graphs [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        Finally, the only work on evolving graphs using a native
graph database we are aware of is the work in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] where the
authors use Neo4j. Their evolving graph besides structural
updates, also includes time-varying attributes. They do not
provide a taxonomy of historical graph queries, instead they
implement a number of historical queries that also include
attributes.
      </p>
    </sec>
    <sec id="sec-9">
      <title>CONCLUSIONS</title>
      <p>In this paper, we studied queries on evolving graphs. We
have proposed a taxonomy of historical graph queries, that
are queries that involve past graph snapshots. We presented
different approaches of storing and retrieving an evolving
graph in a graph database by either using a single-edge with
a lifespan attribute or multiple-edge types where each type
corresponds to a different time point. We have proposed
algorithms for evaluating historical reachability queries and
evaluated our approaches using the DBLP dataset.</p>
      <p>There are many possible directions for future work. One
such direction is exploiting our algorithms towards
answering other types of historical queries, such as shortest path
ones. Another direction concerns historical queries that take
into account attributes on the nodes and edges as well
evolving graphs with time-varying attributes.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Takuya</given-names>
            <surname>Akiba</surname>
          </string-name>
          , Yoichi Iwata, and
          <string-name>
            <given-names>Yuichi</given-names>
            <surname>Yoshida</surname>
          </string-name>
          .
          <article-title>Dynamic and historical shortest-path distance queries on large evolving networks by pruned landmark labeling</article-title>
          .
          <source>In WWW</source>
          , pages
          <volume>237</volume>
          {
          <fpage>248</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Lars</given-names>
            <surname>Backstrom</surname>
          </string-name>
          , Daniel P. Huttenlocher,
          <string-name>
            <surname>Jon M. Kleinberg</surname>
          </string-name>
          , and Xiangyang Lan.
          <article-title>Group formation in large social networks: membership, growth, and evolution</article-title>
          .
          <source>In KDD</source>
          , pages
          <volume>44</volume>
          {
          <fpage>54</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Bahman</given-names>
            <surname>Bahmani</surname>
          </string-name>
          , Abdur Chowdhury, and
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Goel</surname>
          </string-name>
          .
          <article-title>Fast incremental and personalized pagerank</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>4</volume>
          (
          <issue>3</issue>
          ):
          <volume>173</volume>
          {
          <fpage>184</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Ciro</given-names>
            <surname>Cattuto</surname>
          </string-name>
          , Marco Quaggiotto, Andre Panisson, and
          <string-name>
            <given-names>Alex</given-names>
            <surname>Averbuch</surname>
          </string-name>
          .
          <article-title>Time-varying social networks in a graph database: a neo4j use case</article-title>
          .
          <source>In GRADES, page 11</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Raymond</surname>
            <given-names>Cheng</given-names>
          </string-name>
          , Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang,
          <string-name>
            <surname>Lidong Zhou</surname>
            ,
            <given-names>Feng</given-names>
          </string-name>
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>and Enhong</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Kineograph: taking the pulse of a fast-changing and connected world</article-title>
          .
          <source>In EuroSys</source>
          , pages
          <volume>85</volume>
          {
          <fpage>98</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Sparksee</given-names>
            <surname>Graph</surname>
          </string-name>
          <article-title>Database</article-title>
          . http://www.sparsity-technologies.com/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>David</given-names>
            <surname>Dominguez-Sal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Urbon-Bayes</surname>
          </string-name>
          , Aleix Gimenez-Van~o,
          <string-name>
            <surname>Sergio</surname>
          </string-name>
          Gomez-Villamor,
          <article-title>Norbert Mart nez-</article-title>
          <string-name>
            <surname>Bazan</surname>
          </string-name>
          , and
          <string-name>
            <surname>Josep-Lluis</surname>
          </string-name>
          Larriba-Pey.
          <article-title>Survey of graph database performance on the HPC scalable graph analysis benchmark</article-title>
          .
          <source>In WAIM</source>
          , pages
          <volume>37</volume>
          {
          <fpage>48</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Wentao</given-names>
            <surname>Han</surname>
          </string-name>
          , Youshan Miao,
          <string-name>
            <given-names>Kaiwei</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming Wu</given-names>
            , Fan Yang,
            <surname>Lidong Zhou</surname>
          </string-name>
          , Vijayan Prabhakaran, Wenguang Chen, and
          <string-name>
            <given-names>Enhong</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Chronos: a graph engine for temporal graph analysis</article-title>
          .
          <source>In EuroSys, page 1</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Wenyu</given-names>
            <surname>Huo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Vassilis J.</given-names>
            <surname>Tsotras</surname>
          </string-name>
          .
          <article-title>Efficient temporal shortest path queries on evolving social graphs</article-title>
          .
          <source>In SSDBM, page 38</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Udayan</given-names>
            <surname>Khurana</surname>
          </string-name>
          and
          <string-name>
            <given-names>Amol</given-names>
            <surname>Deshpande</surname>
          </string-name>
          .
          <article-title>Efficient snapshot retrieval over historical graph data</article-title>
          .
          <source>In ICDE</source>
          , pages
          <volume>997</volume>
          {
          <fpage>1008</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Georgia</surname>
            <given-names>Koloniari</given-names>
          </string-name>
          , Dimitris Souravlias, and
          <string-name>
            <given-names>Evaggelia</given-names>
            <surname>Pitoura</surname>
          </string-name>
          .
          <article-title>On graph deltas for historical queries</article-title>
          .
          <source>WOSS</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Alan</surname>
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Labouseur</surname>
          </string-name>
          , Jeremy Birnbaum, Paul W. Olsen Jr,
          <string-name>
            <surname>Sean R. Spillane</surname>
          </string-name>
          , Jayadevan Vijayan,
          <string-name>
            <surname>Jeong-Hyon Hwang</surname>
          </string-name>
          , and
          <string-name>
            <surname>Wook-Shin Han</surname>
          </string-name>
          .
          <article-title>The g* graph database: efficiently managing large distributed dynamic graphs</article-title>
          .
          <source>Distributed and Parallel Databases</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Alan</surname>
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Labouseur</surname>
          </string-name>
          , Paul W. Olsen, and
          <string-name>
            <surname>Jeong-Hyon Hwang</surname>
          </string-name>
          .
          <article-title>Scalable and robust management of dynamic graph data</article-title>
          .
          <source>In VLDB</source>
          , pages
          <volume>43</volume>
          {
          <fpage>48</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Jure</surname>
            <given-names>Leskovec</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jon M. Kleinberg</surname>
            , and
            <given-names>Christos</given-names>
          </string-name>
          <string-name>
            <surname>Faloutsos</surname>
          </string-name>
          .
          <article-title>Graphs over time: densi cation laws, shrinking diameters and possible explanations</article-title>
          .
          <source>In KDD</source>
          , pages
          <volume>177</volume>
          {
          <fpage>187</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Chenghui</surname>
            <given-names>Ren</given-names>
          </string-name>
          , Eric Lo, Ben Kao, Xinjie Zhu, and Reynold Cheng.
          <article-title>On querying historical evolving graph sequences</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>4</volume>
          (
          <issue>11</issue>
          ):
          <volume>726</volume>
          {
          <fpage>737</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Konstantinos</surname>
            <given-names>Semertzidis</given-names>
          </string-name>
          , Kostas Lillis, and
          <string-name>
            <given-names>Evaggelia</given-names>
            <surname>Pitoura</surname>
          </string-name>
          . Timereach:
          <article-title>Historical reachability queries on evolving graphs</article-title>
          .
          <source>In EDBT</source>
          , pages
          <volume>121</volume>
          {
          <fpage>132</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Konstantinos</given-names>
            <surname>Semertzidis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Evaggelia</given-names>
            <surname>Pitoura</surname>
          </string-name>
          .
          <article-title>Durable graph pattern queries on historical graphs</article-title>
          .
          <source>In ICDE</source>
          ,
          <year>2016</year>
          (to appear).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>