<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Backlogs and Interval Timestamps: Building Blocks for Supporting Temporal Queries in Graph Databases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriel Campero Durand</string-name>
          <email>gabrielcampero@acm.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcus Pinnecke, David Broneske,</string-name>
          <email>firstname.lastname@ovgu.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Gunter Saake, University of Magdeburg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Magdeburg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The analysis of networks, either at a single point in time or through their evolution, is an increasingly important task in modern data management. Graph databases are uniquely suited to improve static network analysis. However, there's still no consensus on how to best model data evolution with these databases. In our work we propose an elementary concept to support temporal analysis with property graph databases, using a single-graph model limited to structural changes. We manage the temporal aspects of items with interval timestamps and backlogs. To include backlogs in the model we examine two alternatives: (1) global indexes, and (2) using the graph as an index by resorting to timestamp denormalization. We evaluate density calculation and time slice retrieval over successive days from a SNAP dataset, on an Apache Titan prototype of our model, observing from 2x to 100x response time gains by comparing di erential vs. snapshot methods; and no conclusive di erence between the backlog alternatives.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>•Information systems ! Graph-based database
models; Temporal data;
Graph-based database models, Temporal analysis of
information networks, Di erential graph algorithms</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
      <p>There is a common saying that time changes everything.
While it might be a trivial observation, it seems to apply
particularly well to information systems: the semantics and
value of any data item might not be the same if it was
recorded today rather than several years ago.
2017, Copyright is with the authors. Published in the Workshop
Proceedings of the EDBT/ICDT 2017 Joint Conference (March 21, 2017, Venice,
Italy) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is
permitted under the terms of the Creative Commons license CC-by-nc-nd
4.0</p>
      <p>The timestamp of when a data item is created, updated
or deleted, adds meaningful context to this item; a context
that in turn enables better auditing, security and historical
understanding. Clearly, storing the time dimension of data
can make a di erence.</p>
      <p>Capitalizing on the declining cost of storage, present-day
information systems can go beyond the simple timestamping
of the latest item transactions, by employing databases to
additionally track all historic versions of business data items.
This amounts to dismissing in-place updates and adopting
an accumulate-only approach, where any change in an item
is appended as a new item. As a result, the whole history of
data evolution can be mined for valuable business insights.</p>
      <p>
        The database community has produced a plethora of
concepts and tools for supporting the temporal dimension in
information systems under the relational model [
        <xref ref-type="bibr" rid="ref21 ref8">8, 21</xref>
        ].
However, pure relational models make navigation of real world
networks a cumbersome endeavor, thus they do not support
network analysis as well as graph databases. For this task
relational systems might gain from specialized indexes [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        Although temporal analysis has proven to be important
for understanding networks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], spawning taxonomies for its
ne-grained sub-tasks [
        <xref ref-type="bibr" rid="ref11 ref2 ref23">2, 11, 23</xref>
        ], the support for temporal
features in graph databases is in a patently less developed
state as for their relational counterparts.
      </p>
      <p>
        The common work ow for ad hoc temporal network
analysis revolves around large sequences of separate snapshots.
Processing is performed in an embarrassingly parallel way,
usually without combining the snapshots. A few specialized
systems have been proposed for such work ow [
        <xref ref-type="bibr" rid="ref10 ref16">10, 16</xref>
        ].
      </p>
      <p>With the increasing adoption of graph databases another
work ow can be considered, based on the continuous
collection of snapshots in an accumulate-only graph. Scarce
studies exist tackling this work ow or, more broadly, how
general graph databases could aid temporal analysis.</p>
      <p>
        In this short paper we present our ongoing studies on the
support of general graph databases for temporal network
analysis. We begin by brie y presenting the long-established
concepts of interval timestamps and backlogs (Section 2).
Afterwards we continue with our contributions as follows:
1. We outline a concept for including interval
timestamps and backlogs into a property graph model
(Section 3).
2. We give empirical evidence illustrating the bene ts
of our approach with a prototype based on Apache
Titan, Cassandra, and Gremlin [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] (Section 4).
Subsequently we discuss our ndings (Section 4.2). We
conclude this paper by presenting related work (Section 5) and
proposing future directions for our research (Section 6).
      </p>
    </sec>
    <sec id="sec-3">
      <title>FUNDAMENTALS</title>
      <p>
        Historical data has been managed for decades in relational
database systems. For a broader background, we refer
readers to reference work entries on Temporal Data Models[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]
and other work [
        <xref ref-type="bibr" rid="ref21 ref7 ref8">8, 21, 7</xref>
        ]. In this section, we describe two
major temporal data representation models that have been
used for this purpose, interval timestamps and backlogs.
2.1
      </p>
    </sec>
    <sec id="sec-4">
      <title>Running example</title>
      <p>As a running example we consider a group of devices and
management services which monitor these devices and track
their interconnections, similar to the as-733 SNAP dataset.
Devices and the connections between them, can be added
or removed (which corresponds to events). The connections
between devices can be represented by a connects-to
relationship. If a device is removed, its connections to other
devices are removed as well.
2.2</p>
    </sec>
    <sec id="sec-5">
      <title>Interval timestamps</title>
      <p>The time-dependent existence of entities can be expressed
using interval timestamps.</p>
      <p>An interval timestamp contains two bits of information:
the start date (including) and the end date (excluding).
Using an interval timestamp as metadata, temporal objects
can be annotated to express their validity and existence
inside a certain range of time between the start date and the
end date. For instance, consider two temporal objects A
and B with timestamps t(A) = [dstart; dend) and t(B) =
[d0start; d0end). The object A exists before the object B if
(dstart &lt; d0start) holds. Both objects A and B exist together
in a certain time span if also (dend &gt; d0start) holds.</p>
      <p>By adding interval timestamps, temporal queries can be
presented in a straightforward and e cient way, such as
asking for all items that exist at a certain point in time, an
operation known as time slice retrieval.</p>
      <p>Physically, interval timestamps add two columns to
historical data, which correspond to the start and end date.
For a better understanding, we give an example of interval
timestamping in Figure 1 by using traditional tabular
representation. The entries record a sequence of events,
happening on 3 successive days (d1; d2; d3): On the rst day,
devices 1 and 2 are added to the database, in addition to a
connects-to relation from device 1 to 2, with a as the id of
this relation. On the second day, device 3 is added, next to
a connects-to relation from device 2 to 3, with b being the
id of such relation. At the third day, both the device 1 and
its connects-to relation (tuple a) are deleted/archived.</p>
      <p>Note that a distinction can be made between tuple interval
timestamps (a ecting the whole tuple) and attribute
interval timestamps (a ecting only a certain tuple attribute). For
the purpose of our discussion we will limit ourselves to
tuple interval timestamps. Furthermore, we consider a model
where items only exist within a given interval, if an item is
deleted and re-inserted, it would be considered a di erent
item.
2.3</p>
    </sec>
    <sec id="sec-6">
      <title>Backlogs</title>
      <p>An alternative to timestamped intervals is a backlog. A
backlog is basically a list of events ordered by time.</p>
      <p>Backlogs are a useful concept to recap the evolution of a
system under observation. Each entry (di; Ei) in a backlog is
a record containing the actual timestamp di of the recorded
event Ei.</p>
      <p>For instance, we can monitor a system and catch its stream
of events (d1; E1); (d2; E2); :::; (dn; En) in a protocol, namely
the event backlog.</p>
      <p>For a better understanding of the di erences between
interval timestamps and backlogs, we illustrate in Figure 2 the
same succession of events presented previously with interval
timestamps (see Section 2.2), but in a traditional backlog
representation, with E = fcreation; deletiong.</p>
      <p>In direct comparison to interval timestamps, the entries
in a backlog representation are immutable and instead of
intervals each entry is only valid at one speci c timepoint
rather than during a time span. To reconstruct the state of
an entity at a given moment, this representation requires to
replay the sequence of events a ecting the entity (either from
scratch or over a previously validated state of this entity).
3.</p>
    </sec>
    <sec id="sec-7">
      <title>TIME IN PROPERTY GRAPHS</title>
      <p>After presenting the necessary background about backlogs
and interval timestamps in the previous section, we now
provide our proposal to incorporate both concepts into a
property graph database system(Section 3.1). Following this,
we introduce an example of how both time representations
could be used for the commonplace task of density
calculation. To illustrate how to exploit the e ective availability of
backlogs, we include an incremental version of density
calculation. Our examples are given using a state-of-the-art
graph query language, Gremlin (Section 3.2).
3.1</p>
    </sec>
    <sec id="sec-8">
      <title>Including time in a property graph</title>
      <p>In Figure 3, we display possible ways to embed temporal
information in a property graph model, continuing with our
running example from Section 2.1.</p>
      <p>While the interval timestamps can be easily recorded as
properties of the items themselves, native support for
backlogs could be accomplished in 2 di erent ways:
1. Global Indexing. In this alternative, global indexes
covering creation and deletion times of items can be
used at run-time to e ciently reconstruct ordered sets
of backlogs for given periods (Figure 3, with only the
timestamped nodes and edges).
2. Graph as an index. Another alternative is to use the
graph as an index in itself (GRAIN), by using a
special class of edges {backlog edges, connected to logger
meta-vertexes (Figure 3, considering all entities). The
collection of backlog edges (going out from the logger
vertexes) serves as a backlog table (Figure 2).</p>
      <p>Assuming that indexes can exist solely over a given entity
type (vertices or edges), the alternative that capitalizes on
global indexes might imply the need for 4 di erent indexes
to capture the di erent events (creation or deletion). On
the other hand, if we adopt GRAIN, then when any item is
created or deleted, a new backlog edge must be created and
connected from a logger vertex to the given item (or, in the
case of edges, to the vertex at the head of the item). These
backlog edges record a timepoint (the denormalized
timestamps), two possible edge labels to distinguish the events
(i.e., either create or archive events) and a type property to
distinguish the type of logged element.</p>
      <p>Through its outgoing connections, the logger vertexes can
act as single entry point to query about structural changes
in the graph, accelerating the access to elements associated
with speci c change events. In this strategy, queries might
further bene t from the existence of local indexes in the
logger vertexes, covering the relevant properties from the
backlog edges and granting e cient access to the backlogs.</p>
      <p>Regardless of the approach selected for providing
backlogs, a fundamental bene t that can be expected from
having both temporal representations in the same graph model,
is the possibility of combining di erential (supported by the
backlogs) with snapshot (supported by the interval
timestamps) processing in a single operation, thus providing
adjustable means to optimize temporal analysis.
3.2</p>
    </sec>
    <sec id="sec-9">
      <title>Density calculation using backlogs and interval timestamps</title>
      <p>To illustrate the di erent processing approach that
backlogs could enable, we consider density calculation1. This
task requires to count the number of vertices and edges
valid at a given timepoint. Such counting can be naively
implemented with incremental processing by calculating the
actual values on the rst day and incrementally updating
them with backlog records, for the following days.</p>
      <p>Next are some examples of di erent Gremlin traversals
that could be used for this task. We start with the basic
case, using simple traversals to count the number of
elements active at a given date, based on interval timestamps
(with the predicate: deletedAt &gt; date ^ createdAt date).
The examples cover the vertices, similar approaches could
be followed for the edges:
titanGraph.V().</p>
      <p>hasLabel("device").
has("deletedAt", P.gt(date)).
has("createdAt", P.lte(date)).</p>
      <p>count().next();
Next we include traversals for incremental density
calculation, nding out the number of elements changed on a given
date using equality predicates.</p>
      <p>The following example uses global indexes over createdAt
and deletedAt. Note the need for accessing two indexes for
vertices and (not shown) two additional indexes for edges:
titanGraph.V().</p>
      <p>hasLabel("device").
has("createdAt", date).</p>
      <p>count().next();
titanGraph.V().</p>
      <p>hasLabel("device").
has("deletedAt", date).</p>
      <p>count().next();</p>
      <p>Finally, we illustrate the GRAIN strategy. In the example
we count all backlog edges labeled as created on a given
date, and group them by the type of entity created.The same
approach can be used for deletions.</p>
      <p>Map&lt;Object,Long&gt; creationCount =
(Map&lt;Object, Long&gt;) titanGraph.V(logId).</p>
      <p>outE("created").
has("createdAt", date).
groupCount("created").by("createdType").</p>
      <p>cap("created").next();
numCreatedVertices= creationCount.get(Vertex.class);
numCreatedEdges= creationCount.get(Edge.class);
4.</p>
    </sec>
    <sec id="sec-10">
      <title>EVALUATION</title>
      <p>To provide an early assessment of possible improvements
from backlog-mediated di erential processing, we
implemented a prototype using Apache Titan 1.02, a property graph
database using the TinkerPop 33 framework.
1For the purposes of our study, we de ne the density of a
graph as jEj=(jV j jV j 1))j, with E being the number of
edges and V the number of vertices of the graph.
2http://thinkaurelius.github.io/titan/
3http://tinkerpop.incubator.apache.org/</p>
      <p>
        We formulated our queries in Gremlin [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. As storage
backend for Titan we used Cassandra 2.1.11.
      </p>
      <p>In Titan the properties of a vertex and all the
information of its connected edges, are stored within each vertex
storage space. Among other options, performance tuning
is made possible by de ning global indexes (over
combinations of vertex/edge properties), or local vertex-level indexes
(that map from edge properties and values to matching edges
stored in the same vertex row).</p>
      <p>So as to adequately represent Titan's capabilities in our
prototype, we de ned global indexes covering the exact
combination of properties that we queried against. Vertex-level
indexes were also created in the logger vertex, to speed up
access to its edges along the lines of the properties we queried
against.</p>
      <p>Given that currently the basic Titan indexes (known as
composite indexes) only address exact matches, interval
queries (which tend to appear in temporal analyses) are notably
lacking support in our implementation. This amounts to a
fundamental limitation in our prototype. In future work
we will use Titan's full-text indexes (referred to as mixed
indexes) or exogenous interval-speci c indexes, to provide
support for this type of queries.</p>
      <p>The experiments were conducted on a commodity
multicore machine running Ubuntu 14.04 and Java SDK
8u111linux-x64, with an Intel® CoreTM i7-2760QM CPU @ 2.40
GHz processor (8 cores in total) and 7.7 GiB of memory.
4.1</p>
    </sec>
    <sec id="sec-11">
      <title>Data and workload characteristics</title>
      <p>
        For the dataset we selected the SNAP as-733 dataset [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],
a collection of 733 daily instances from November 8 1997
to January 2 2000, of peer relationship networks (de ned
by having tra c ows between them) among autonomous
systems. Vertices on this dataset only possess one property
{their nodeId. Unlike other types of networks, vertices and
edges appear and disappear on a daily basis, following a
super-linear growth over time in the edge to vertex ratio;
with a densi cation exponent of a=1.18 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        We decided on testing two basic functions, stemming from
the original research on this dataset [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]: density calculation
and a whole-graph time slice retrieval over successive days.
For density calculation we used the same strategies described
in the Gremlin examples on Section 3.2. The incremental
version of time slice retrieval followed a similar approach,
starting by reconstructing the complete graph as of the rst
day, and for the successive days limiting itself to updating
the daily subset of items marked as changed in the backlogs.
4.2
      </p>
    </sec>
    <sec id="sec-12">
      <title>Performance results</title>
      <p>In our studies the basic approach proved to be less e cient
than its incremental counterparts, a point made markedly
clear as the number of successive days increased.Figure 4
and 5 display the observed average response times for
density calculation and time slice retrieval requests, based on
instrumented application-level timers. In both tasks, the basic
approach was observed to scale almost linearly with the
progression of days. The scalability pro le of the incremental
strategies could not be deduced from our observations. To
understand this, one should remember that the performance
of di erential methods does not depend on the number of
days but on the amount of elements changing across the
days, a factor that averages low in the as-733 dataset4.
4To give some perspective, the number of elements valid
104
s
inm103
e
m
i
T
102
104
s
m
in103
e
m
i
T
102
2 days 4 days 8 days 16 days 32 days 64 days 128 days
2 days 4 days 8 days 16 days 32 days 64 days 128 days
In the case of density calculation, an average speedup of
2.1x was observed at 2 days between the incremental and
the basic cases; at 128 days it increased to 95x. For time
slice retrieval the average speedup went from 1.9x at 2 days,
to 100x at 128 days.</p>
      <p>Titan's lack of index support for the interval queries in
the basic method5, paired with good index usage from the
incremental methods (exclusively asking for exact queries,
e.g. createdAt==today), contributed to widen the
performance gap between the basic and incremental cases.</p>
      <p>A closer look at the daily traversals, enabled by
Gremlin's pro le step, revealed that the basic methods spent more
than 95% of their time in a global TitanGraphStep, a step
tasked with nding vertices or edges valid at a given
timepoint. Such a step could be deemed the equivalent of a
fulltable scan in relational databases. The global index solution
required 4 daily traversals, each of them consuming 95% of
their time in the TitanGraphStep, however the total time for
these traversals was considerably less than in the basic case
given that the underlying access was mediated by a global
index, and the number of elements to retrieve was smaller
(thousands vs. tens). The remaining time for the traversals
in the basic and global incremental cases corresponded to a
count step (for densities) and an id step (for time slices).</p>
      <p>The GRAIN solution required only 2 traversals per day
(see code snippets in section 4), each of them taking
approximately 2% of their time for accessing the logger vertex,
4.7% on selecting the backlog edges marking the changes for
a given timepoint (in a TitanVertex step), and the rest of
the time in groupCount or branch steps, required to group
the properties of edges and vertices.
on a given day was in the lower thousands ( 3-6k vertices,
10-26k edges), whereas the number of elements changed
was generally less than 200.
5i.e. deletedAt &gt; today ^ createdAt today.
4.3</p>
    </sec>
    <sec id="sec-13">
      <title>Memory consumption</title>
      <p>Consistent with the observed traversal pro les, portraying
the GRAIN solution as more compute-bound than others,
heap usage was observed (via JConsole) to be the lowest for
that approach, peaking at 173 Mb throughout a single run
over increasing number of successive days. This can also be
explained because of the smaller number of indexes used and
cached (only 2 vertex-local indexes) and reduced reloading
of physical entities, thanks to the use of denormalized
properties (i.e., ids). In contrast, the basic method evidenced the
highest memory footprint, peaking at 460 Mb, followed by
the global index method, which peaked at 203 Mb. The high
footprint of the basic method was expected, as this approach
sifts without indexes over the whole data. The di erence
between global and local functions could be attributed to the
memory costs of loading several global indexes, without any
gain from denormalization.</p>
      <p>Finally, while through all of our studies incremental
strategies led conclusively to better performance than the
basic implementation (Figure 4 and 5); it was not
statistically clear over several runs that one approach to providing
backlogs would be better than the other in most scenarios.
Within the limited scope of our tests, the GRAIN
strategy consistently outperformed the global alternative only on
very large number of days (64 and more), possibly because
of the former's lower memory footprint and reduced access
to physical entities by using denormalized values. Further
tests are needed to establish the distinguishing performance
characteristics of the alternatives, their tradeo s and
opportunities.</p>
    </sec>
    <sec id="sec-14">
      <title>RELATED WORK</title>
      <p>
        Industry studies have found that temporal database
technologies are able to signi cantly reduce the cost of
developing business products that model time6. This is re ected in
changes to the SQL standard, increasing support for
temporal operations in mainstream database systems. SQL:2011
incorporates temporal features through system versioning
and validity intervals associated with tuples [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Relational
database system vendors are nowadays shipping temporal
support to end users (e.g., IBM DB2, Microsoft
ImmortalDB,and SAP HANA [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]).
      </p>
      <p>To our knowledge, only a handful of proposals exist for
augmenting a general graph database with temporal
functionality.</p>
      <p>
        TGraph [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] builds upon Neo4j and targets speci c types
of graphs for which changes in vertex/edge properties are
frequent and structural changes happen seldom. The authors
develop a specialized storage system to manage the dynamic
properties, but do not address the structural changes that
are the focus of our work. Cattuto et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] also base their
research on Neo4j, addressing time-varying social networks.
They present a data model that -like our study- uses the
graph as an index in itself, with the concept of user-level
frame nodes which are stringed together to form a
timeline, each frame pointing to the elements valid within it, in
such a way that any element can be part of several frames.
Contrasting to our approach, the frame node constitutes an
index over the validity of items whereas our logger vertex
acts as an index over change events on items.
6https://www.ibm.com/developerworks/data/library/
techarticle/dm-1204db2temporaldata/
      </p>
      <p>
        Semertzidis and Pitoura [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] describe a model following
the frame node concept, with Sparksee as a backend. They
o er algorithms for historic reachability queries, comparing
two graph representations: one where each edge can record
all the times in which it was active by having several disjoint
intervals of validity (single-edge case), and another where
each edge can only have one interval of validity, forcing
copies of the edge to be created for each new validity
interval (multi-edge case, as our study). The authors nd
that the second case is better supported by the native graph
storage, leading to reduced latencies.
      </p>
      <p>
        The improvement of speci c temporal queries, while using
interval timestamped elements, has also been researched.
Indexing and algorithmic changes have been shown to provide
meaningful contributions [
        <xref ref-type="bibr" rid="ref22 ref24 ref6">6, 22, 24</xref>
        ].
      </p>
      <p>The proposals listed above do not include di erential
processing, hence our approach could be complementary to them.</p>
      <p>
        Studies that do not use general graph databases often
address a work ow de ned by the loading of separate
snapshots. Mo tt and Stoyanovich [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] introduce a distributed
processing framework for time graphs running on GraphX.
They examine 3 physical representations for time graphs,
and the kind of locality favoured by each: SnapshotGraph,
OneGraph and HybridGraph. The rst one keeps separate
snapshots (favoring structural locality), the second one
compacts all changes in a single graph with interval timestamps
(favoring temporal and structural locality), the third
combines both, trading compactness for better structural
locality. While the OneGraph case is close to our solution, the
authors use a single-edge strategy for tracking changes, and
do not discuss backlogs or di erential processing.
      </p>
      <p>
        Chronos [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and ImmortalGraph [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] are storage and
execution engines designed to compute over a series of
snapshots. The authors present strategies to improve locality
when laying out multiple snapshots of a graph in memory.
The processing is further tuned to the data layout with a
locality-aware batch scheduling method, and additional
incremental computation powered by pre-computing the
intersection or union of snapshots.
      </p>
      <p>
        Graph deltas, or the set of changes between two
snapshots, have been at the core of studies seeking to exploit
them to reduce the number of physical snapshots, improving
processing and storage. Among them, Khurana and
Deshpande [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] document a solution for time slice retrieval based
on the concept of DeltaGraph: a hierarchical distributed
index of deltas that, once overlaid to a snapshot or an
intermediate materialized view, can be used to e ciently
reconstruct other snapshots of the graph. The authors also
show that di erent parameters and materialization choices
help control the query response times, suggesting possible
improvements to their solution. Koloniari et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] present
a model that only keeps a physical copy of the current graph
and stores the backlogs externally as append-only log les.
Ren et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] advocate for a di erent alternative to the
problem of reducing the number of snapshots, by
eschewing di erential approaches in favour of forwarding queries
to pre-computed representative models of snapshot clusters,
subsequently verifying and xing the results.
      </p>
      <p>By aggregating all changes in one graph and natively
supporting backlogs, our study amounts to a di erent
perspective for powering di erential computing in graph databases,
with possible applications to large scale scenarios de ned by
loading separate snapshots, as the ones described above.</p>
      <p>In this paper we document an elementary inquiry into
supporting temporal queries with an existing property graph
database. For simplicity, we limited our scope to a
singlegraph model under an accumulate-only approach, displaying
structural changes rather than changes on graph properties.
We identi ed interval timestamps and backlogs as promising
building blocks to model graph evolution and aid
computations over the graph. For embedding the backlogs we
considered two fundamental alternatives: the use of global indexes
to reconstruct ordered sets of backlogs on run-time, or the
storage of the backlogs as graph elements themselves. The
later solution was observed to enable further optimizations
with the use of local indexes over the backlog edges.</p>
      <p>We performed an empirical study, backed by an
existing graph database, to compare the alternatives and assess
the impact of di erential processing as a query optimization
mechanism. Early results show that backlog-mediated
incremental processing can contribute signi cantly to improving
temporal queries. These ndings make the case that the
e cient retrieval of backlogs should be a building block for
temporal graph models, considering the ease of adoption and
that it opens up opportunities for di erential processing.</p>
      <p>Further studies are needed to characterize the operations
more likely to gain from backlogs. As future directions we
aim to validate our observations with other datasets, queries
and databases, improving the baseline case with special
indexes for range queries. We will specially carry out more
tests to characterize the pros and cons of the alternatives
for backlog support. We will pacakage the temporal
functionality into a thin layer that can be included over an
existing graph database. We would also like to address property
updates, hierarchies of logger vertices, query language
integration, the role of backlogs for e cient snapshot reduction
and the adjustable tuning of operations by combining
differential with point-in-time processing.
7.</p>
    </sec>
    <sec id="sec-15">
      <title>ACKNOWLEDGMENTS</title>
      <p>The authors would like to thank Thomas Spatzier and
the IBM Bluemix Availability Monitoring team, for their
generous mentoring in prototyping with graph technologies
for real-world use cases. This work was partially funded by
the DFG (grant no.: SA 465/50-1).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Subbian</surname>
          </string-name>
          .
          <article-title>Evolutionary network analysis: A survey</article-title>
          .
          <source>CSUR</source>
          ,
          <volume>47</volume>
          (
          <issue>1</issue>
          ):
          <fpage>10</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.-w.</given-names>
            <surname>Ahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Plaisant</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Shneiderman</surname>
          </string-name>
          .
          <article-title>A task taxonomy for network evolution analysis</article-title>
          .
          <source>TVCG</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <volume>365</volume>
          {
          <fpage>376</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cattuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Quaggiotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panisson</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Averbuch</surname>
          </string-name>
          .
          <article-title>Time-varying social networks in a graph database: A neo4j use case</article-title>
          .
          <source>In GRADES, page 11. ACM</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Prabhakaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Chronos: A graph engine for temporal graph analysis</article-title>
          .
          <source>In EuroSys, page 1. ACM</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Huai</surname>
          </string-name>
          . Tgraph:
          <article-title>A temporal graph data management system</article-title>
          .
          <source>In CIKM</source>
          , pages
          <volume>2469</volume>
          {
          <fpage>2472</fpage>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Huo</surname>
          </string-name>
          and
          <string-name>
            <given-names>V. J.</given-names>
            <surname>Tsotras</surname>
          </string-name>
          .
          <article-title>E cient temporal shortest path queries on evolving social graphs</article-title>
          .
          <source>In SSDBM, page 38. ACM</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Jensen</surname>
          </string-name>
          and
          <string-name>
            <surname>L. Mark.</surname>
          </string-name>
          <article-title>Queries on change in an extended relational model</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>4</volume>
          (
          <issue>2</issue>
          ):
          <volume>192</volume>
          {
          <fpage>200</fpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaufmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Manjili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vagenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kossmann</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          <article-title>Farber, and</article-title>
          <string-name>
            <given-names>N.</given-names>
            <surname>May</surname>
          </string-name>
          .
          <article-title>Timeline index: A uni ed data structure for processing queries on temporal data in SAP HANA</article-title>
          .
          <string-name>
            <surname>In</surname>
            <given-names>SIGMOD</given-names>
          </string-name>
          , pages
          <volume>1173</volume>
          {
          <fpage>1184</fpage>
          . ACM,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaufmann</surname>
          </string-name>
          , P. Vagenas,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kossmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Fa</surname>
          </string-name>
          <article-title>rber. Comprehensive and interactive temporal query processing with SAP HANA</article-title>
          . VLDB,
          <volume>6</volume>
          (
          <issue>12</issue>
          ):
          <volume>1210</volume>
          {
          <fpage>1213</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>U.</given-names>
            <surname>Khurana</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          .
          <article-title>E cient snapshot retrieval over historical graph data</article-title>
          .
          <source>In ICDE</source>
          , pages
          <volume>997</volume>
          {
          <fpage>1008</fpage>
          . IEEE,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Koloniari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Souravlias</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Pitoura</surname>
          </string-name>
          .
          <article-title>On graph deltas for historical queries</article-title>
          .
          <source>In WOSS</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kulkarni</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.-E.</given-names>
            <surname>Michels</surname>
          </string-name>
          .
          <article-title>Temporal features in sql: 2011</article-title>
          . SIGMOD Rec.,
          <volume>41</volume>
          (
          <issue>3</issue>
          ):
          <volume>34</volume>
          {
          <fpage>43</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Faloutsos</surname>
          </string-name>
          .
          <article-title>Graphs over time: Densi cation laws, shrinking diameters and possible explanations</article-title>
          .
          <source>In SIGKDD</source>
          , pages
          <volume>177</volume>
          {
          <fpage>187</fpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Krevl</surname>
          </string-name>
          . fSNAP Datasetsg:
          <article-title>fStanfordg large network dataset collection</article-title>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu and M. T. O</surname>
          </string-name>
          <article-title>zsu</article-title>
          .
          <source>Encyclopedia of database systems</source>
          , volume
          <volume>6</volume>
          . Springer Berlin, Heidelberg, Germany,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Miao</surname>
          </string-name>
          , W. Han,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Prabhakaran</surname>
          </string-name>
          , E. Chen, and
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Immortalgraph: A system for storage and analysis of temporal graphs</article-title>
          .
          <source>TOS</source>
          ,
          <volume>11</volume>
          (
          <issue>3</issue>
          ):
          <fpage>14</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>V. Z.</surname>
          </string-name>
          <article-title>Mo tt and</article-title>
          <string-name>
            <given-names>J.</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          .
          <article-title>Towards a distributed infrastructure for evolving graph analytics</article-title>
          .
          <source>In WWW Companion</source>
          , pages
          <volume>843</volume>
          {
          <fpage>848</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pinnecke</surname>
          </string-name>
          .
          <article-title>E cient single step traversals in main-memory graph-shaped data</article-title>
          .
          <source>Master's thesis</source>
          , School of Computer Science, University of Magdeburg,
          <volume>6</volume>
          <fpage>2016</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          , and R. Cheng.
          <article-title>On querying historical evolving graph sequences</article-title>
          .
          <source>VLDB</source>
          ,
          <volume>4</volume>
          (
          <issue>11</issue>
          ):
          <volume>726</volume>
          {
          <fpage>737</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          .
          <article-title>The gremlin graph traversal machine and language (invited talk)</article-title>
          .
          <source>In DBPL</source>
          , pages
          <volume>1</volume>
          {
          <fpage>10</fpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>B.</given-names>
            <surname>Salzberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>V. J.</given-names>
            <surname>Tsotras</surname>
          </string-name>
          .
          <article-title>Comparison of access methods for time-evolving data</article-title>
          .
          <source>CSUR</source>
          ,
          <volume>31</volume>
          (
          <issue>2</issue>
          ):
          <volume>158</volume>
          {
          <fpage>221</fpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>K.</given-names>
            <surname>Semertzidis</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Pitoura</surname>
          </string-name>
          .
          <article-title>Durable graph pattern queries on historical graphs</article-title>
          .
          <source>In ICDE</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>K.</given-names>
            <surname>Semertzidis</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Pitoura</surname>
          </string-name>
          .
          <article-title>Time traveling in graphs using a graph database</article-title>
          .
          <source>In GraphQ Workshop</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>K.</given-names>
            <surname>Semertzidis</surname>
          </string-name>
          , E. Pitoura, and
          <string-name>
            <given-names>K.</given-names>
            <surname>Lillis</surname>
          </string-name>
          . Timereach:
          <article-title>Historical reachability queries on evolving graphs</article-title>
          .
          <source>In EDBT</source>
          , pages
          <volume>121</volume>
          {
          <fpage>132</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>