<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic Management of Streaming Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alejandro Rodríguez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert McGrath</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yong Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Myers</string-name>
          <email>jimmyersg@ncsa.uiuc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Center for Supercomputing Applications University of Illinois Urbana-Champaign</institution>
          ,
          <addr-line>IL</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2009</year>
      </pub-date>
      <fpage>63</fpage>
      <lpage>82</lpage>
      <abstract>
        <p>One of the fundamental challenges posed by the unprecedented data deluge produced by sensor networks is how to manage time-series streaming data so that they are reasoning-ready and provenance-aware. Semantic web technology shows great promise but lacks adequate support for the notion of time. We present a system for the representation, indexing and querying of time-series data, especially streaming data, using the semantic web approach. This system incorporates a special RDF vocabulary and a semantic interpretation for time relationships. The resulting framework, which we refer to as Time-Annotated RDF (TA-RDF), provides basic functionality for the representation and querying of time-related data. The capabilities of Time-Annotated RDF were implemented as a suite of Java APIs on top of Tupelo, a semantic content management middleware, to provide transparent integration among heterogeneous data, as present in streams and other data sources, and their metadata. We show how this system supports commonly used time-related queries using Time-Annotated SPARQL, introduced in this paper, and we provide an analysis of the TA-RDF data model. This prototype system has already seen successful use in a virtual sensor project where near-real-time radar data streams need to be fetched, indexed, processed and re-published as new virtual sensor streams.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In this paper we present a semantic stream manager aimed at the problem of
integrating streaming data into RDF and semantic-web-enabled infrastructure
and applications. We introduce TA-RDF, a formal model to semantically
represent streams, and a software implementation to efficiently access those streams.
This has enabled integration of stream data with other data and metadata,
including documents, imagery, workflows, provenance, and annotations. The
TA-RDF model remains translatable to, and compatible with, regular RDF by means
of a special RDF vocabulary, which allows the use of existing RDF tools and
technologies.</p>
      <p>
        The need for such a model becomes evident when observing the rapid
development and deployment of sensor networks of various scales for various
observation purposes, which have resulted in an unprecedented data deluge.
One of the major challenges is to support various temporal relationship queries.
In addition, publishing sensor data streams, and re-publishing them after
processing, requires provenance support for validation and verification [
        <xref ref-type="bibr" rid="ref17">16</xref>
        ]. It would
be advantageous to manage streaming data and their provenance (metadata
about causal relationships, history linkage, etc.) coherently.
      </p>
      <p>Semantic web technology has emerged as a potential tool for this
purpose. The Resource Description Framework (RDF) was introduced by
the W3C as a model and language to represent semi-structured metadata in the
semantic web. In this model, information is represented via
subject-predicate-object assertions, or triples, that indicate the value of user-defined properties
for given resources. This essentially gives RDF an expressive power equivalent
to the binary existential-conjunctive subset of first-order logic. Although this
might be sufficient for many applications, RDF omits all considerations of time
and time-varying information from its model.</p>
      <p>Thus, it is not surprising that this limitation becomes noticeable in sensor
network data management, where time-varying data is at the core. The semantic
stream manager presented in this paper, comprising both a model to semantically represent
streams and a software implementation to efficiently access them, is aimed
at integrating this type of streaming data into RDF so that it can be
exploited in semantic-web applications and in applications that benefit from
managing the provenance of data through semantic annotation. This
makes it possible to manage streaming data and provenance
coherently. These data usually have three main characteristics:</p>
      <list list-type="bullet">
        <list-item><p>They are updated frequently, possibly at a high throughput.</p></list-item>
        <list-item><p>They can be modeled as a time-changing property or set of properties for
given resources, for instance, the air temperature of a certain location.</p></list-item>
        <list-item><p>The values, or instances, of these properties can be serialized. That is,
they can be ordered according to some timestamp taken from a discrete,
totally ordered domain.</p></list-item>
      </list>
      <p>In practice it is sometimes the case that several of these properties are related
to the same resources and share timestamps as sensors are adjusted to acquire
data at the same regular intervals. For example, the temperature and humidity
of a location can be modeled as properties of the same resource, measured at
the same regular intervals. Our system is specifically designed to handle these
characteristics by facilitating the representation of streams in RDF, but it is
general enough to incorporate other time-related data, e.g. time-ordered events.</p>
      <p>
        One typical requirement of streaming data management systems is the
processing of continuous queries [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], i.e., queries whose answer is itself a stream,
often produced in real time as an aggregating computation performed over a
moving window of data on one or more streams. Because our focus is on the
representation, storage and retrieval of time-series information, and we consider the
summarization and continuous processing of streams to belong to a higher layer in
the application stack, we do not consider this type of query and instead focus
on one-time queries, that is, queries which (conceptually) produce an answer
immediately that depends only on the data currently in the system. For example:
What was the average temperature in Chicago during December 2008?
      </p>
      <p>The remainder of the paper is organized as follows. Section 2 gives a brief
review of related work. Section 3 states a motivating example. Sections 4 and 5
develop Time-Annotated RDF (TA-RDF), a formal extension to RDF. Section
6 develops TA-SPARQL, extending SPARQL. Section 7 discusses the prototype
implementation. Section 8 concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Recent years have seen growing interest in bringing temporal
semantics into the Semantic Web. One of the earlier works is that by Buraga et
al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which introduces an XML-based language for the use of Interval
Temporal Calculus to express temporal relations between web-sites. Gutierrez et
al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] provide a complete syntax and semantics, along with a query language
specification, for temporal RDF, an RDF extension where every statement is
timestamped with the point in time at which it is semantically valid. This work
was later extended by a different group [
        <xref ref-type="bibr" rid="ref16">15</xref>
        ] with indexing and query processing
algorithms specifically suited to temporal RDF.
      </p>
      <p>
        On the specific topic of streaming data, Bolles et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] represent a stream
as a sequence of RDF graphs, similar to other works [
        <xref ref-type="bibr" rid="ref11">10</xref>
        ], and present a SPARQL
extension for streams of RDF. This approach reduces the problem of querying
a stream to querying over a user-specified (sliding) window on the stream, but
has the disadvantage of ignoring the richer time semantics of timestamped but
non-simultaneous events. Other work [
        <xref ref-type="bibr" rid="ref14 ref9">8, 13</xref>
        ] has focused on the ontological
representation of time series sensor data, but has not included a model that
allows the efficient evaluation of time-related queries.
      </p>
      <p>Our work introduces a new semantic and indexing strategy for time in RDF
conceived to minimize impact on current RDF storage and query engines, and
tailored to streaming applications where actual, arbitrary data is stored outside
of the scope of RDF (and indicated in RDF metadata simply as URIs), and
these data change rapidly over time while metadata annotations might change
at a different, lower pace. Our approach allows the seamless integration of both
types of data into a single indexing system.
</p>
      <p>[Figure 1: Sample time-series data from several sensors: temperature (Celsius) and daily rainfall accumulation (mm) at Chicago O'Hare, and timestamped temperature (C) and rainfall accumulation (mm) readings at Champaign.]</p>
      <p>
Consider the example in Figure 1. It shows a sample of time-series data from
several sensors<sup>1</sup>. Typical queries that could arise over such information include:
        <list list-type="order">
          <list-item><p>What was the total amount of rainfall accumulation in Chicago in January 2009?</p></list-item>
          <list-item><p>What was the average temperature in the month of December 2008?</p></list-item>
          <list-item><p>What was the temperature at Chicago at sunrise July 20th 2008?</p></list-item>
          <list-item><p>When was the temperature at Champaign IL over 80 degrees?</p></list-item>
          <list-item><p>What is the current temperature at O’Hare airport?</p></list-item>
          <list-item><p>How many heavy rains occurred within a week of a heat wave?</p></list-item>
        </list>
These queries cover the three main categories of temporal queries identified
by Yuan et al. [17]: range queries (1, 2), point queries (3, 4, 5), and
time-relationship queries (6). Although it is possible to readily represent this
information in RDF, the nature of RDF (where data are separated from their
semantics) implies that the special meaning of all time markers as an ordering
domain for related resources will be obscured from an underlying RDF
management system and query engine, making the queries above especially expensive
to perform.</p>
      <p>One way to ease this problem is to incorporate into the representation an
indication that certain resources are related by virtue of being different instances
over time of the same entities, and that these instances are to be differentiated
(and indexed) by their ordered time markers. In our system, this is achieved
by means of a special RDF vocabulary, the data stream vocabulary (dsv), and
associated semantics, which allow a query engine to correctly interpret the time
markers as indexing keys.</p>
      <p><sup>1</sup> Chicago and Champaign are considered here simply as named concepts. The problem of
modeling geo-spatial semantics is outside the scope of this paper.</p>
      <p>However, the use of this vocabulary introduces extra storage overhead
by increasing the number of triples needed to represent
the information. For this reason, we introduce an RDF extension, dubbed
Time-Annotated RDF, where the additional semantic restrictions of the data stream
vocabulary are implicitly represented without the requirement for extra triples.</p>
    </sec>
    <sec id="sec-3">
      <title>Time-Annotated RDF</title>
      <p>
        For reference and comparison with our model, we first briefly present some
basic RDF concepts (described in greater detail elsewhere [
        <xref ref-type="bibr" rid="ref10 ref8">7, 9</xref>
        ]). RDF represents
subsets of known information about a domain in RDF graphs, which are
collections of triples. A triple is a tuple of the form <inline-formula><tex-math>\langle \mathit{Subject}, \mathit{Predicate}, \mathit{Object} \rangle</tex-math></inline-formula>
that asserts a known property of a resource or a relationship between two
resources, where Subject is a URI reference or a blank node, Predicate is a URI
reference, and Object is a URI reference, blank node or literal. URI references
name resources that can be directly identified by their URI. Additionally, RDF
allows the use of blank nodes. Syntactically, a blank node is a resource identified
by a local identifier instead of a URI reference. Semantically, a blank node is an
existential variable that allows information about a resource to be expressed without
identifying it.
      </p>
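      <p>Time-annotated RDF (TA-RDF) is an extension of the RDF model where
resources are optionally annotated with a time value, i.e., a time-annotated
resource is a pair of the form <inline-formula><tex-math>\mathit{resource}[\mathit{time}]</tex-math></inline-formula>, where <inline-formula><tex-math>\mathit{resource} \in \mathit{URI\ References} \cup \mathit{Blank\ Nodes} \cup \mathit{Literals}</tex-math></inline-formula> and <inline-formula><tex-math>\mathit{time} \in \mathcal{T} = T \cup \{\mathit{Nil}\}</tex-math></inline-formula>. The time domain <inline-formula><tex-math>T</tex-math></inline-formula> is a
discrete, totally ordered infinite set intended to represent discrete time and from
now on will be assumed to be the set of natural numbers. Intuitively, a
time-annotated resource <inline-formula><tex-math>r[t]</tex-math></inline-formula> represents the state of the resource <inline-formula><tex-math>r</tex-math></inline-formula> at the point <inline-formula><tex-math>t</tex-math></inline-formula> in
time. The time marker <inline-formula><tex-math>\mathit{Nil}</tex-math></inline-formula> indicates a time invariant, used to express a property
of the resource that does not change. All literals are considered to be annotated
with a time of <inline-formula><tex-math>\mathit{Nil}</tex-math></inline-formula>, as it goes against the spirit of the model to consider that
literals change over time.</p>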
      <p>Notice that this makes it easy to express not only that the relationships
between entities change over time, but also that the entities themselves change, and
that they might do so on different time scales.</p>
      <p>The resource <inline-formula><tex-math>r[*]</tex-math></inline-formula>, or the state of <inline-formula><tex-math>r</tex-math></inline-formula> for each instant, represents a data stream,
composed of the ordered sequence of frames <inline-formula><tex-math>r[t]</tex-math></inline-formula> for all <inline-formula><tex-math>t \in T</tex-math></inline-formula>. For simplicity we
restrict the model to point timestamps in this paper. This definition of streams
accommodates sequential or time-series data with arbitrary
metadata and payload data, frames sampled at different or irregular intervals,
missing data points, etc.</p>
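      <p>As an illustration only, time-annotated resources and streams can be sketched in Python as follows. The class name TimeAnnotated, the helper stream, the use of None to stand for the Nil marker, and the example URIs are all assumptions made for this sketch; they are not part of the implementation described later in the paper.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional

NIL = None  # assumption: None stands in for the time-invariant marker Nil


@dataclass(frozen=True)
class TimeAnnotated:
    """A time-annotated resource r[t]; time is a natural number or NIL."""
    resource: str
    time: Optional[int] = NIL


def stream(frames, resource):
    """Return the timestamped frames of `resource`, ordered by time."""
    own = [f for f in frames if f.resource == resource and f.time is not NIL]
    return sorted(own, key=lambda f: f.time)


frames = [
    TimeAnnotated("urn:OHARE/temperature", 30),
    TimeAnnotated("urn:OHARE/temperature", 10),  # arrives out of order
    TimeAnnotated("urn:OHARE/temperature", 25),  # irregular interval
    TimeAnnotated("urn:OHARE/rainfall", 10),     # frame of another stream
    TimeAnnotated("urn:OHARE/temperature"),      # time-invariant r[Nil]
]

ordered = stream(frames, "urn:OHARE/temperature")
print([f.time for f in ordered])  # [10, 25, 30]
```
]]></preformat>
      <p>In this sketch, frames may arrive out of order or at irregular intervals; the stream view simply orders whatever timestamped frames exist for the resource.</p>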
      <p>The inspiration for time-annotating resources instead of triples comes from
the fact that RDF is often employed for metadata annotations of data stored
outside of RDF. These data or resources may be time-varying, which makes it
useful to have a mechanism to refer to them according to their position in
time.</p>
    </sec>
    <sec id="sec-4">
      <title>Model Theory</title>
      <p>
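        <bold>TA-RDF Interpretation.</bold>
The model theory for TA-RDF directly extends that of RDF, as defined
in RDF Semantics [
        <xref ref-type="bibr" rid="ref8">7</xref>
        ]. This extension consists of modifying interpretations
to consider a time-annotated resource as a new resource, and triples as being
composed of these time-annotated resources. An RDF interpretation <inline-formula><tex-math>I</tex-math></inline-formula> of an
RDF graph is a function that maps URI references, literals and blank nodes in
<inline-formula><tex-math>V</tex-math></inline-formula>, the vocabulary of the graph, to objects in a set of resources <inline-formula><tex-math>IR</tex-math></inline-formula>. It also maps
triples in <inline-formula><tex-math>V \times V \times V</tex-math></inline-formula> to <inline-formula><tex-math>\{\mathit{true}, \mathit{false}\}</tex-math></inline-formula>.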
      </p>
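      <p>We extend an interpretation as follows: if <inline-formula><tex-math>T</tex-math></inline-formula> is the time domain, then the
interpretation of <inline-formula><tex-math>r[t]</tex-math></inline-formula>, where <inline-formula><tex-math>r</tex-math></inline-formula> is a URI reference or literal and <inline-formula><tex-math>t \in T</tex-math></inline-formula>, is <inline-formula><tex-math>I(r[t]) = IS(\langle r, t \rangle)</tex-math></inline-formula>, with <inline-formula><tex-math>IS</tex-math></inline-formula> being a mapping from <inline-formula><tex-math>(\mathit{URI} \cup L) \times T</tex-math></inline-formula> to <inline-formula><tex-math>IR</tex-math></inline-formula> that forms part
of the interpretation <inline-formula><tex-math>I</tex-math></inline-formula>. Mappings of blank nodes to URI references are extended
in the obvious way to map to <inline-formula><tex-math>(\mathit{URI} \cup L) \times T</tex-math></inline-formula> instead. All other aspects of
an interpretation remain as in an RDF interpretation, except for an additional
semantic restriction to indicate that <inline-formula><tex-math>\mathit{Nil}</tex-math></inline-formula> represents all points in time for a given
resource. That restriction is formalized in the following definitions.</p>
      <p>Definition 1 (Coverage of resources). Given two time-annotated resources
<inline-formula><tex-math>r_1[t_1]</tex-math></inline-formula> and <inline-formula><tex-math>r_2[t_2]</tex-math></inline-formula>, we say <inline-formula><tex-math>r_1[t_1]</tex-math></inline-formula> covers <inline-formula><tex-math>r_2[t_2]</tex-math></inline-formula> (<inline-formula><tex-math>r_1[t_1] \succeq r_2[t_2]</tex-math></inline-formula>) iff <inline-formula><tex-math>I(r_1) = I(r_2)</tex-math></inline-formula> and
either <inline-formula><tex-math>t_1 = t_2</tex-math></inline-formula> or <inline-formula><tex-math>t_1 = \mathit{Nil}</tex-math></inline-formula>.</p>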
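      <p>Definition 1 can be read as a small predicate. The following Python sketch is only illustrative: it represents a time-annotated resource as a (resource, time) pair, uses None for Nil, and takes the interpretation as a function argument that defaults to the identity. All of these are assumptions of the sketch.</p>
      <preformat><![CDATA[
```python
from typing import Optional, Tuple

# (resource, time); None plays the role of the Nil time marker (assumption)
Annotated = Tuple[str, Optional[int]]


def covers(a: Annotated, b: Annotated, interpret=lambda r: r) -> bool:
    """True if a covers b: both denote the same resource under `interpret`,
    and a's time either equals b's time or is Nil."""
    (r1, t1), (r2, t2) = a, b
    return interpret(r1) == interpret(r2) and (t1 == t2 or t1 is None)


# A time-invariant resource covers each of its timestamped states ...
print(covers(("urn:OHARE", None), ("urn:OHARE", 5)))  # True
# ... but a timestamped state does not cover the time-invariant resource.
print(covers(("urn:OHARE", 5), ("urn:OHARE", None)))  # False
```
]]></preformat>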
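      <p>Definition 2 (Coverage of triples). Given two time-annotated triples
<inline-formula><tex-math>\mathit{triple}_1 = \langle s_1[t^s_1], p_1[t^p_1], o_1[t^o_1] \rangle</tex-math></inline-formula> and <inline-formula><tex-math>\mathit{triple}_2 = \langle s_2[t^s_2], p_2[t^p_2], o_2[t^o_2] \rangle</tex-math></inline-formula>, <inline-formula><tex-math>\mathit{triple}_1</tex-math></inline-formula> covers
<inline-formula><tex-math>\mathit{triple}_2</tex-math></inline-formula> (<inline-formula><tex-math>\mathit{triple}_1 \succeq \mathit{triple}_2</tex-math></inline-formula>) iff each component of <inline-formula><tex-math>\mathit{triple}_1</tex-math></inline-formula> covers the corresponding component of <inline-formula><tex-math>\mathit{triple}_2</tex-math></inline-formula>:
        <disp-formula><tex-math>s_1[t^s_1] \succeq s_2[t^s_2] \;\wedge\; p_1[t^p_1] \succeq p_2[t^p_2] \;\wedge\; o_1[t^o_1] \succeq o_2[t^o_2].</tex-math></disp-formula>
      </p>
      <p>It is possible to translate between TA-RDF and RDF by explicit use of the data
stream vocabulary, presented in Table 1, as long as the additional semantic
restrictions are preserved. Semantically, two sets of restrictions are needed:</p>
      <list list-type="order">
        <list-item>
          <p>A property of a stream holds for all its frames:
            <disp-formula><label>(i)</label><tex-math>I(\langle s, p, o \rangle) \wedge I(\langle r, \mathtt{dsv{:}belongsTo}, s \rangle) \Rightarrow I(\langle r, p, o \rangle)</tex-math></disp-formula>
            <disp-formula><label>(ii)</label><tex-math>I(\langle r, p, o \rangle) \wedge I(\langle r, \mathtt{dsv{:}belongsTo}, s \rangle) \wedge I(\langle r, \mathtt{dsv{:}hasTimestamp}, \mathtt{dsv{:}Nil} \rangle) \Rightarrow I(\langle s, p, o \rangle)</tex-math></disp-formula>
The analogous rules hold for <inline-formula><tex-math>\langle s, s, o \rangle</tex-math></inline-formula> and <inline-formula><tex-math>\langle s, p, s \rangle</tex-math></inline-formula>.</p>
        </list-item>
        <list-item>
          <p>All frames have a timestamp and are unique, in the sense that no frame
belongs to more than one stream, no frame has more than one timestamp, and
no two frames of the same stream have the same timestamp:
            <disp-formula><label>(i)</label><tex-math>I(\langle r_0, \mathtt{dsv{:}belongsTo}, s_0 \rangle) \Rightarrow \exists t \in T \, [I(\langle r_0, \mathtt{dsv{:}hasTimestamp}, t \rangle)]</tex-math></disp-formula>
            <disp-formula><label>(ii)</label><tex-math>(I(\langle r_0, \mathtt{dsv{:}belongsTo}, s_0 \rangle) \wedge I(\langle r_0, \mathtt{dsv{:}belongsTo}, s_1 \rangle)) \Rightarrow I(s_0) = I(s_1)</tex-math></disp-formula>
            <disp-formula><label>(iii)</label><tex-math>(I(\langle r_0, \mathtt{dsv{:}hasTimestamp}, t_0 \rangle) \wedge I(\langle r_0, \mathtt{dsv{:}hasTimestamp}, t_1 \rangle)) \Rightarrow t_0 = t_1</tex-math></disp-formula>
            <disp-formula><label>(iv)</label><tex-math>(I(\langle r_0, \mathtt{dsv{:}hasTimestamp}, t \rangle) \wedge I(\langle r_1, \mathtt{dsv{:}hasTimestamp}, t \rangle) \wedge I(\langle r_0, \mathtt{dsv{:}belongsTo}, s \rangle) \wedge I(\langle r_1, \mathtt{dsv{:}belongsTo}, s \rangle)) \Rightarrow I(r_0) = I(r_1)</tex-math></disp-formula>
          </p>
        </list-item>
      </list>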
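      <p>The uniqueness restrictions on frames (the second set above) can be checked mechanically over a set of plain RDF triples. The Python sketch below is a purely syntactic approximation: it compares names rather than their interpretations, and the function name frame_violations, the tuple representation of triples and the blank-node labels are assumptions of the sketch.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

BELONGS_TO = "dsv:belongsTo"
HAS_TIMESTAMP = "dsv:hasTimestamp"


def frame_violations(triples):
    """List violations of the frame rules 2(i)-(iv), comparing names only."""
    streams = defaultdict(set)  # frame -> streams it belongs to
    stamps = defaultdict(set)   # frame -> timestamps it carries
    for s, p, o in triples:
        if p == BELONGS_TO:
            streams[s].add(o)
        elif p == HAS_TIMESTAMP:
            stamps[s].add(o)

    problems = []
    seen = {}  # (stream, timestamp) -> first frame found with that pair
    for frame in sorted(streams):
        if not stamps[frame]:
            problems.append(f"{frame}: no timestamp (i)")
        if len(streams[frame]) > 1:
            problems.append(f"{frame}: more than one stream (ii)")
        if len(stamps[frame]) > 1:
            problems.append(f"{frame}: more than one timestamp (iii)")
        for st in streams[frame]:
            for t in stamps[frame]:
                other = seen.setdefault((st, t), frame)
                if other != frame:
                    problems.append(
                        f"{frame}: same stream and timestamp as {other} (iv)")
    return problems


good = [
    ("_:f1", BELONGS_TO, "urn:temp"), ("_:f1", HAS_TIMESTAMP, 1),
    ("_:f2", BELONGS_TO, "urn:temp"), ("_:f2", HAS_TIMESTAMP, 2),
]
bad = good + [("_:f3", BELONGS_TO, "urn:temp"), ("_:f3", HAS_TIMESTAMP, 2)]
print(frame_violations(good))  # []
print(frame_violations(bad))
```
]]></preformat>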
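      <p>Definition 3 (Syntactic Translation). Let <inline-formula><tex-math>G^{TA}</tex-math></inline-formula> be a TA-RDF graph and <inline-formula><tex-math>G</tex-math></inline-formula> an
RDF graph. Then <inline-formula><tex-math>G</tex-math></inline-formula> is the translation of <inline-formula><tex-math>G^{TA}</tex-math></inline-formula> (written <inline-formula><tex-math>G^{TA} \rightsquigarrow G</tex-math></inline-formula>) iff
        <disp-formula><tex-math>\begin{array}{l}
\langle s[t_s], p[t_p], o[t_o] \rangle \in G^{TA} \Leftrightarrow \exists r_s, r_p, r_o \\
\quad [(\langle r_s, \mathtt{dsv{:}belongsTo}, s \rangle \in G \wedge \langle r_s, \mathtt{dsv{:}hasTimestamp}, t_s \rangle \in G \wedge r_s \in B) \vee (t_s = \mathtt{dsv{:}Nil} \wedge r_s = s)] \;\wedge \\
\quad [(\langle r_p, \mathtt{dsv{:}belongsTo}, p \rangle \in G \wedge \langle r_p, \mathtt{dsv{:}hasTimestamp}, t_p \rangle \in G \wedge r_p \in B) \vee (t_p = \mathtt{dsv{:}Nil} \wedge r_p = p)] \;\wedge \\
\quad [(\langle r_o, \mathtt{dsv{:}belongsTo}, o \rangle \in G \wedge \langle r_o, \mathtt{dsv{:}hasTimestamp}, t_o \rangle \in G \wedge r_o \in B) \vee (t_o = \mathtt{dsv{:}Nil} \wedge r_o = o)] \;\wedge \\
\quad \langle r_s, r_p, r_o \rangle \in G
\end{array}</tex-math></disp-formula>
where <inline-formula><tex-math>B</tex-math></inline-formula> is the set of blank nodes. Notice that defining translation as a relation,
rather than a function, allows triples such as <inline-formula><tex-math>\langle r[\mathit{Nil}], \cdot, \cdot \rangle</tex-math></inline-formula> (omitting the
required translations for predicate and object) to be represented in RDF in
two different ways: <inline-formula><tex-math>\langle r, \cdot, \cdot \rangle</tex-math></inline-formula> and <inline-formula><tex-math>\{\langle r', \cdot, \cdot \rangle, \langle r', \mathtt{dsv{:}belongsTo}, r \rangle, \langle r', \mathtt{dsv{:}hasTimestamp}, \mathtt{dsv{:}Nil} \rangle\}</tex-math></inline-formula>.</p>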
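      <p>One possible reading of Definition 3 as a procedure is sketched below in Python. It produces a single translation (the relation admits several): every component annotated with a timestamp becomes a blank node linked to its stream and timestamp, while components annotated with Nil are used directly. The function name translate, the pair representation of time-annotated resources, and the blank-node naming scheme are assumptions of this sketch, not the paper's implementation.</p>
      <preformat><![CDATA[
```python
from itertools import count

BELONGS_TO = "dsv:belongsTo"
HAS_TIMESTAMP = "dsv:hasTimestamp"
NIL = "dsv:Nil"


def translate(ta_triples):
    """Translate TA-RDF triples, given as triples of (resource, time)
    pairs, into one plain RDF graph (a list of triples)."""
    fresh = count(1)
    frames = {}  # (resource, time) -> blank node standing for that frame
    graph = []

    def node_for(resource, time):
        if time == NIL:
            return resource  # second disjunct: t = dsv:Nil and r = resource
        key = (resource, time)
        if key not in frames:
            # first disjunct: a blank node tied to its stream and timestamp
            frames[key] = f"_:frame{next(fresh)}"
            graph.append((frames[key], BELONGS_TO, resource))
            graph.append((frames[key], HAS_TIMESTAMP, time))
        return frames[key]

    for s, p, o in ta_triples:
        graph.append((node_for(*s), node_for(*p), node_for(*o)))
    return graph


ta_graph = [
    (("urn:OHARE/temperature", 5), ("urn:hasReading", NIL), ("20", NIL)),
]
for triple in translate(ta_graph):
    print(triple)
```
]]></preformat>
      <p>Reusing one blank node per (resource, timestamp) pair keeps the output consistent with the rule that no two frames of the same stream share a timestamp.</p>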
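      <p>Definition 4 (Semantic Translation). Let <inline-formula><tex-math>I</tex-math></inline-formula> be an RDF interpretation and
<inline-formula><tex-math>I^{TA}</tex-math></inline-formula> a TA-RDF interpretation. Then <inline-formula><tex-math>I</tex-math></inline-formula> is the translation of <inline-formula><tex-math>I^{TA}</tex-math></inline-formula> (written
<inline-formula><tex-math>I^{TA} \rightsquigarrow I</tex-math></inline-formula>) iff
        <disp-formula><tex-math>\begin{array}{l}
I^{TA}(\langle s[t_s], p[t_p], o[t_o] \rangle) \Leftrightarrow \exists r_s, r_p, r_o \\
\quad [(I(\langle r_s, \mathtt{dsv{:}belongsTo}, s \rangle) \wedge I(\langle r_s, \mathtt{dsv{:}hasTimestamp}, t_s \rangle)) \vee (I(t_s) = I(\mathtt{dsv{:}Nil}) \wedge I(r_s) = I(s))] \;\wedge \\
\quad [(I(\langle r_p, \mathtt{dsv{:}belongsTo}, p \rangle) \wedge I(\langle r_p, \mathtt{dsv{:}hasTimestamp}, t_p \rangle)) \vee (I(t_p) = I(\mathtt{dsv{:}Nil}) \wedge I(r_p) = I(p))] \;\wedge \\
\quad [(I(\langle r_o, \mathtt{dsv{:}belongsTo}, o \rangle) \wedge I(\langle r_o, \mathtt{dsv{:}hasTimestamp}, t_o \rangle)) \vee (I(t_o) = I(\mathtt{dsv{:}Nil}) \wedge I(r_o) = I(o))] \;\wedge \\
\quad I(\langle r_s, r_p, r_o \rangle)
\end{array}</tex-math></disp-formula>
      </p>
      <p>Lemma 1. All TA-RDF graphs have a translation. All valid RDF
interpretations and TA-RDF interpretations have a translation.</p>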
      <p>The following theoretical results establish the translation process to be unique
up to simple entailment, reversible, and consistent between semantic translation
and syntactic translation. Proofs are omitted from this paper due to space
constraints.</p>
      <p>The first theorem establishes that the translation from TA-RDF to RDF is
unique up to simple entailment.</p>
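      <p>Theorem 1. Let <inline-formula><tex-math>G^{TA}</tex-math></inline-formula> be a TA-RDF graph and <inline-formula><tex-math>G_1, G_2</tex-math></inline-formula> RDF graphs. Then
        <disp-formula><tex-math>(G^{TA} \rightsquigarrow G_1 \wedge G^{TA} \rightsquigarrow G_2) \Rightarrow (G_1 \models G_2 \wedge G_2 \models G_1)</tex-math></disp-formula>
      </p>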
      <p>The translation from RDF to TA-RDF is also unique as shown by the
following theorem. Remember that in TA-RDF as in RDF, equality of graphs ignores
the renaming of blank nodes.</p>
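      <p>Theorem 2. Let <inline-formula><tex-math>G^{TA}_1, G^{TA}_2</tex-math></inline-formula> be TA-RDF graphs and <inline-formula><tex-math>G_1</tex-math></inline-formula> an RDF graph. Then
        <disp-formula><tex-math>(G^{TA}_1 \rightsquigarrow G_1 \wedge G^{TA}_2 \rightsquigarrow G_1) \Rightarrow (G^{TA}_1 = G^{TA}_2)</tex-math></disp-formula>
      </p>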
      <p>Finally, the two following results guarantee that translated graphs are
semantically equivalent.</p>
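      <p>Theorem 3. Let <inline-formula><tex-math>G^{TA}, I^{TA}</tex-math></inline-formula> be a TA-RDF graph and a TA-RDF interpretation,
respectively, and <inline-formula><tex-math>G, I</tex-math></inline-formula> be an RDF graph and an RDF interpretation, such that
<inline-formula><tex-math>G^{TA} \rightsquigarrow G</tex-math></inline-formula> and <inline-formula><tex-math>I^{TA} \rightsquigarrow I</tex-math></inline-formula>. Then <inline-formula><tex-math>I^{TA}</tex-math></inline-formula> satisfies <inline-formula><tex-math>G^{TA}</tex-math></inline-formula> iff <inline-formula><tex-math>I</tex-math></inline-formula> satisfies <inline-formula><tex-math>G</tex-math></inline-formula>.</p>
      <p>Corollary 1. Let <inline-formula><tex-math>G_1, G_2</tex-math></inline-formula> be RDF graphs and <inline-formula><tex-math>G^{TA}_1, G^{TA}_2</tex-math></inline-formula> TA-RDF graphs, such
that <inline-formula><tex-math>G^{TA}_1 \rightsquigarrow G_1</tex-math></inline-formula> and <inline-formula><tex-math>G^{TA}_2 \rightsquigarrow G_2</tex-math></inline-formula>. Then
        <disp-formula><tex-math>G_1 \models G_2 \Leftrightarrow G^{TA}_1 \models G^{TA}_2</tex-math></disp-formula>
      </p>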
      <p>The translation process for graphs requires the differentiation of resources
representing frames, and to guarantee the reversibility of the translation new
semantic restrictions have been imposed over RDF for the TA-RDF vocabulary.
However, in practical applications, constructing the translation of a TA-RDF
graph could create “new”, globally unused URI references for each frame,
eliminating the need for the additional semantic rules in the vocabulary regarding
uniqueness of frames, which increases compatibility with existing RDF
storage and query systems.</p>
    </sec>
    <sec id="sec-5">
      <title>Time-Annotated SPARQL</title>
      <p>
        Along with TA-RDF, we present a query language based on SPARQL [
        <xref ref-type="bibr" rid="ref15">14</xref>
        ],
a W3C recommendation for querying RDF. Like TA-RDF, Time-annotated
SPARQL (TA-SPARQL) is designed to stay as similar as possible to its
preexisting counterpart in syntax and semantics while introducing the capabilities
required to easily express the one-time time-dependent queries common when
handling streams and time-varying data. We present the language through
several examples before listing all the constructs that are new in TA-SPARQL vs
SPARQL. Using the example of Section 3, we write in TA-SPARQL some of the
queries introduced in that section:
      </p>
      <p>What was the total amount of rain that fell on Chicago in January 2009?</p>
      <preformat>SELECT sum(?rain)
WHERE {
  &lt;urn:OHARE&gt; &lt;urn:hasRainSensor&gt;
      ?x["2009-01-01Z-06:00"^^xsd:date .. "2009-01-31Z-06:00"^^xsd:date] .
  ?x &lt;urn:hasReading&gt; ?rain .
}</preformat>
      <p>What was the temperature in Chicago at sunrise on July 20th 2008?</p>
      <preformat>SELECT ?temperature
WHERE {
  &lt;urn:OHARE&gt; &lt;urn:hasTemperatureSensor&gt;
      ?x["2008-07-20T05:34:00Z-06:00"^^xsd:dateTime] .
  ?x &lt;urn:hasReading&gt; ?temperature .
}</preformat>
      <p>When was the temperature at Champaign, IL over 80 degrees?</p>
      <preformat>SELECT ?x.t
WHERE {
  &lt;urn:Champaign&gt; &lt;urn:hasSensor&gt; ?x[*] .
  ?x &lt;urn:hasTemperatureReading&gt; ?y .
  FILTER (?y&gt;=80)
}</preformat>
      <p>What is the current temperature at O’Hare?</p>
      <preformat>SELECT ?temperature
WHERE {
  &lt;urn:OHARE&gt; &lt;urn:hasTemperatureSensor&gt; ?x[LAST] .
  ?x &lt;urn:hasReading&gt; ?temperature .
}</preformat>
      <p>How many heavy rains occurred within a week of a heat wave?</p>
      <preformat>SELECT count(?y)
WHERE {
  {
    &lt;urn:Champaign&gt; &lt;urn:hasSensor&gt; ?x[*] .
    ?x &lt;urn:hasTemperatureReading&gt; ?temp .
    FILTER(?temp &gt; 35)
  }
  {
    &lt;urn:Champaign&gt; &lt;urn:hasSensor&gt; ?y[*] .
    ?y &lt;urn:hasRainReading&gt; ?rain .
    FILTER (?rain &gt; 25 &amp;&amp; abs(?x.t - ?y.t) &lt; 7*86400000)
  }
}</preformat>
      <p>Notice that times are converted to UTC in milliseconds for comparisons and
numeric operations (hence 7 days → 7 days × 86400000 ms/day).</p>
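      <p>The arithmetic behind this comparison can be reproduced with a few lines of Python. This is a sketch only: the helper name to_utc_millis, the choice of the Unix epoch as the origin, and the two example instants are assumptions, not details taken from the implementation.</p>
      <preformat><![CDATA[
```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # assumed origin
WEEK_MS = 7 * 86_400_000  # 7 days x 86,400,000 ms/day


def to_utc_millis(moment: datetime) -> int:
    """Whole milliseconds between the epoch and a timezone-aware datetime."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


central = timezone(timedelta(hours=-6))  # the -06:00 offset used in the examples
heat_wave = datetime(2008, 7, 20, 5, 34, tzinfo=central)   # hypothetical frame
heavy_rain = datetime(2008, 7, 25, 12, 0, tzinfo=central)  # hypothetical frame

# Equivalent of abs(?x.t - ?y.t) < 7*86400000 in the query above
within_week = abs(to_utc_millis(heat_wave) - to_utc_millis(heavy_rain)) < WEEK_MS
print(WEEK_MS, within_week)  # 604800000 True
```
]]></preformat>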
      <p>The semantics of TA-SPARQL queries are informally described in Table 2
by providing translations from TA-SPARQL queries (or segments of queries) to
equivalent versions in SPARQL, assuming the SPARQL queries are performed
over the translation of the TA-RDF graph. The translations in Table 2 do not represent
the algebraic or physical operations required to evaluate the queries; they just
show a semantically equivalent form of the query in order to describe its
meaning. TA-SPARQL also provides a simple form of aggregation for the functions SUM,
AVG, MIN, MAX and COUNT, implicitly grouped by all the projected variables
not present in an aggregation function, meaning that SUM(?y) is the summation of
variable ?y over all results of the query for which the remaining variables have
identical values.</p>
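      <p>The implicit grouping rule can be illustrated with a short Python sketch over a list of variable bindings. The function name aggregate_sum and the example bindings are hypothetical; only the grouping behaviour (group by every projected variable that is not aggregated) follows the description above.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def aggregate_sum(rows, projected, summed):
    """SUM(summed) grouped by the projected variables, in the spirit of
    SELECT ?x ?y SUM(?z) with implicit grouping by (?x, ?y)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[var] for var in projected)
        totals[key] += row[summed]
    return dict(totals)


# Hypothetical query results (variable bindings)
rows = [
    {"?x": "urn:OHARE", "?y": "2009-01", "?z": 4.0},
    {"?x": "urn:OHARE", "?y": "2009-01", "?z": 6.0},
    {"?x": "urn:Champaign", "?y": "2009-01", "?z": 10.0},
]

result = aggregate_sum(rows, ["?x", "?y"], "?z")
for group, total in result.items():
    print(group, total)
```
]]></preformat>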
      <table-wrap>
        <label>Table 2</label>
        <table>
          <thead>
            <tr><th>Construct</th><th>TA-SPARQL</th><th>SPARQL</th></tr>
          </thead>
          <tbody>
            <tr>
              <td></td>
              <td><preformat>{ ...
  resource[t0] predicate object .
  ... }</preformat></td>
              <td></td>
            </tr>
            <tr>
              <td></td>
              <td><preformat>{ ...
  resource[t0..t1] predicate object .
  ... }</preformat></td>
              <td></td>
            </tr>
            <tr>
              <td>Most recent frame in a stream</td>
              <td><preformat>{ ...
  resource[LAST] predicate object .
  ... }</preformat>
Similarly for resource[FIRST]</td>
              <td><preformat>{ ...
  ?F predicate object .
  ?F dso:belongsTo resource .
  ?F dso:hasTimestamp ?F_T .
  OPTIONAL {
    ?F2 dso:belongsTo resource .
    ?F2 dso:hasTimestamp ?F2_T .
    FILTER(?F_T &lt; ?F2_T)
  }
  FILTER(!BOUND(?F2_T))
  ... }</preformat></td>
            </tr>
            <tr>
              <td>All frames in a stream</td>
              <td><preformat>{ ...
  resource[*] predicate object .
  ... }</preformat></td>
              <td><preformat>{ ...
  { ?F predicate object .
    ?F dso:belongsTo resource . }
  UNION
  { resource predicate object . }
  ... }</preformat></td>
            </tr>
            <tr>
              <td>Referring to the timestamp of a frame</td>
              <td><preformat>{ ...
  resource[*] &gt;?x predicate object .
  ...
  FILTER(?x.t &gt; ?z)
  ... }</preformat></td>
              <td><preformat>{ ...
  ?x predicate object .
  ?x dso:belongsTo resource .
  ?x dso:hasTimestamp ?x_T .
  ...
  FILTER(?x_T &gt;= ?z)
}</preformat></td>
            </tr>
            <tr>
              <td>Aggregation</td>
              <td><preformat>SELECT ?x ?y SUM(?z)
WHERE ...</preformat>
Similarly for MIN, MAX, AVG, COUNT</td>
              <td>Results grouped by (?x, ?y); no SPARQL equivalent.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-6">
      <title>Example Implementation</title>
      <p>We developed an implementation of TA-RDF and TA-SPARQL based on the
Tupelo semantic middleware<sup>2</sup>. Tupelo is an extensible semantic content repository
framework for the storage and retrieval of RDF data that transparently
interfaces with RDF stores such as Sesame or Mulgara. The Semantic Data Stream
Manager (SDSM) is built on top of Tupelo to provide an interface and broker
between Tupelo and streams. Figure 4 shows the general architecture of the
system.</p>
      <p>The SDSM incorporates the semantics of TA-RDF on top of preexisting
RDF storage managers to allow for indexing of streams and time-related data.
In particular, the system is optimized to answer range and point queries
such as those exemplified by queries 1 through 5 in Section 3. This is achieved by
indexing all time-annotated resources by their timestamp and the stream to
which they belong. The time annotation index implicitly stores all the triples
associated with the data stream vocabulary, so there is no extra overhead in the
number of triples stored by Tupelo.</p>
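      <p>To make the idea of indexing frames by stream and timestamp concrete, the following Python sketch keeps one sorted list per stream and answers point, range and most-recent-frame lookups by binary search. It is not the SDSM's Java implementation: the class TimeIndex, its method names, the integer timestamps and the inclusive range bounds are all assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
import bisect
from collections import defaultdict


class TimeIndex:
    """Per-stream sorted list of (timestamp, frame_uri) pairs."""

    def __init__(self):
        self._frames = defaultdict(list)

    def add(self, stream, timestamp, frame_uri):
        # insort keeps the list ordered even for out-of-sequence frames
        bisect.insort(self._frames[stream], (timestamp, frame_uri))

    def range(self, stream, start, end):
        """Frames with start <= timestamp <= end (integer timestamps)."""
        frames = self._frames.get(stream, [])
        lo = bisect.bisect_left(frames, (start,))
        hi = bisect.bisect_left(frames, (end + 1,))
        return [uri for _, uri in frames[lo:hi]]

    def point(self, stream, timestamp):
        """Frames stamped exactly `timestamp`."""
        return self.range(stream, timestamp, timestamp)

    def last(self, stream):
        """Most recent frame of the stream, or None if it has no frames."""
        frames = self._frames.get(stream, [])
        return frames[-1][1] if frames else None


index = TimeIndex()
index.add("urn:OHARE/temperature", 30, "urn:frame/30")
index.add("urn:OHARE/temperature", 10, "urn:frame/10")  # out of sequence
index.add("urn:OHARE/temperature", 20, "urn:frame/20")

print(index.range("urn:OHARE/temperature", 10, 20))
print(index.last("urn:OHARE/temperature"))
```
]]></preformat>
      <p>Because the timestamp and stream membership live in the index itself, a design along these lines does not need to store a separate triple for each of them, which matches the storage argument made above.</p>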
      <p>The SDSM defines a Java API for writing and querying stream data. An
application writes data to the SDSM, which extracts timestamps and other
metadata, and writes the data to the stream. The timestamps are indexed by
the SDSM, and other metadata are written to Tupelo. Tupelo manages the data
as blobs, identified by unique URIs. TA-SPARQL queries are sent to the
SDSM, which resolves them. Other queries are sent to Tupelo. Data blobs
corresponding to frames of arbitrary data in a stream are grouped together in
pages at a user-defined granularity that can be set independently for any group
of incoming frames. Missing and out-of-sequence frames can still be handled and
indexed correctly, although performance degrades as out-of-sequence frames
increase the segmentation of the pages.</p>
      <p><sup>2</sup> Tupelo. http://tupeloproject.ncsa.uiuc.edu</p>
      <p>Tupelo allows the integration of data and metadata by handling not only
triples but also resources as blobs of data that can be independently stored and
retrieved. Time-annotating and indexing resources through TA-RDF on top of
Tupelo provides a mechanism to handle time-varying data that possess little
or no time-varying metadata, while still allowing the extensive use of metadata
when needed.</p>
      <p>
        Given the real-time nature of streaming data in many real world
applications, the data stream manager integrates high throughput streaming systems
like Open Source DataTurbine [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and publish/subscribe systems such as the Java
Message Service (JMS). DataTurbine is a high-performance streaming system that
allows consumers and producers to process data at different speeds without data
loss. DataTurbine also abstracts the exact nature of the possibly heterogeneous
data sources. SDSM allows fetching and posting data transparently through
these systems with the purpose of interfacing with a variety of real-time sources.
To read from DataTurbine, an application opens a connection via the SDSM,
and issues TA-SPARQL queries. The SDSM and Tupelo fetch the correct data
via DataTurbine. To post data to DataTurbine, the application writes data to
the SDSM, which creates the appropriate RDF metadata and posts the data to
DataTurbine.
      </p>
      <p>Similarly, JMS is a Java standard for messaging systems that includes facilities
for event-based communication through the publish/subscribe paradigm. This
interface allows client applications of SDSM to continuously fetch updates on
streams. Notice that this does not provide a general continuous query system,
as arbitrary TA-SPARQL queries are not allowed through this interface.</p>
      <p>In addition, Open Geospatial Consortium (OGC) Sensor Web Enablement
(SWE) standards have seen great adoption among many sensor network
communities. We provide a web interface that returns the supported temporal queries
in SWE Observation and Measurement XML-compliant format, which allows
interoperability with other SWE-compliant tools.</p>
      <p>The TA-SPARQL implementation is a proof of concept that
does not incorporate query optimization. Therefore we do not provide
experimental results on the performance of query evaluation. However, Tupelo
provides programmatic access to RDF (and TA-RDF) data, which makes the
implementation completely functional.</p>
      <p>
        This system can easily be integrated into workflow managers as an
input/output mechanism for data streams and time-series data, providing ready
access to these data independently of the possibly heterogeneous data sources
and linking them to their corresponding metadata. We have incorporated the system
into Cyberintegrator [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], a scientific workflow engine, highly extensible through
the use of user-provided workflow tasks, that maintains provenance of all data
employed by extensive use of semantic annotations in RDF.
      </p>
      <p>
        An early version of the SDSM has already been deployed and is currently
in use in the context of the NCSA digital watersheds [
        <xref ref-type="bibr" rid="ref12 ref13">11, 12</xref>
        ] project, which
aims to process NEXRAD radar data in near real time and make it available
through streams of virtual sensors, which present data obtained by processing
the original streams, all the while maintaining the provenance of these processes. It
has also been integrated into other semantic technologies developed at NCSA,
such as a semantic data center in a Cybercollaboratory, to allow the exploration
of data streams. These applications require performing simple range queries
over millions of frames with response times acceptable for web applications.
      </p>
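      <p>As an illustration of the simple range queries mentioned above, the following Java sketch keeps frame identifiers in a map sorted by timestamp, so that a time range is located through the ordered structure rather than by scanning every frame. This is an assumption made for illustration only, not the indexing scheme of the SDSM or Tupelo; FrameTimeIndex and its methods are hypothetical names.</p>
      <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Small demo: five frames one minute apart, queried over a closed time range. */
final class FrameRangeQuerySketch {
    public static void main(String[] args) {
        FrameTimeIndex index = new FrameTimeIndex();
        for (int i = 0; i < 5; i++) {
            index.add(i * 60_000L, "frame-" + i);
        }
        // Frames with timestamps between minute 1 and minute 3, both inclusive.
        System.out.println(index.range(60_000L, 180_000L)); // prints [frame-1, frame-2, frame-3]
    }
}

/** Hypothetical time index of one stream: frame timestamp (ms) to frame identifier. */
final class FrameTimeIndex {
    private final NavigableMap<Long, String> framesByTime = new TreeMap<>();

    /** Adds a frame; a later frame with the same timestamp replaces the earlier one. */
    void add(long timestampMillis, String frameId) {
        framesByTime.put(timestampMillis, frameId);
    }

    /** Returns a copy of the frames with from <= timestamp <= to, in time order. */
    List<String> range(long fromMillis, long toMillis) {
        if (fromMillis > toMillis) {
            return Collections.emptyList(); // subMap would reject an inverted range
        }
        return new ArrayList<>(framesByTime.subMap(fromMillis, true, toMillis, true).values());
    }
}
```
]]></preformat>
      <p>With a sorted map the cost of a query depends on the logarithm of the number of stored frames plus the number of frames returned, which is the property that keeps range queries over very long streams practical.</p>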
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>In this paper we have introduced a system for the storage, indexing and querying
of time-varying data in the context of semantic web technologies, and
accompanied it with a formal model theory that makes explicit its compatibility with, and
differences from, RDF, the standard for semantic data on the web. Although
they are general enough to represent any time-varying data, both
the model and the system are especially tailored to the management of
streaming data, such as is typically found in sensor networks. Combining data of
an arbitrary nature (text, numeric values, still images, arbitrary binary data, etc.)
with metadata in a single, integrated system allows easy access to the data while
leveraging semantic representations to maintain and
provide the provenance of said data, a necessary condition for interoperability,
particularly among scientific communities.</p>
      <p>We have also provided a query language, TA-SPARQL, based on SPARQL, the
standard query language for the Semantic Web, to facilitate expressing
typical time-related queries in a way that is natural for users familiar with SPARQL.
TA-SPARQL goes beyond previously presented query languages for RDF streams,
which are based on sliding windows, and allows queries to exploit the richer
temporal relationships representable in the semantic model.</p>
      <p>TA-RDF and TA-SPARQL have been implemented in a prototype
service integrated with the NCSA Tupelo semantic middleware. This has enabled
integration of stream data with other data and metadata, including documents,
imagery, workflows, provenance, and annotations.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This material is based upon work supported by the National Science
Foundation (NSF) under Award Nos. BES-0414259, BES-0533513, and SCI-0525308
and the Office of Naval Research (ONR) under Award No. N00014-04-1-0437.
Any opinions, findings, and conclusions or recommendations expressed in this
publication are those of the author(s) and do not necessarily reflect the views
of NSF and ONR.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Babu</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          .
          <article-title>Continuous queries over data streams</article-title>
          .
          <source>ACM Sigmod Record</source>
          ,
          <volume>30</volume>
          (
          <issue>3</issue>
          ):
          <fpage>109</fpage>
          -
          <lpage>120</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bajcsy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Marini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Minsker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Myers</surname>
          </string-name>
          .
          <article-title>CyberIntegrator: A meta-workflow system designed for solving complex scientific problems using heterogeneous tools</article-title>
          .
          <source>In The Geoinformatics Conference</source>
          , pages
          <fpage>10</fpage>
          -
          <lpage>12</lpage>
          , May
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bolles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grawunder</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Jacobi</surname>
          </string-name>
          .
          <article-title>Streaming SPARQL - Extending SPARQL to process data streams</article-title>
          .
          <source>Lecture Notes in Computer Science</source>
          ,
          <volume>5021</volume>
          :
          <fpage>448</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Buraga</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciobanu</surname>
          </string-name>
          .
          <article-title>A RDF-based model for expressing spatiotemporal relations between web sites</article-title>
          .
          <source>In Proceedings of the Third International Conference on Web Information Systems Engineering</source>
          <year>2002</year>
          , pages
          <fpage>355</fpage>
          -
          <lpage>361</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Fountain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tilak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hubbard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Freudinger</surname>
          </string-name>
          .
          <article-title>The Open Source DataTurbine Initiative: Streaming data middleware for environmental observing systems</article-title>
          .
          <source>In International Symposium on Remote Sensing of Environment</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hurtado</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaisman</surname>
          </string-name>
          .
          <article-title>Introducing time into RDF</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>19</volume>
          (
          <issue>2</issue>
          ):
          <fpage>207</fpage>
          -
          <lpage>218</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>Proc. Semantic Sensor Networks</source>
          <year>2009</year>
          , page
          <issue>94</issue>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hayes</surname>
          </string-name>
          .
          <article-title>RDF semantics</article-title>
          . http://www.w3.org/TR/rdf-mt/,
          <year>February 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Henson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Neuhaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Sheth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Thirunarayan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Buyya</surname>
          </string-name>
          .
          <article-title>An ontological representation of time series observations on the Semantic Sensor Web</article-title>
          .
          <source>In 1st International Workshop on the Semantic Sensor Web</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Klyne</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Carroll</surname>
          </string-name>
          .
          <article-title>Resource description framework (RDF): Concepts and abstract syntax</article-title>
          . http://www.w3.org/TR/rdf-concepts/,
          <year>February 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cameron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Arpinar</surname>
          </string-name>
          .
          <article-title>ES3N: A semantic approach to data management in sensor networks</article-title>
          .
          <source>In Semantic Sensor Networks Workshop</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Marini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Myers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Minsker</surname>
          </string-name>
          .
          <article-title>Virtual sensors in a Web 2.0 virtual watershed</article-title>
          .
          <source>In IEEE Fourth International Conference on eScience</source>
          , pages
          <fpage>386</fpage>
          -
          <lpage>387</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hill</surname>
          </string-name>
          ,
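          <string-name>
            <given-names>A.</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Marini</surname>
          </string-name>
          ,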
          <string-name>
            <given-names>R.</given-names>
            <surname>Kooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Myers</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Minsker</surname>
          </string-name>
          .
          <article-title>A new framework for on-demand virtualization, repurposing and fusion of heterogeneous sensors</article-title>
          .
          <source>In Sensor Web Enablement workshop 2009</source>
          , The 2009
          <source>International Symposium on Collaborative Technologies and Systems</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Perry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hakimpour</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sheth</surname>
          </string-name>
          .
          <article-title>Analyzing theme, space, and time: an ontology-based approach</article-title>
          .
          <source>Proceedings of the 14th annual ACM international symposium on Advances in geographic information systems</source>
          , pages
          <fpage>147</fpage>
          -
          <lpage>154</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          .
          <article-title>SPARQL query language for RDF</article-title>
          . http://www.w3.org/TR/rdf-sparql-query/,
          <year>January 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pugliese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Udrea</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Subrahmanian</surname>
          </string-name>
          .
          <article-title>Scaling RDF with time</article-title>
          .
          <source>In 17th International World Wide Web Conference</source>
          , pages
          <fpage>605</fpage>
          -
          <lpage>614</lpage>
          , Beijing, China,
          <year>2008</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Reddy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fulkerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hansen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Heidemann</surname>
          </string-name>
          .
          <article-title>Sensor-Internet share and search: Enabling collaboration of citizen scientists</article-title>
          .
          <source>In Proceedings of the ACM Workshop on Data Sharing and Interoperability on the World-wide Sensor Web</source>
          , pages
          <fpage>11</fpage>
          -
          <lpage>16</lpage>
          , Cambridge, Mass,
          <year>April 2007</year>
          . ACM.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>