<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Keywords: Provenance Management Framework</institution>
          ,
          <addr-line>provenir ontology, Provenance, Lineage, Linked Data, Semantic Sensor Web, Sensor Data, Sensor Web Enablement, Dataset Generation, Resource Description Framework, RDF</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Abstract. Provenance, from the French word "provenir", describes the lineage or history of a data entity. Provenance is critical information in the sensors domain for identifying a sensor and analyzing its observation data over time and geographical space. In this paper, we present a framework to model and query the provenance information associated with sensor data exposed as part of the Web of Data using the Linked Open Data conventions. This is accomplished by developing an ontology-driven provenance management infrastructure that includes a representation model and a query infrastructure. This provenance infrastructure, called the Sensor Provenance Management System (Sensor PMS), is underpinned by a domain-specific provenance ontology called the Sensor Provenance (SP) ontology. The SP ontology extends the Provenir upper-level provenance ontology to model domain-specific provenance in the sensor domain. We describe the implementation of the Sensor PMS for provenance tracking in the Linked Sensor Data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>This is an example of a sensor discovery query. Sensor discovery has been identified
as a top-priority use case by the W3C Semantic Sensor Network Incubator Group2,
which is tasked with developing a sensor ontology. In the sensors domain, the
capabilities of the sensor, the observation location (spatial parameter), the time of
observation (temporal parameter), and the phenomenon measurement (domain parameter) are
important for answering discovery queries. This data about the sensor is its
provenance metadata. Provenance describes the history or the lineage of an
entity and is derived from the French word "provenir", meaning "to come from".
Provenance information enables applications to answer "what", "where", "why",
"who", "which", "when", and "how" queries to accurately interpret and process data
entities.</p>
      <p>
        Provenance has been studied from multiple perspectives, including (a) workflow
provenance and (b) database provenance as discussed in Tan [1]. Workflow
provenance represents "the entire history of the derivation of the final output of" [1] a
workflow. Davidson et al. [2] address issues related to provenance in workflow
systems. In contrast, database provenance refers to the process of tracing and
recording the origins of data and its movement between databases [
        <xref ref-type="bibr" rid="ref1">3</xref>
        ]. In Sahoo et al. [
        <xref ref-type="bibr" rid="ref2">4</xref>
        ], we
introduced the notion of semantic provenance to define provenance information that
incorporates domain semantics to closely reflect the knowledge of an application
domain.
      </p>
      <p>In this paper, we use the observations from the 20,000 sensors within the United
States (Figure 1) in the context of a blizzard as a running example.</p>
      <p>Fig. 1. The distribution of 20,000 sensors constituting the Semantic Sensor Web (SensorMap</p>
      <p>
        Image [
        <xref ref-type="bibr" rid="ref3">5</xref>
        ])
We use the definition of a blizzard provided by the NOAA3, which describes it as:
BLIZZARD = High WindSpeed (exceeding 35 mph) AND Snow
Precipitation AND Low Visibility (less than ¼ mile), for at least 3 hours.
      </p>
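      <p>The blizzard definition above can be sketched as a small predicate. This is a minimal illustration and not part of the Sensor PMS: the HourlyReading record, its field names, and the assumption of one reading per hour are hypothetical, and the visibility threshold is a parameter with an assumed default of 0.25 mile.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import List


@dataclass
class HourlyReading:
    """One hourly observation at a location (hypothetical record layout)."""
    wind_speed_mph: float
    snowing: bool
    visibility_miles: float


def meets_blizzard_conditions(r: HourlyReading,
                              min_wind_mph: float = 35.0,
                              max_visibility_miles: float = 0.25) -> bool:
    """True when wind, snow, and visibility all satisfy the thresholds."""
    return (r.wind_speed_mph > min_wind_mph
            and r.snowing
            and max_visibility_miles > r.visibility_miles)


def is_blizzard(readings: List[HourlyReading], min_hours: int = 3) -> bool:
    """True when the conditions hold for min_hours consecutive hourly readings."""
    run = 0
    for r in readings:
        # Extend the current run of qualifying hours, or reset it.
        run = run + 1 if meets_blizzard_conditions(r) else 0
        if run >= min_hours:
            return True
    return False
```
      </preformat>
      <p>Keeping the thresholds as parameters makes the three conditions (domain parameters) explicit; the spatial and temporal parameters would decide which readings are passed in.</p>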
      <p>
        Fig. 2. Blizzard Composition
2 http://www.w3.org/2005/Incubator/ssn/wiki/Main_Page
3 http://www.noaa.gov/
A blizzard exists if the above conditions hold true for at least 3 hours within some
geospatial region. Hence, the provenance of sensor observations describing the
geospatial information of the sensors that record the observations, the time stamp of the
observations, and the attributes of the sensor itself (for example, a motion sensor is
not useful in the context of a blizzard) are important for a sensor discovery query.
With a view to capturing the provenance information related to a sensor, the main
objective of this paper is to implement a Sensor Provenance Management System
(Sensor PMS). In this paper, we describe the creation of this infrastructure using the
theoretical underpinning of the Provenance Management Framework (PMF) [
        <xref ref-type="bibr" rid="ref2">4</xref>
        ]. The
key contributions of the paper are described below:
1. Implementing Sensor PMS to track provenance in the linked sensor data
2. Developing a domain specific ontology for Sensor PMS called Sensor
Provenance (SP) ontology. The SP ontology uses concepts within the Provenir upper
level ontology defined in PMF [
        <xref ref-type="bibr" rid="ref2">4</xref>
        ] to add provenance information within the
sensors domain.
3. Evaluating the ability of the Sensor PMS to answer provenance queries over
the generated sensor datasets.
      </p>
      <p>The rest of the paper is organized as follows: Section 2 discusses background
concepts. In section 3, we describe current infrastructure for generating sensor datasets
and section 4 discusses the sensor datasets generated. Section 5 integrates the current
infrastructure described in section 3 with the provenance management system and
describes the architecture of Sensor PMS. Section 6 introduces the SP ontology and
section 7 discusses the kind of queries that can be answered with the help of
provenance information. Section 8 gives related work and section 9 concludes with
summary and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
In this section, we describe the resources used in our work including the Sensor
ontology and the Linked Open Data initiative.
2.1 Ontology Model of Sensor Data - In computer science and information science,
an ontology is a formal representation of knowledge as a set of concepts within a
domain and the relationships between those concepts. It is used to reason about the
properties of that domain, and may be used to describe the domain. [
        <xref ref-type="bibr" rid="ref4">6</xref>
        ] Our sensors
ontology uses the concepts within the O&amp;M standard to define sensor observations.
Within the O&amp;M standard, an observation (om:Observation) is defined as an act of
observing a property or phenomenon, with the goal of producing an estimate of the
value of the property, and a feature (om:Feature) is defined as an abstraction of a
real-world phenomenon. (Note: om is used as a prefix for Observations and
Measurements.) The major properties of an observation include feature of interest
(om:featureOfInterest), observed property (om:observedProperty), sampling time
(om:samplingTime), result (om:result), and procedure (om:procedure). Often these
properties can be complex entities that may be defined in an external document. For
example, om:FeatureOfInterest could refer to any real-world entity such as a
coverage region, vehicle, or weather storm, and om:Procedure often refers to a sensor or
system of sensors defined within a SensorML4 document. Therefore, these properties
are better described as relationships of an observation. Concepts described above and
their relationships within the sensor ontology can be found in figure 2. The Sensor
ontology can be found at [
        <xref ref-type="bibr" rid="ref5">7</xref>
        ]. Section 5 extends the Sensor Ontology with provenance
related concepts found in the Provenir upper level ontology defined in the Provenance
Management Framework (PMF) [
        <xref ref-type="bibr" rid="ref2">4</xref>
        ].
      </p>
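      <p>The observation model described above can be pictured as a record with one field per O&amp;M property. The following sketch is an assumption-laden illustration, not the Sensor ontology itself: the Observation class, the to_triples helper, and the ex: identifiers used with it are hypothetical, and only the om: property names come from the text.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Observation:
    """Simplified observation record; fields mirror the properties named in the text."""
    feature_of_interest: str  # om:featureOfInterest, e.g. a weather storm
    observed_property: str    # om:observedProperty, e.g. wind speed
    sampling_time: str        # om:samplingTime, kept as an ISO 8601 string here
    result: float             # om:result, the measured value
    procedure: str            # om:procedure, the sensor or sensor system


def to_triples(obs_id: str, obs: Observation) -> List[Tuple[str, str, object]]:
    """Flatten one observation into (subject, predicate, object) tuples."""
    return [
        (obs_id, "rdf:type", "om:Observation"),
        (obs_id, "om:featureOfInterest", obs.feature_of_interest),
        (obs_id, "om:observedProperty", obs.observed_property),
        (obs_id, "om:samplingTime", obs.sampling_time),
        (obs_id, "om:result", obs.result),
        (obs_id, "om:procedure", obs.procedure),
    ]
```
      </preformat>
      <p>Treating each property as a relationship from the observation to another entity, as the tuples do, matches the remark that these properties are better described as relationships of an observation.</p>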
      <p>
        Fig. 2. Concepts and their relationships within the Sensor Ontology
2.2 Semantic Web - The Semantic Web is an evolving development of the World
Wide Web5 derived from the World Wide Web Consortium (W3C)6 in which the
meaning of information and services on the web is defined, making it possible for the
web to understand and satisfy the requests of people and machines that use the web
content. [
        <xref ref-type="bibr" rid="ref6">8</xref>
        ] Resource Description Framework (RDF) is a publishing language within
the Semantic Web, specially designed for data. RDF has now come to be used as a
general method for conceptual description or modeling of information that is
implemented in web resources, using a variety of syntax formats. [
        <xref ref-type="bibr" rid="ref7">9</xref>
        ]. It is also a standard
model for data interchange on the web. [
        <xref ref-type="bibr" rid="ref8">10</xref>
        ] SPARQL7 is a protocol and a query
language for semantic web data sources. [
        <xref ref-type="bibr" rid="ref6">8</xref>
        ] In its usage, SPARQL is a
syntactically SQL-like language for querying RDF graphs. [
        <xref ref-type="bibr" rid="ref9">11</xref>
        ] Since the Semantic Web is not just
about putting data on the web but also about linking the data, Linked Data is used to connect
the Semantic Web8. Wikipedia defines Linked Data as "a term used to describe a
recommended best practice for exposing, sharing, and connecting pieces of data,
information, and knowledge on the Semantic Web using URIs and RDF." [
        <xref ref-type="bibr" rid="ref10">12</xref>
        ] Linked
Data is a large and growing collection of interlinked public datasets encoded in RDF
spanning diverse areas such as: life sciences, nature, science, geography and
entertainment.
      </p>
      <p>
4 http://www.opengeospatial.org/standards/sensorml
5 http://en.wikipedia.org/wiki/World_Wide_Web
6 http://www.w3.org/
7 http://www.w3.org/TR/rdf-sparql-query/
8 http://www.w3.org/DesignIssues/LinkedData.html
3. Current Infrastructure
The lifespan of sensor data starts as observable properties of objects and events in the
real-world which are detected by sensors through observation. These observation
values are then encoded in several formats of varying degrees of expressivity, as
needed by applications that may utilize the data. The data generation workflow is
comprised of four main parts, as shown in figure 3. The workflow begins with sensors
deployed across the United States measuring environmental phenomena. Observations
generated from these sensors are aggregated at MesoWest [
        <xref ref-type="bibr" rid="ref11">13</xref>
        ] which provides access
to past sensor observations encoded as comma separated numerical values. These
sensor observations are then converted to Observations and Measurements (O&amp;M).
O&amp;M is an encoding standard and a technical framework that defines an abstract
model and an XML schema encoding for sensor descriptions and observations. It is
one of OGC9 Sensor Web Enablement (SWE)10 suite of standards that is widely
accepted within the sensors community for encoding sensor observations. [
        <xref ref-type="bibr" rid="ref12">14</xref>
        ] In
order to add semantics to the sensor descriptions and observations the O&amp;M is
converted to RDF using the O&amp;M2RDF-Converter API
described in [
        <xref ref-type="bibr" rid="ref13">15</xref>
        ]. Two RDF datasets, LinkedSensorData and LinkedObservationData,
containing over a billion triples, were generated. The datasets are described in
Section 4. The RDF generated is then stored in a Virtuoso RDF knowledge base [
        <xref ref-type="bibr" rid="ref14">16</xref>
        ].
The RDF datasets are made available on the Linked Open Data Cloud to provide
public access. The data generation workflow is the main component of the
Provenance Capture phase discussed in Section 5.
      </p>
      <p>
        Fig. 3. Data Generation Workflow. The O&amp;M to RDF conversion (dotted portion) forms the
main part of the workflow that uses the O&amp;M2RDF-Converter API.
3.1 Phase 1 - The first phase consists of querying MesoWest [
        <xref ref-type="bibr" rid="ref11">13</xref>
        ] for
observational data and parsing the result. MesoWest provides a service to access past sensor
data and returns an HTML page with the observational values encoded within a
comma-separated list. The resulting HTML page is then parsed to extract the sensor
observations.
3.2 Phase 2 - The second phase consists of converting the raw textual data retrieved
from MesoWest into O&amp;M. The sensor observations parsed from the HTML page in
phase 1 are fed to an XML parser. We used the SAX (Simple API for XML) parser11
to generate the O&amp;M. Here we also query GeoNames [
        <xref ref-type="bibr" rid="ref15">17</xref>
        ] with the sensor coordinates
to get the GeoNames location that is closest to the sensor. The O&amp;M generated in this
phase is the input for the O&amp;M2RDF-Converter API.
      </p>
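      <p>Phases 1 and 2 can be sketched as a parse step followed by an XML generation step. The sketch below is illustrative only and rests on several assumptions: the column layout in COLUMNS is hypothetical (the text does not give the MesoWest output format), the element names are simplified stand-ins and not the actual O&amp;M schema, and Python's standard xml.etree module is used in place of the SAX parser mentioned in the text.</p>
      <preformat>
```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column layout; the real MesoWest format is not given in the text.
COLUMNS = ["station_id", "time", "property", "value", "unit"]


def parse_observations(text: str):
    """Phase 1: turn comma-separated lines into dictionaries keyed by COLUMNS."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(COLUMNS, (cell.strip() for cell in row)))
            for row in reader if row]


def to_observation_xml(obs: dict) -> str:
    """Phase 2: wrap one parsed observation in a simplified XML element."""
    root = ET.Element("Observation")
    ET.SubElement(root, "procedure").text = obs["station_id"]
    ET.SubElement(root, "samplingTime").text = obs["time"]
    ET.SubElement(root, "observedProperty").text = obs["property"]
    # The unit of measurement travels with the result as an attribute.
    result = ET.SubElement(root, "result", {"uom": obs["unit"]})
    result.text = obs["value"]
    return ET.tostring(root, encoding="unicode")
```
      </preformat>
      <p>A GeoNames lookup for the nearest named location, as described for Phase 2, would add one more child element per observation; it is left out here because it requires a network call.</p>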
      <p>
        9 http://www.opengeospatial.org/
10 http://www.opengeospatial.org/projects/groups/sensorweb
11 http://www.saxproject.org/
3.3 Phase 3 - The third phase consists of converting sensor observations encoded in
O&amp;M to RDF. Since both O&amp;M and RDF have XML syntax, XSLT is used to
convert O&amp;M to RDF. XSLT is a language for transforming XML documents into other
XML documents [
        <xref ref-type="bibr" rid="ref16">18</xref>
        ]. The XSLT performs the conversion for our O&amp;M2RDF-Converter API.
3.4 Phase 4 - The fourth phase consists of storing the RDF in the Virtuoso RDF store.
Virtuoso RDF is a native triple store available under both open source and commercial
licenses. It provides command-line loaders, a connection API, support for SPARQL,
and a web server for running SPARQL queries and uploading data over HTTP. It has
been tested to scale up to a billion triples. A more detailed description of the data
generation workflow can be found in [
        <xref ref-type="bibr" rid="ref13">15</xref>
        ].
4. Sensor Dataset Description
The data generation workflow described in Section 3 leads to the generation of two RDF
datasets, LinkedSensorData and LinkedObservationData, containing over a billion
triples, described in detail below.
4.1 Linked Sensor Data - LinkedSensorData is an RDF dataset containing expressive
descriptions of ~20,000 weather stations in the United States. The data originated
at MesoWest, a project within the Department of Meteorology at the University
of Utah that has been aggregating weather data since 2002. [13] On average,
there are five sensors per weather station measuring phenomena such as
temperature, visibility, precipitation, pressure, wind speed, humidity, etc. In addition
to location attributes such as latitude, longitude, and elevation, there are links
to locations in GeoNames [17] near the weather station. The distance from the
GeoNames location to the weather station is also provided. The dataset also contains
links to the most current observation for each weather station provided by
MesoWest [13]. This sensor description dataset is now part of the LOD.
4.2 Linked Observation Data - LinkedObservationData is an RDF dataset containing
expressive descriptions of hurricane and blizzard observations in the
United States. The data again originated at MesoWest. [13] The observations collected
include measurements of phenomena such as temperature, visibility, precipitation,
pressure, wind speed, humidity, etc. The weather stations' observations
also include the unit of measurement for each of these phenomena as well as the
time instant at which the measurements were taken. The dataset includes observations
within the entire United States during the time periods that several major
storms were active, including Hurricane Katrina, Ike, Bill, Bertha, Wilma,
Charley, Gustav, and a major blizzard in Nevada in 2002. These observations are
generated by weather stations described in the LinkedSensorData dataset introduced
above. Currently, this dataset contains more than a billion triples. The RDF
dataset for each of the above storms is available for download in gzip format at
[19]. The statistics for each of the storms can also be found at [19].
      </p>
      <sec id="sec-2-1">
        <title>5. Sensor Provenance Management System</title>
        <p>
The Sensor PMS infrastructure uses the data generation workflow described above
(section 3) and addresses three aspects of provenance management as identified by
[
          <xref ref-type="bibr" rid="ref18">20</xref>
          ]. See Figure 4 for the architecture of the Sensor PMS.
Fig. 4. The architecture of the Sensor PMS addressing
three aspects of provenance management
1. Provenance Capture - The provenance information associated with the
sensor is captured within the data workflow as described in Section 3. The
time-related information (temporal parameter) is obtained from MesoWest
[
          <xref ref-type="bibr" rid="ref11">13</xref>
          ] and location related information (spatial parameter) is obtained by
querying GeoNames [
          <xref ref-type="bibr" rid="ref15">17</xref>
          ] with the sensor coordinates.
2. Provenance Representation - The Sensor Provenance (SP) ontology is
used to model the provenance information related to the sensor. The SP
ontology extends the Provenir upper-level provenance ontology defined in PMF
[
          <xref ref-type="bibr" rid="ref2">4</xref>
          ] to support interoperability with provenance ontology in different
domains.
3. Provenance Storage - The provenance information is stored in the Virtuoso
RDF store. Virtuoso RDF is an open source triple store provided by
OpenLink Software. [
          <xref ref-type="bibr" rid="ref14">16</xref>
          ] The Virtuoso RDF store currently contains over a billion
triples of sensor observational data. Virtuoso RDF provides a SPARQL
endpoint for querying the datasets discussed in Section 4, which can be found at
[
          <xref ref-type="bibr" rid="ref19">21</xref>
          ]. More information about querying the dataset can be found at [
          <xref ref-type="bibr" rid="ref17">19</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>6. Sensor Provenance Ontology</title>
        <p>
          In this section, we discuss the Sensor Provenance ontology that forms the key
component of the Sensor PMS. As discussed above, provenance information includes the
location of the sensor, the time when the observations were taken by the sensor, and
the sensor observation values. Since the SP ontology extends the Provenir ontology, we
discuss the Provenir ontology in Section 6.1, followed by the SP ontology in Section 6.2.
6.1 Provenir Ontology - The Provenir ontology is a common provenance model that
forms the core component of the Provenance Management Framework. [
          <xref ref-type="bibr" rid="ref2">4</xref>
          ] This
modular framework forms a scalable and flexible approach to provenance modeling that
can be adapted to the specific requirements of different domains. Using the Provenir
ontology as the reference model to build domain-specific provenance ontologies ensures
(a) a common modeling approach, (b) conceptual clarity of provenance terms, and (c)
use of design patterns for consistent provenance modeling.
        </p>
        <p>
          Fig. 5. Provenir Upper Level Ontology [
          <xref ref-type="bibr" rid="ref2">4</xref>
          ]
The ontology defines three base classes, data, agent, and process, using the
well-defined, primitive concepts of occurrent and continuant. [
          <xref ref-type="bibr" rid="ref20">22</xref>
          ] Continuant is defined as
"entities which endure, or continue to exist, through time while undergoing different
sorts of changes, including changes of place" [22], while occurrent is defined as
"entities that unfold themselves in successive temporal phases" [
          <xref ref-type="bibr" rid="ref20">22</xref>
          ]. The two base
classes data and agent are defined as specializations (sub-classes) of the continuant class,
while the third base class, process, is a synonym of occurrent. The data class has two
sub-classes: data_collection, which represents the datasets that undergo modification
during an experiment, and parameter, which influences the execution of an
experiment. The parameter class has three sub-classes representing the spatial, temporal,
and thematic (domain-specific) dimensions, namely spatial_parameter,
temporal_parameter, and domain_parameter. Instead of defining a new set of properties, the
ontology reuses and adapts properties defined in the Relation Ontology (RO)12 from
the Open Biomedical Ontologies (OBO) Foundry13, such as part_of, contained_in,
preceded_by, and has_participant. The Provenir ontology is defined using
OWL-DL14, which is compliant with the DL profile of OWL215, with an expressivity of
12 http://www.obofoundry.org/ro/
13 http://www.obofoundry.org/
14 http://www.w3.org/TR/owl-features/
!"#$!" further details of the ontology can be found at [
          <xref ref-type="bibr" rid="ref21">23</xref>
          ]. Figure 5 shows the
Provenir ontology schema obtained from [
          <xref ref-type="bibr" rid="ref2">4</xref>
          ].
6.2 Sensor Ontology - Extending Provenir Ontology
The Provenir ontology has been extended to create the Sensor ontology that models
the domain-specific provenance information for the sensor domain. The Sensor
ontology extends the relevant Provenir ontology terms using the rdfs:subClassOf and
rdfs:subPropertyOf relationships to create appropriate classes and properties. For
example, the sensor:ResultData class (representing the observation value) is a subclass of
provenir:data_collection, and the sensor:Location class (representing the geographical
location) is defined as a subclass of provenir:spatial_parameter. Similarly,
sensor:samplingTime is defined as a subproperty of provenir:has_temporal_value.
The sensor ontology has been defined in OWL-DL and consists of 89 classes, 53
properties with a DL expressivity of" !"%$&amp;'()*+#" By extending the Provenir
ontology, the sensor ontology ensures coherent modeling of concepts, consistent use
of provenance terminology, and compatibility with other existing domain-specific
provenance ontologies. For example, the Trident ontology extends the Provenir
ontology to model provenance information in the Neptune oceanography project [
          <xref ref-type="bibr" rid="ref22">24</xref>
          ]. In
the next section, we describe the queries that utilize the provenance information
modeled in the sensor ontology.
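The sub-class links named in this section can be written down as a small lookup table, which shows how a sensor-domain term is recognized as a Provenir term. This is only an illustrative sketch: the SUBCLASS_OF table lists just the links quoted in the text, and the is_a helper is hypothetical and stands in for what an OWL reasoner would do over the full ontology.
<preformat>
```python
# Sub-class links quoted in the text (child term -> parent term).
SUBCLASS_OF = {
    "sensor:ResultData": "provenir:data_collection",
    "sensor:Location": "provenir:spatial_parameter",
    "provenir:data_collection": "provenir:data",
    "provenir:spatial_parameter": "provenir:parameter",
    "provenir:parameter": "provenir:data",
}


def is_a(term, ancestor):
    """Follow sub-class links upward until ancestor is found or the chain ends."""
    seen = set()
    while term is not None and term not in seen:
        if term == ancestor:
            return True
        seen.add(term)
        # Terms without a recorded parent end the walk.
        term = SUBCLASS_OF.get(term)
    return False
```
</preformat>
With this table, is_a("sensor:Location", "provenir:data") holds because the chain passes through provenir:spatial_parameter and provenir:parameter, which is the kind of compatibility with Provenir-based tooling that the extension is meant to provide.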
7. Provenance Queries
Two classes of Provenance queries have been categorized by PMF [
          <xref ref-type="bibr" rid="ref2">4</xref>
          ]. Corresponding
queries in the sensors domain that could not be answered without provenance
information are provided below.
        </p>
        <p>Query for provenance metadata: Given a data entity, this category of queries
returns the complete set of provenance information associated with that data entity.
Example: "Given an observation value, give me the provenance information
about all the sensors that recorded this observation."</p>
        <p>SELECT ?sensor ?ID ?geonamesLocation ?geonamesDistance
       ?geonamesDistanceMeasure ?latitude ?longitude
       ?observedProperty ?XSDTime
WHERE {
  ?sensor om-owl:generatedObservation ?generatedObservation .
  ?generatedObservation om-owl:observedProperty ?observedProperty .
  ?generatedObservation om-owl:result ?measureData .
  ?measureData om-owl:floatValue ?value .
  FILTER(?value = "78.0"^^xsd:float) .
  ?generatedObservation om-owl:samplingTime ?timeInstant .
  ?timeInstant owl-time:inXSDDateTime ?XSDTime .
  ?sensor om-owl:ID ?ID .
  ?sensor om-owl:hasLocatedNearRel ?locatedNear .
  ?locatedNear om-owl:hasLocation ?geonamesLocation .
  ?locatedNear om-owl:distance ?geonamesDistance .
  ?locatedNear om-owl:distanceUOM ?geonamesDistanceMeasure .
  ?sensor om-owl:processLocation ?sensorLocation .
  ?sensorLocation wgs84:lat ?latitude .
  ?sensorLocation wgs84:long ?longitude .
}</p>
        <p>15 http://www.w3.org/TR/owl2-profiles/</p>
        <p>Query for data using provenance information: The opposite perspective to the
first category of query is, given a set of constraints defined over provenance
information, to retrieve the set of data entities satisfying those constraints.
Example: "Find all the sensors which have observations related to a blizzard
occurring in Nevada on 24th August 2005 at 11 AM."
To solve this sensor discovery query, provenance information describing the
spatio-temporal and thematic aspects of sensor observations and sensors can be
analyzed. Figure 6 describes the multiple steps followed in identifying the
appropriate sensors. In Step 1, sensors located in the "Nevada" region are identified (from
a pool of 20,000 sensors located across the United States). In Step 2, the sensors
that were active during the blizzard are identified, and finally, in Step 3,
provenance information describing the capabilities of a sensor helps identify the
observations that are relevant for the blizzard under study (for example, a wind speed
sensor is considered relevant while a motion sensor is not).
Fig. 6. Answering a sensor-discovery query using spatio-temporal and thematic
provenance information</p>
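        <p>The three discovery steps can be sketched as successive filters over sensor provenance records. The sketch is a hedged illustration and not the Sensor PMS query implementation: the Sensor record, its field names, and the property names in RELEVANT are hypothetical, and a real deployment would express these filters as SPARQL over the RDF store.</p>
        <preformat>
```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Sensor:
    """Provenance record for one sensor (hypothetical layout)."""
    sensor_id: str
    region: str             # spatial provenance, e.g. nearest named region
    active_from: datetime   # temporal provenance: start of activity
    active_to: datetime     # temporal provenance: end of activity
    observed_property: str  # thematic provenance: what the sensor measures


# Assumed names for properties relevant to a blizzard.
RELEVANT = {"WindSpeed", "Snowfall", "Visibility"}


def discover(sensors: List[Sensor], region: str, when: datetime) -> List[Sensor]:
    """Apply the spatial, temporal, and thematic filters in turn."""
    # Step 1: keep sensors located in the requested region.
    step1 = [s for s in sensors if s.region == region]
    # Step 2: keep sensors that were active at the requested time.
    step2 = [s for s in step1 if when >= s.active_from and s.active_to >= when]
    # Step 3: keep sensors whose capability is relevant to the event.
    return [s for s in step2 if s.observed_property in RELEVANT]
```
        </preformat>
        <p>Applying the spatial filter first mirrors the order in the text and shrinks the candidate pool before the temporal and thematic checks.</p>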
      </sec>
    </sec>
    <sec id="sec-3">
      <title>8. Related Work</title>
      <p>Although this is the first attempt to develop an infrastructure for sensor provenance
management, there have been successful attempts to do the same in the domain of
eScience. Within the sensors domain, provenance has been addressed from the storage
point of view.</p>
      <p>
        Provenance management within the eScience community has primarily been
addressed in the context of workflow engines [
        <xref ref-type="bibr" rid="ref23">25</xref>
        ] while provenance management issues
have been surveyed by Simmhan et al. [
        <xref ref-type="bibr" rid="ref24">26</xref>
        ]. The database community has also
addressed the issue of provenance and defined various types of provenance, for example
"why provenance" [27] and "where provenance" [27]. A detailed comparison of PMF
(which underpins the Sensor PMS) with both workflow and database provenance is
presented in [
        <xref ref-type="bibr" rid="ref2">4</xref>
        ].
      </p>
      <p>
        The Semantic Provenance Capture in Data Ingest Systems (SPCDIS) [
        <xref ref-type="bibr" rid="ref26">28</xref>
        ] is an
example of an eScience project with dedicated infrastructure for provenance management. In
contrast to the Sensor PMS, the SPCDIS project uses the Proof Markup Language
(PML) [
        <xref ref-type="bibr" rid="ref27">29</xref>
        ] to capture provenance information. The Inference Web toolkit [
        <xref ref-type="bibr" rid="ref27">29</xref>
        ]
features a set of tools to generate, register, and search proofs encoded in PML. The
Sensor PMS and SPCDIS have common objectives but use different approaches to
achieve them; specifically, the Sensor PMS uses an ontology-driven approach with
a robust query infrastructure for provenance management.
      </p>
      <p>
        In the Sensors community, Ledlie et al. [
        <xref ref-type="bibr" rid="ref28">30</xref>
        ] show how provenance addresses the
naming and indexing issues related to sensor data storage. Park et al. [
        <xref ref-type="bibr" rid="ref29">31</xref>
        ] explore the
need for data provenance in Sensornet Republishing, a process of transforming
online sensor data and sharing the filtered, aggregated, or improved data with others.
      </p>
    </sec>
    <sec id="sec-4">
      <title>9. Conclusion</title>
      <p>This paper introduces an in-use, ontology-driven provenance management
infrastructure for sensor data called the Sensor PMS. We have developed a domain-specific sensor
provenance ontology by extending the Provenir ontology. Because of this extension, the SP
ontology can interoperate with other domain-specific provenance ontologies to
facilitate sharing and integration of provenance information from different domains and
projects. We also show how provenance information can help answer complex
queries within the sensors domain.</p>
      <p>Acknowledgments. This work is funded in part by NIH RO1 Grant#
1R01HL087795-01A1, The Dayton Area Graduate Studies Institute (DAGSI),
AFRL/DAGSI Research Topic SN08-8: "Architectures for Secure Semantic Sensor
Networks for Multi-Layered Sensing."</p>
      <p>References</p>
      <p>[1] W. C. Tan. Provenance in Databases: Past, Current, and Future. IEEE Data Engineering
Bulletin, 30(4):3-12, Dec. 2007.</p>
      <p>[2] S. B. Davidson, S. C. Boulakia, A. Eyal, B. Ludäscher, T. M. McPhillips, S. Bowers, M.
K. Anand, and J. Freire. Provenance in Scientific Workflow Systems. IEEE Data
Engineering Bulletin, 30(4):44-50, Dec. 2007.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Buneman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khanna</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. C.</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <article-title>Data Provenance: Some Basic Issues</article-title>
          .
          <source>In Proceedings of the 20th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS)</source>
          . Springer, Dec.
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.S.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.S.</given-names>
            <surname>Barga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.P.</given-names>
            <surname>Sheth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Thirunarayan</surname>
          </string-name>
          ,
          <article-title>"Where did you come from...Where did you go?" An Algebra and RDF Query Engine for Provenance</article-title>
          . Kno.e.sis Center, Wright State University;
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [5] SensorMap. http://atom.research.microsoft.com/sensewebv3/sensormap/, Retrieved March 22 2010
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [6] Wikipedia Article on Ontology. http://en.wikipedia.org/wiki/Ontology_(information_science), Retrieved March 21 2010
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [7] Sensor Data Ontology Model. http://knoesis.wright.edu/research/semsci/application_domain/sem_sensor/ont/sensorobservation.owl, Retrieved March 21 2010
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [8] Semantic Web Wikipedia. http://en.wikipedia.org/wiki/Semantic_Web, Retrieved March 15 2010
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [9] Wikipedia Article on RDF. http://en.wikipedia.org/wiki/Resource_Description_Framework, Retrieved March 19 2010
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [10] Resource Description Framework. http://www.w3.org/RDF/, Retrieved March 15 2010
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [11] SPARQL Protocol and Language: Frequently Asked Questions. http://www.thefigtrees.net/lee/sw/sparql-faq#what-is, Retrieved March 15 2010
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [12] Linked Open Data Cloud. http://linkeddata.org/, Retrieved March 20 2010
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [13] MesoWest. http://mesowest.utah.edu/index.html, Retrieved March 20 2010
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [14] Observation and Measurements (O&amp;M). http://www.opengeospatial.org/standards/om, Retrieved March 18 2010
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Patni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Henson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sheth</surname>
          </string-name>
          ,
          <article-title>"Linked Sensor Data,"</article-title>
          <source>In: Proceedings of 2010 International Symposium on Collaborative Technologies and Systems (CTS 2010)</source>
          , Chicago, IL, May 17-21,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [16] OpenLink Software. http://www.openlinksw.com/, Retrieved March 12 2010
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [17] GeoNames. http://www.geonames.org/, Retrieved March 12 2010
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[18] XSLT. http://www.w3.org/TR/xslt, Retrieved March 12 2010</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [19] SSW Dataset. http://wiki.knoesis.org/index.php/SSW_Datasets, Retrieved March 12 2010
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Weatherly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mutharaju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Anantharam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Sheth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Tarleton</surname>
          </string-name>
          ,
          <article-title>"Ontology-Driven Provenance Management in eScience: An Application in Parasite Research."</article-title>
          <source>OTM Conferences (2)</source>
          <year>2009</year>
          :
          <fpage>992</fpage>
          -
          <lpage>1009</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [21] Virtuoso SPARQL endpoint. http://harp.cs.wright.edu:8890/sparql, Retrieved March 14 2010
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ceusters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Klagges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kohler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lomax</surname>
          </string-name>
          , et al.,
          <article-title>"Relations in biomedical ontologies."</article-title>
          <source>Genome Biol</source>
          <year>2005</year>
          ;
          <volume>6</volume>
          (
          <issue>5</issue>
          ):
          <fpage>R46</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [23] Provenir Ontology. http://knoesis.wright.edu/library/ontologies/provenir/provenir.owl, Retrieved March 13 2010
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [24]
          <article-title>Provenir ontology: Towards a Framework for eScience Provenance Management</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [25] Provenance Challenge Wiki. http://twiki.ipaw.info/bin/view/Challenge/WebHome, Retrieved March 11 2010
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Y.L.</given-names>
            <surname>Simmhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Plale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gannon</surname>
          </string-name>
          ,
          <article-title>"A survey of data provenance in e-science"</article-title>
          <source>SIGMOD Rec</source>
          .
          <year>2005</year>
          ;
          <volume>34</volume>
          (
          <issue>3</issue>
          ):
          <fpage>31</fpage>
          -
          <lpage>36</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>P.</given-names>
            <surname>Buneman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khanna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. C.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>"Why and Where: A Characterization of Data Provenance."</article-title>
          <source>In: 8th International Conference on Database Theory</source>
          ;
          <year>2001</year>
          . p.
          <fpage>316</fpage>
          -
          <lpage>330</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [28] SPCDIS. http://spcdis.hao.ucar.edu/, Retrieved March 11 2010
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [29] Inference Web. http://iw.stanford.edu/2.0/, Retrieved March 11 2010
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ledlie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Holland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-K.</given-names>
            <surname>Muniswamy-Reddy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Braun</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Seltzer</surname>
          </string-name>
          .
          <article-title>"Provenance-aware sensor data storage."</article-title>
          <source>In NetDB 2005</source>
          , April
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>U.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Heidemann</surname>
          </string-name>
          .
          <article-title>"Provenance in Sensornet Republishing."</article-title>
          <source>In Proceedings of the 2nd International Provenance and Annotation Workshop</source>
          , pp.
          <fpage>280</fpage>
          -
          <lpage>292</lpage>
          . Salt Lake City, Utah, USA, Springer-Verlag. June, 2008
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>