Developing Registries for the Semantic Sensor Web using stRDF and stSPARQL (short paper)⋆

Kostis Kyzirakos, Manos Karpathiotakis and Manolis Koubarakis
Dept. of Informatics and Telecommunications
National and Kapodistrian University of Athens, Greece
{kkyzir,mk,koubarak}@di.uoa.gr

Abstract. We address the problem of registering and discovering sensors, sensor services and other related resources in the Semantic Sensor Web. We show how to develop a semantic registry for the storage and manipulation of metadata describing such resources using an implementation of the data model stRDF and the query language stSPARQL. stRDF extends RDF with the ability to represent spatial and temporal metadata using linear constraints. stSPARQL extends SPARQL for querying stRDF metadata. Together they provide a natural modeling framework for the problem at hand, since spatial and temporal metadata figure prominently in the description of sensors and related resources or services. We present Strabon, a storage and query evaluation module for stRDF/stSPARQL developed for the architecture of the EU FP7 project SemsorGrid4Env, and we show how to use Strabon to implement a semantic registry.

1 Introduction

Millions of sensors are currently being deployed in sensor networks around the globe and are actively collecting an enormous amount of data. Together with legacy data sources, specialized software modules (e.g., modules performing mathematical modeling and simulation) and current Web 2.0 technologies such as mashups, deployed sensor networks give us the opportunity to develop unique applications in a variety of sectors (environment, agriculture, health, transportation, surveillance, public security, etc.). The term Semantic Sensor Web (SSW) [1] has recently been used to refer to the combination of sensor network, Web and semantic technologies, with a view to exploiting this opportunity to unify the real and the virtual world.
The SSW vision is currently shared by several research projects worldwide [1,2,3,4,5]. Another important activity in this area is the standardization work of the Sensor Web Enablement (SWE) Working Group of the Open Geospatial Consortium (OGC) [6]. Since the standards of this working group do not emphasize semantics, it is interesting to investigate how SSW efforts can be coupled with, or can extend, the OGC SWE proposals [1].

⋆ This work was supported in part by the European Commission project SemsorGrid4Env (http://www.semsorgrid4env.eu/).

An important component of SSW architectures is the registry, which is used for discovering sensors, sensor services and other related resources. Registries store metadata about SSW resources and make this metadata available to client applications. In this paper we discuss the design and implementation of a semantic registry that is currently being developed in the context of the European FP7 SSW project SemsorGrid4Env [3]. In SemsorGrid4Env, resource metadata are modeled using stRDF, a constraint-based extension of RDF that can represent thematic, spatial and temporal metadata. Resource metadata are queried using stSPARQL, an extension of SPARQL for querying stRDF data [7]. After introducing stRDF and stSPARQL, we present Strabon, a storage and query evaluation module for stRDF/stSPARQL. Strabon is built on top of the well-known RDF store Sesame [8] and extends Sesame's components so that they can manage thematic, spatial and temporal metadata stored in PostGIS. Then, we show how to use Strabon to develop a semantic registry for the SSW architecture of SemsorGrid4Env.

The paper is organized as follows. In Section 2 we present the data model stRDF and the query language stSPARQL by means of examples. In Section 3 we present our current implementation of stRDF/stSPARQL, and in Section 4 we present our implementation of a semantic registry.
Comparison with related work is presented in Section 5, and in Section 6 we present our conclusions and discuss future work.

2 A Data Model and Query Language for SSW Resource Metadata

The first two questions that had to be answered while developing a registry for the SSW were: (i) what data model should be used to encode SSW resource metadata, and (ii) what query language should be used to facilitate the discovery of SSW resources. As first pointed out in [1], SSW resource metadata can be distinguished into thematic (e.g., the sensor measures wind speed), spatial (e.g., the sensor is located in Athens) and temporal (e.g., the sensor was dead throughout the last two weeks). We have chosen RDF(S) as the basis of our registry metadata model and SPARQL as the basis of our query language. However, RDF(S) can only represent thematic metadata and needs to be extended if we want to model spatial and temporal information. In [7] we presented such an extension of RDF and SPARQL, called stRDF and stSPARQL respectively. In the rest of this section we present this data model and query language, mostly by means of examples. Material in this section comes directly from [7], where the reader can also find a formal definition of the syntax and the semantics of stSPARQL query evaluation.

2.1 Data model

To develop stRDF, we closely follow the ideas of constraint databases, especially the work on CSQL [9]. First, we define the formulae that we allow as constraints. Then, we develop stRDF in two steps. The first step is to define the model sRDF, which extends RDF with the ability to represent spatial data. Then, we extend sRDF to stRDF so that thematic and spatial data with a temporal dimension can be represented.

Linear constraints. Constraints will be expressed in the first-order language $\mathcal{L} = \{\leq, +\} \cup \mathbb{Q}$ over the structure $\mathcal{Q} = \langle \mathbb{Q}, \leq, +, (q)_{q \in \mathbb{Q}} \rangle$ of the linearly ordered, dense and unbounded set of the rational numbers, denoted by $\mathbb{Q}$, with rational constants and addition.
The atomic formulae of this language are linear equations and inequalities of the form $\sum_{i=1}^{p} a_i x_i \;\Theta\; a_0$, where $\Theta$ is one of the predicates $=$ or $\leq$, the $x_i$'s denote variables and the $a_i$'s are integer constants. Note that rational constants can always be avoided in linear equations and inequalities: multiplying both sides by the least common multiple of the denominators yields an equivalent formula with integer constants. The multiplication symbol is used as an abbreviation, i.e., $a_i x_i$ stands for $x_i + \cdots + x_i$ ($a_i$ times).

We now define semi-linear subsets of $\mathbb{Q}^k$, where $k$ is a positive integer.

Definition 1. Let $S$ be a subset of $\mathbb{Q}^k$. $S$ is called semi-linear if there is a quantifier-free formula $\phi(x_1, \ldots, x_k)$ of $\mathcal{L}$, where $x_1, \ldots, x_k$ are variables, such that $(a_1, \ldots, a_k) \in S$ iff $\phi(a_1, \ldots, a_k)$ is true in the structure $\mathcal{Q}$.

We will use $\emptyset$ to denote the empty subset of $\mathbb{Q}^k$, represented by any inconsistent formula of $\mathcal{L}$.

The sRDF data model. As in theoretical treatments of RDF [10], we assume the existence of pairwise-disjoint countably infinite sets $I$, $B$ and $L$ that contain IRIs, blank nodes and literals respectively. In sRDF, we also assume the existence of an infinite sequence of sets $C_1, C_2, \ldots$ that are pairwise-disjoint with $I$, $B$ and $L$. The elements of each $C_k$, $k = 1, 2, \ldots$, are the quantifier-free formulae of the first-order language $\mathcal{L}$ with $k$ free variables. We denote by $C$ the infinite union $C_1 \cup C_2 \cup \cdots$.

Definition 2. An sRDF triple is an element of the set $(I \cup B) \times I \times (I \cup B \cup L \cup C)$. If $(s, p, o)$ is an sRDF triple, $s$ will be called the subject, $p$ the predicate and $o$ the object of the triple. An sRDF graph is a set of sRDF triples.

In the above definition, the standard RDF notion of a triple is extended so that the object of a triple can be a quantifier-free formula with linear constraints. According to Definition 1, such a quantifier-free formula with $k$ free variables is a finite representation of a (possibly infinite) semi-linear subset of $\mathbb{Q}^k$.
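The constraint machinery above can be illustrated with a short Python sketch; it is not part of stRDF or Strabon. It assumes that a formula is given in disjunctive normal form (a list of conjunctions of atoms $\sum a_i x_i \,\Theta\, a_0$ with $\Theta$ one of $=$ or $\leq$), which suffices for the spatial literals used in this section. All helper names (`clear_denominators`, `holds`, `member`, `le`, `ge`, `eq`, `pt`) are hypothetical. `clear_denominators` shows why rational constants can be avoided, and `member` tests whether a rational point belongs to the semi-linear set a formula denotes (Definition 1), here using the Paros geometry given in Section 2.2.

```python
from fractions import Fraction
from math import lcm
from typing import Dict, List, Tuple

# An atomic constraint  sum_i a_i * x_i  THETA  a_0  with THETA in {"=", "<="}.
Atom = Tuple[Dict[str, Fraction], str, Fraction]
# A quantifier-free formula in disjunctive normal form: an OR of ANDs of atoms.
DNF = List[List[Atom]]


def clear_denominators(atom: Atom) -> Tuple[Dict[str, int], str, int]:
    """Scale both sides by the LCM of all denominators (a positive integer).

    Multiplying by a positive number preserves both = and <=, so the
    resulting atom has integer constants and the same solution set."""
    coeffs, theta, a0 = atom
    m = lcm(*(c.denominator for c in coeffs.values()), a0.denominator)
    return ({v: int(c * m) for v, c in coeffs.items()}, theta, int(a0 * m))


def holds(atom, point: Dict[str, Fraction]) -> bool:
    """Evaluate one atomic constraint at a rational point."""
    coeffs, theta, a0 = atom
    lhs = sum((Fraction(c) * point[v] for v, c in coeffs.items()), Fraction(0))
    if theta == "=":
        return lhs == a0
    if theta == "<=":
        return lhs <= a0
    raise ValueError(f"unsupported predicate {theta!r}")


def member(formula: DNF, point: Dict[str, Fraction]) -> bool:
    """True iff the point lies in the semi-linear set denoted by the formula."""
    return any(all(holds(a, point) for a in conj) for conj in formula)


# Hypothetical convenience constructors for atoms.
def le(coeffs: Dict[str, int], a0: int) -> Atom:
    return ({v: Fraction(c) for v, c in coeffs.items()}, "<=", Fraction(a0))


def ge(coeffs: Dict[str, int], a0: int) -> Atom:
    # sum >= a0 is rewritten as (-sum) <= (-a0), so no extra predicate is needed.
    return le({v: -c for v, c in coeffs.items()}, -a0)


def eq(coeffs: Dict[str, int], a0: int) -> Atom:
    return ({v: Fraction(c) for v, c in coeffs.items()}, "=", Fraction(a0))


def pt(**coords: int) -> Dict[str, Fraction]:
    return {name: Fraction(value) for name, value in coords.items()}


# The Paros geometry from Section 2.2, written as a DNF formula.
paros: DNF = [
    [le({"x": -10, "y": 13}, -50), le({"y": 1}, 79), ge({"y": 1}, 13), le({"x": 1}, 133)],
    [le({"y": 1}, 13), le({"x": 1}, 133), ge({"x": 1, "y": 2}, 129)],
]

print(member(paros, pt(x=100, y=50)))  # True: satisfies the first disjunct
print(member(paros, pt(x=0, y=0)))     # False: violates both disjuncts
```

Encoding $\geq$ by negating coefficients keeps the sketch faithful to the two predicates $=$ and $\leq$ of $\mathcal{L}$; a full implementation would also need negation to cover arbitrary quantifier-free formulae.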
Semi-linear subsets of $\mathbb{Q}^k$ can capture a great variety of spatial geometries, e.g., points, lines, line segments, polygons, and k-dimensional unions of convex polygons, possibly with holes; thus they give us a lot of expressive power. However, they cannot represent geometries that require higher-degree polynomials, e.g., circles.

Example 1. The following are sRDF triples:

ex:s1 rdf:type ex:Sensor .
ex:s1 ex:has_location "x=10 and y=20"^^strdf:SemiLinearPointSet .

The above triples define a sensor and its location using a conjunction of linear constraints. The last triple is not a standard RDF triple, since its object is an element of the set $C$.

In terms of the W3C specification of RDF, sRDF can be realized as an extension of RDF with a new kind of typed literal: quantifier-free formulae with linear constraints (we will usually call them spatial literals). The datatype of these literals (e.g., strdf:SemiLinearPointSet in Example 1 above) can be defined using XML Schema. Alternatively, linear constraints can be expressed in RDF using MathML¹ and serialized as rdf:XMLLiterals, as in [11], which specifies a syntax and semantics for incorporating linear equations in OWL 2. We now move on to define stRDF.

The stRDF data model. We will now extend sRDF with time. Database researchers have differentiated among user-defined time, valid time and transaction time. RDF (and therefore sRDF) supports user-defined time, since triples are allowed to have as objects literals of the following XML Schema datatypes: xsd:dateTime, xsd:time, xsd:date, xsd:gYearMonth, xsd:gYear, xsd:gMonthDay, xsd:gDay, xsd:gMonth. stRDF extends sRDF with the ability to represent the valid time of a triple (i.e., the time during which the triple was valid in reality), using the approach of Gutierrez et al. [12], where a fourth component is added to each sRDF triple.
The time structure that we assume in stRDF is the set of rational numbers $\mathbb{Q}$ (i.e., time is assumed to be linear, dense and unbounded). Temporal constraints are expressed by quantifier-free formulae of the language $\mathcal{L}$ defined earlier, but their syntax is limited to elements of the set $C_1$. Atomic temporal constraints are formulae of $\mathcal{L}$ of the form $x \sim c$, where $x$ is a variable, $c$ is a rational number and $\sim$ is one of $<$, $\leq$, $\geq$, $>$, $=$ or $\neq$. Temporal constraints are Boolean combinations of atomic temporal constraints using a single variable.

The following definition extends the concepts of triple and graph of sRDF so that thematic and spatial data with a temporal dimension can be represented.

Definition 3. An stRDF quad is an sRDF triple $(a, b, c)$ with a fourth component $\tau$, which is a temporal constraint. For quads, we will use the notation $(a, b, c, \tau)$, where the temporal constraint $\tau$ defines the set of time points at which the fact represented by the triple $(a, b, c)$ is valid in the real world. An stRDF graph is a set of sRDF triples and stRDF quads.

2.2 Query language

We present the syntax of stSPARQL by means of examples involving sensor networks. More examples of stSPARQL are given in [13,7]. We consider a dataset that describes static and moving sensors and uses the CSIRO/SSN Ontology [14] to describe them. The main classes of interest in the SSN ontology are the class Feature, which describes the observed domain, the class Sensor, which describes the sensor, the class SensorGrounding, which describes the physical characteristics and the location of the sensor, and the class Location, which is self-explanatory. We extend this ontology with the properties strdf:hasGeometry and strdf:hasTrajectory, whose range is strdf:SemiLinearPointSet.
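Definition 3 in Section 2.1 can be made concrete with a small Python sketch, which is an illustration only and not part of stRDF or Strabon. It represents a temporal constraint as a syntax tree of atomic constraints $t \sim c$ combined with and/or/not over a single time variable, and checks at which time points a quad is valid. The class and function names, the predicate ex:status and the day numbers in the example quad are all hypothetical; `snapshot` is an illustrative helper, not an stSPARQL operator.

```python
import operator
from dataclasses import dataclass
from fractions import Fraction
from typing import Iterable, Set, Tuple, Union

# Comparison operators allowed in atomic temporal constraints  t ~ c.
OPS = {"<": operator.lt, "<=": operator.le, ">=": operator.ge,
       ">": operator.gt, "=": operator.eq, "!=": operator.ne}


@dataclass(frozen=True)
class Atomic:
    op: str        # one of the keys of OPS
    c: Fraction    # rational constant


@dataclass(frozen=True)
class And:
    left: "Constraint"
    right: "Constraint"


@dataclass(frozen=True)
class Or:
    left: "Constraint"
    right: "Constraint"


@dataclass(frozen=True)
class Not:
    arg: "Constraint"


# Boolean combinations over the single (implicit) time variable t.
Constraint = Union[Atomic, And, Or, Not]


def satisfied(tau: Constraint, t: Fraction) -> bool:
    """Does the time point t satisfy the temporal constraint tau?"""
    if isinstance(tau, Atomic):
        return OPS[tau.op](t, tau.c)
    if isinstance(tau, And):
        return satisfied(tau.left, t) and satisfied(tau.right, t)
    if isinstance(tau, Or):
        return satisfied(tau.left, t) or satisfied(tau.right, t)
    if isinstance(tau, Not):
        return not satisfied(tau.arg, t)
    raise TypeError(f"not a temporal constraint: {tau!r}")


@dataclass(frozen=True)
class Quad:
    s: str
    p: str
    o: str
    tau: Constraint  # the time points at which (s, p, o) holds in reality


def snapshot(quads: Iterable[Quad], t) -> Set[Tuple[str, str, str]]:
    """Triples whose valid time contains t (an illustrative helper only)."""
    t = Fraction(t)
    return {(q.s, q.p, q.o) for q in quads if satisfied(q.tau, t)}


# Hypothetical quad: sensor1 was dead from day 10 to day 24 inclusive.
dead = Quad("ex:sensor1", "ex:status", '"dead"',
            And(Atomic(">=", Fraction(10)), Atomic("<=", Fraction(24))))
print(snapshot([dead], 12))  # {('ex:sensor1', 'ex:status', '"dead"')}
print(snapshot([dead], 25))  # set()
```

Because time is dense, the constraint is evaluated at arbitrary rationals (e.g., day 19/2), not only at integer time points.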
1 http://www.w3.org/Math/

The stRDF description of a static sensor that measures temperature and has a certain location is the following (ssn is the namespace of the CSIRO/SSN ontology and ex the namespace of an example ontology):

ex:sensor1 rdf:type ssn:Sensor .
ex:sensor1 ssn:measures ex:temperature .
ex:temperature ssn:type ssn:PhysicalQuality .
ex:sensor1 ssn:supports ex:grounding1 .
ex:grounding1 rdf:type ssn:SensorGrounding .
ex:grounding1 ssn:hasLocation ex:location1 .
ex:location1 rdf:type ssn:Location .
ex:location1 strdf:hasGeometry "x=10 and y=10"^^strdf:SemiLinearPointSet .

Let us now present an example of modeling moving sensors in stRDF. Trajectories of moving sensors are represented by semi-linear sets of dimension 3 in the variables x, y and t.

ex:sensor2 rdf:type ssn:Sensor .
ex:sensor2 ssn:measures ex:temperature .
ex:sensor2 ssn:supports ex:grounding2 .
ex:grounding2 rdf:type ssn:SensorGrounding .
ex:grounding2 ssn:hasLocation ex:location2 .
ex:location2 rdf:type ssn:Location .
ex:location2 strdf:hasTrajectory "(x=10t and y=5t and 0<=t<=5) or (x=10t and y=25 and 5<=t<=10)"^^strdf:SemiLinearPointSet .

Finally, we assume that we have the stRDF description of a rural area where the sensors are deployed. The stRDF description of such an area, called Paros, is:

ex:area1 rdf:type ex:RuralArea .
ex:area1 ex:hasName "Paros" .
ex:area1 strdf:hasGeometry "(-10x+13y<=-50 and y<=79 and y>=13 and x<=133) or (y<=13 and x<=133 and x+2y>=129)"^^strdf:SemiLinearPointSet .

Example 2. Spatial selection. Find the URIs of the static sensors that are inside the rectangle R(0,0,100,100).

select ?S
where {?S rdf:type ssn:Sensor .
       ?G rdf:type ssn:SensorGrounding .
       ?L rdf:type ssn:Location .
       ?S ssn:supports ?G .
       ?G ssn:hasLocation ?L .
       ?L strdf:hasGeometry ?GEO .
       filter(?GEO inside "0<=x<=100 and 0<=y<=100")}

Let us now explain the new features of stSPARQL by referring to the above example.
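The answer that Example 2 should produce on the descriptions above can be checked with a rough Python sketch. It evaluates the graph pattern by nested lookups over an in-memory list of the relevant triples (those not used by the query are omitted) and then applies the filter. This is not how Strabon evaluates queries. It assumes that every static location is a point literal of the form "x=a and y=b" and treats inside as membership of that point in the closed rectangle denoted by the filter constant; the topological relations of [15] may treat boundary points differently. All function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# The triples of the static and the moving sensor above that Example 2 touches.
TRIPLES: List[Triple] = [
    ("ex:sensor1", "rdf:type", "ssn:Sensor"),
    ("ex:sensor1", "ssn:supports", "ex:grounding1"),
    ("ex:grounding1", "rdf:type", "ssn:SensorGrounding"),
    ("ex:grounding1", "ssn:hasLocation", "ex:location1"),
    ("ex:location1", "rdf:type", "ssn:Location"),
    ("ex:location1", "strdf:hasGeometry", "x=10 and y=10"),
    ("ex:sensor2", "rdf:type", "ssn:Sensor"),
    ("ex:sensor2", "ssn:supports", "ex:grounding2"),
    ("ex:grounding2", "rdf:type", "ssn:SensorGrounding"),
    ("ex:grounding2", "ssn:hasLocation", "ex:location2"),
    ("ex:location2", "rdf:type", "ssn:Location"),
    ("ex:location2", "strdf:hasTrajectory",
     "(x=10t and y=5t and 0<=t<=5) or (x=10t and y=25 and 5<=t<=10)"),
]

# Assumption: static locations are point literals "x=<int> and y=<int>".
POINT = re.compile(r"^\s*x\s*=\s*(-?\d+)\s+and\s+y\s*=\s*(-?\d+)\s*$")


def parse_point(literal: str) -> Optional[Tuple[int, int]]:
    m = POINT.match(literal)
    return (int(m.group(1)), int(m.group(2))) if m else None


def objects(s: str, p: str) -> List[str]:
    """All o such that (s, p, o) is in TRIPLES."""
    return [o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p]


def has_type(node: str, cls: str) -> bool:
    return cls in objects(node, "rdf:type")


def sensors_inside(xmin: int, ymin: int, xmax: int, ymax: int) -> List[str]:
    """Bindings of ?S for Example 2, with the rectangle as a parameter."""
    result = []
    for s, p, o in TRIPLES:
        if (p, o) != ("rdf:type", "ssn:Sensor"):
            continue
        for g in objects(s, "ssn:supports"):
            if not has_type(g, "ssn:SensorGrounding"):
                continue
            for loc in objects(g, "ssn:hasLocation"):
                if not has_type(loc, "ssn:Location"):
                    continue
                for geo in objects(loc, "strdf:hasGeometry"):
                    point = parse_point(geo)
                    # Filter: the point lies in the closed rectangle.
                    if (point is not None and xmin <= point[0] <= xmax
                            and ymin <= point[1] <= ymax):
                        result.append(s)
    return result


print(sensors_inside(0, 0, 100, 100))  # ['ex:sensor1']
```

Here ex:sensor2 is never returned, because its location has only a strdf:hasTrajectory value and the pattern requires strdf:hasGeometry.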
In stSPARQL, variables can be used in basic graph patterns to refer to spatial literals denoting semi-linear point sets. They can also be used in spatial filters, a new kind of filter expression introduced by stSPARQL that compares spatial terms using spatial predicates. Spatial terms include spatial constants (finite representations of semi-linear sets, e.g., "0<=x<=10 and 0<=y<=10"), spatial variables and complex spatial terms (e.g., ?GEO INTER "x=10 and y=10", which denotes the intersection of the value of the spatial variable ?GEO and the semi-linear set "x=10 and y=10"). There are several types of spatial predicates, such as topological, distance and directional ones, that one could introduce in a user-friendly spatial query language. In the current version of stSPARQL, only the topological relations of [15] can be used as predicates in a spatial filter expression, e.g., filter(?GEO1 inside ?GEO2).

Example 3. Intersection of an area with a trajectory. Which areas of Paros were sensed by a moving sensor and when?

select (?TR[1,2] INTER ?GEO) as ?SENSEDAREA ?GEO[3] as ?T1
where {?SN rdf:type ssn:Sensor .
       ?RA rdf:type ex:RuralArea .
       ?X rdf:type ssn:SensorGrounding .
       ?Y rdf:type ssn:Location .
       ?SN ssn:supports ?X .
       ?X ssn:hasLocation ?Y .
       ?Y strdf:hasTrajectory ?TR .
       ?RA ex:hasName "Paros" .
       ?RA strdf:hasGeometry ?GEO .
       filter(?TR[1,2] overlap ?GEO)}

The above query demonstrates the projection of spatial terms. Projections of spatial terms (e.g., ?TR[1,2]) denote the projections of the corresponding point sets on the appropriate dimensions, and are written using the notation Variable "[" Dimension1 "," ... "," DimensionN "]".

Example 4. Projection and spatial function application. Find the URIs and the locations of the sensors that are north of Paros. Encode the locations in WKT.

select ?SN ToWKT(?SN_LOC) AS ?WKT_LOC
where {?RA rdf:type ex:RuralArea .
       ?RA ex:hasName "Paros" .
       ?RA strdf:hasGeometry ?GEO .
       ?SN rdf:type ssn:Sensor .
       ?X rdf:type ssn:SensorGrounding .
       ?Y rdf:type ssn:Location .
       ?SN ssn:supports ?X .
       ?X ssn:hasLocation ?Y .
       ?Y strdf:hasGeometry ?SN_LOC .
       filter(MAX(?GEO[2])