Automating the web publishing process of environmental
data by using semantic annotations
Jürgen Moßgraber
Fraunhofer IOSB
Fraunhoferstraße 1
76131 Karlsruhe
juergen.mossgraber@iosb.fraunhofer.de

Désirée Hilbring
Fraunhofer IOSB
Fraunhoferstraße 1
76131 Karlsruhe
desiree.hilbring@iosb.fraunhofer.de
ABSTRACT
Large amounts of environmental data are still hidden away in databases accessible only to domain experts. This data needs to be made available to other experts for further data fusion. Implementing standards like the Sensor Observation Service (SOS) requires huge efforts on the side of environmental agencies. At the same time, pressure is growing to make this data available to the interested public in the form of Linked Open Data (LOD). This additional demand requires even more programming resources to fulfill the new requirements and interfaces. In this paper, we describe a system architecture that simplifies and automates the publishing of environmental data in different data models. Ontologies are applied to map the syntax and semantics of the different models. Additionally, we present a proof-of-concept implementation supporting both SOS and LOD interfaces.

Keywords
Linked Open Data, Semantic, Sensor Observation Service (SOS), Web Publishing, Software Architecture.

Copyright © by the paper’s authors. Copying permitted only for private and academic purposes.
In: S. Vrochidis, K. Karatzas, A. Karpinnen, A. Joly (eds.): Proceedings of the International Workshop on Environmental Multimedia Retrieval (EMR2014), Glasgow, UK, April 1, 2014, published at http://ceur-ws.org

1. INTRODUCTION
Geographical data play an increasingly important role in many application fields. Especially in the environmental domain, large amounts of measurement data are stored in expert databases. However, these are not accessible to other public bodies or to citizens. One reason for this, among others, is the lack of standards used for accessing the data.
The challenge is not to address one specific standard but the increasing number of standards that have to be supported by an environmental information system. Examples are standards of the Open Geospatial Consortium (OGC) such as the Web Feature Service (WFS) and the Sensor Observation Service (SOS). At the same time, the pressure to make this data available to the interested public brings up the requirement to also support standards from the Linked Open Data (LOD) domain.
Huge efforts on the side of environmental agencies would be required to support all of them, which is far beyond the budgets of these institutions. Not only the plain programming work needs to be considered but also the mapping of the syntax and semantics of the different data models. The difficulty lies especially in the semantics, which require time-consuming discussions between domain and IT experts. Furthermore, the domain experts need to be in control of which data are published. Since this is daily business, no programming should be required.
In the following section 3, relevant standardized and proprietary service interfaces for environmental data and their data models are described. The challenges of mapping data models are explained in section 4. After that, we present a method to simplify the task of mapping the data models by using ontologies (section 5) and show a system architecture and experimental implementation based on our Extensible Database Application Configurator (XCNF) framework.

2. RELATED WORK
A lot of research has been carried out in the area of mapping (data) models. In particular, the mapping of relational database schemas, which have been available for a long time, has been in focus. A good overview of the state of the art is given by [5]. More recent research focuses on XML and ontology models [7], of which the latter have the advantage of providing the semantics of the model as well. In addition, mapping between these different kinds of models has been researched. However, until now there is no fully automatic mapping algorithm that solves the problem completely [6]. Therefore, we center the following work on simplifying the manual mapping of models by using semantic annotations, which can be applied by a domain expert.
An overview of the state of the art in Linked Data is given in [4]. Tools such as “D2R Server” [12] are used to publish data stored in relational databases. The data publisher defines a mapping between the relational schema of the database and the target ontology vocabulary with a declarative mapping language. Due to this static nature, domain experts cannot apply changes easily. Exemplary works are described in [13] and [14].

3. RELEVANT INTERFACES AND DATA MODELS
The Open Geospatial Consortium (OGC) is concerned with the definition of standardized interfaces in the domain of geographical information and increasingly in the area of sensor data ("Sensor Web Enablement").

3.1 Sensor Observation Service (SOS)
The SOS specification [1] provides operations to retrieve sensor data and specifically “observation” data.
The observations themselves are defined by another OGC standard: the Observation and Measurement Model (O&M) [2]. Observations described by O&M can be seen directly as
measurements from sensors, but they can also represent other data structures.

3.2 Web Feature Service (WFS)
The Web Feature Service (WFS) represents a change in the way geographic information is created, modified and exchanged on the Internet. Rather than sharing geographic information at the file level using File Transfer Protocol (FTP), for example, the WFS offers direct fine-grained access to geographic information at the feature and feature property level [3].

3.3 Linked Open Data (LOD)
In computing, linked data (often capitalized as Linked Data) describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried [4].

3.4 XCNF
XCNF (eXtensible database application CoNFigurator) is a Java-based client/server framework by Fraunhofer IOSB for developing information systems for time series analysis. While the framework can be applied to any domain, we mainly apply it to the domains of water management and water quality. Most of the data are time series with spatial relationships.
XCNF uses a proprietary metadata model, which describes not only the data but also the layout of input forms and search masks. XCNF uses a concept called View. A View provides access to a part of one or more connected databases, quite similar to a database view. In contrast to a database view, it provides additional annotations to add semantics to its attributes and to link attributes to other views. As a consequence, every end user creates or extends their own data model by creating or modifying an XCNF View.

4. PUBLISHING AND MAPPING OF DATA MODELS
Figure 1 depicts the problem that needs to be solved. Several interface standards with their specific data models have to be mapped, with respect to their syntax and semantics, to the proprietary data model of an existing information system in the backend.
As noted above in 3.4, XCNF provides a metadata model to describe data models, which can change dynamically. This means that we cannot apply a once-only mapping of the models. Instead, the mapping needs to be adjusted whenever an end user makes a change and therefore needs to be dynamic too.

Figure 1. Required mapping for accessing time series with different standards

4.1 Concept
To publish data from XCNF, the existing features are used and extended by ontology annotations:
• An ontology is required for each interface that should be supported (SOS, WFS, etc.). The ontology must contain the specific concepts and properties to describe the model. Preferably, an existing ontology should be reused.
• All required concepts and their accompanying properties contained in the used ontology must be mapped to existing XCNF Views and their attributes. This is done by annotating them with the URIs of ontology resources. For example, if available datasets shall be published as SOS Observations, the appropriate XCNF View is annotated with #Observation (only the hash part of the URI is shown for better readability). The attributes of the view need to be annotated with properties from the ontology too, e.g. #hasValue, #hasTime, etc.
• Other interfaces (e.g. LOD) can be supported by annotating the views with URIs from the ontology used for the other interface.
• The specific publishing service (SOS, WFS, etc.) can now read all of the entries from the related XCNF Views, annotated by concepts of its ontology.
• Since the structure is given by the ontology, the service can relate multiple views which belong together.

Figure 2. Architecture for SOS accessing XCNF
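
The annotation concept of section 4.1 can be illustrated with a small sketch. It is not the XCNF implementation (XCNF is Java-based and its API is not shown in this paper). It only models the idea that views and their attributes carry ontology URIs and that a publishing service selects views by concept. The class AnnotatedView, the helper functions, both namespace URIs and the LOD concept #Parameter are hypothetical. The view and column names are taken from the mapping example in section 5.1.1.

```python
from dataclasses import dataclass, field

# Hypothetical namespaces: the paper only shows the hash part of each URI.
SOS_NS = "http://example.org/sos-ontology"
LOD_NS = "http://example.org/lod-vocabulary"


@dataclass
class AnnotatedView:
    """A view plus the ontology resources attached to it (illustrative only)."""
    name: str
    concepts: set = field(default_factory=set)      # concept URIs on the view
    attributes: dict = field(default_factory=dict)  # column -> set of property URIs

    def annotate_view(self, concept_uri):
        self.concepts.add(concept_uri)

    def annotate_attribute(self, column, property_uri):
        self.attributes.setdefault(column, set()).add(property_uri)


def views_for_concept(views, concept_uri):
    """Views a publishing service would read for one ontology concept."""
    return [v for v in views if concept_uri in v.concepts]


def columns_for_property(view, property_uri):
    """Columns of a view that are annotated with the given property."""
    return sorted(c for c, uris in view.attributes.items() if property_uri in uris)


parameter = AnnotatedView("GEW_PARAMETER")
parameter.annotate_view(SOS_NS + "#Phenomenon")
parameter.annotate_attribute("KURZNAME", SOS_NS + "#hasName")
parameter.annotate_attribute("BASIS_NR", SOS_NS + "#hasID")
# A second interface is supported by a further annotation on the same view.
parameter.annotate_view(LOD_NS + "#Parameter")  # hypothetical LOD concept

probe = AnnotatedView("GEW_PROBE")
probe.annotate_view(SOS_NS + "#Observation")
probe.annotate_attribute("PNST_NR", SOS_NS + "#relatesToFeatureOfInterest")

views = [parameter, probe]
print([v.name for v in views_for_concept(views, SOS_NS + "#Phenomenon")])
print(columns_for_property(parameter, SOS_NS + "#hasName"))
```

Because the annotations are plain data attached to the views, changing a mapping means changing an annotation rather than changing code, which is the property the dynamic mapping of section 4 relies on.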
5. ARCHITECTURE AND IMPLEMENTATION
The architecture and implementation of an SOS interface is described in the following. Other interfaces can be supported in the same way.
Figure 2 depicts the components of the system, which are described in the following sub-sections.

5.1 Ontology
Several translations to an ontology are available for the Observation and Measurement Model (O&M) [8]. Since they tend to be rather complex, we have extracted only those concepts and properties which were necessary for the mapping. The following concepts and their properties are used:
• Observation
  o hasObservedProperty
  o measuredByProcedure
  o hasValue
  o relatesToFeatureOfInterest
  o hasTime
  o hasUnit
• Phenomenon
  o hasName
  o hasID
• Procedure
  o hasName
  o hasID
• FeatureOfInterest
  o hasName
  o hasID
  o hasNorthing
  o hasEasting

5.1.1 Mapping Example
Our test data is taken from the Fachinformationssystem Gewässer Qualität (FISGeQua), which contains water quality data from all measurement stations of the German state Baden-Württemberg.
The following table shows how the XCNF Views of FISGeQua have been annotated with resources from the SOS ontology to support the SOS interface:

Ontology Concepts and Properties   XCNF Views and Attributes
#Observation                       GEW_MESSWERT_GUETE, GEW_PROBE
  #hasObservedProperty             GEW_MESSWERT_GUETE.PARAMETER_NR
  #measuredByProcedure             GEW_MESSWERT_GUETE.MESSVERFAHREN_NR
  #hasValue                        GEW_MESSWERT_GUETE.MESSWERT
  #relatesToFeatureOfInterest      GEW_PROBE.PNST_NR
  #hasUnit                         GEW_MESSWERT_GUETE.DIMENSION_NR
#Phenomenon                        GEW_PARAMETER
  #hasID                           GEW_PARAMETER.BASIS_NR
  #hasName                         GEW_PARAMETER.KURZNAME
#Procedure                         UIS_SL_MESSVERFAHREN
  #hasName                         UIS_SL_MESSVERFAHREN.LANGNAME
  #hasID                           UIS_SL_MESSVERFAHREN.MESSVERFAHREN_NR
#FeatureOfInterest                 GEW_PNST, GEW_MST, GEW_POSITION
  #hasName                         GEW_MST.NAME
  #hasID                           GEW_PNST.PNST_NR
  #hasNorthing                     GEW_POSITION.HW
  #hasEasting                      GEW_POSITION.RW

Since the example above contains German words and acronyms, here is a short glossary:
• MESSWERT: measurement
• PROBE: observation
• GUETE: quality
• MESSVERFAHREN: measurement procedure
• DATUM: date
• KURZNAME: short name
• LANGNAME: long name
• HW + RW: the geo location (northing and easting)

5.2 SOS Requests and Results
By using the above mapping, it is possible to retrieve data from the FISGeQua database and make it accessible via an SOS interface. A typical SOS request can be formulated in the following way:
• Give me all available data which matches the following conditions:
  o The #Phenomenon shall be water temperature.
  o The #Procedure which has been used to determine the water temperature is electrometry.
  o The data has been measured in the time range of 2nd to 4th January 2005.
  o The #FeatureOfInterest, which defines the spatial region, is the measuring point with id 1051.
This request in the SOS XML notation looks like the following:
[SOS request XML; the markup was lost in extraction. Surviving values: 10, 8289, TW, phenomenonTime, 2005-01-02T14:00:00.000+01:00, 2005-01-04T15:00:00.000+01:00, 1051.0, http://www.opengis.net/om/2.0]

As you can see, the request contains no FISGeQua-specific nomenclature. Here is the response to this request:

[SOS response XML; the markup was lost in extraction. Surviving values: abc8dbd3-13ff-442a-9e23-80a9ec96881f, 2005-01-03T12:10:00.000+01:00, 6.5]

It says that a water temperature (TW) of 6.5°C has been measured at 1051 on January 3rd.

[...] and the annotations of the views. The service provides the following methods:
• /xcnfrestservice/capabilities Provides a list with all supported models/ontologies.
• /xcnfrestservice/capabilities/viewNames Get the names of all published XCNF views, e.g.:
  {"viewNames":["GEW_MESSWERT_GUETE","GEW_PROBE","GEW_PARAMETER","UIS_SL_MESSVERFAHREN","GEW_PNST","GEW_MST","GEW_POSITION"]}
• /xcnfrestservice/capabilities/mapping Get the mapping of the views to the ontology concepts and properties. Note that multiple annotations from different ontologies could be applied if the data should be available via different interfaces. The following shows the mapping part for #Phenomenon and #FeatureOfInterest, with URIs removed to keep it short:
  {"mappingStructure":{"viewMappingList":[{"viewName":"GEW_PARAMETER","conceptNames":["#Phenomenon"],"mappingList":[{"columnName":"KURZNAME","concept":"#hasName"}]},{"viewName":" [...]
• /xcnfrestservice/capabilities/model/?uri=uri Get the ontology with the given URI.
• /xcnfrestservice/data?uri=uri Get the data mapped to the concept with the given URI.
• /xcnfrestservice/data/filter/?uri=uri&propertyList=uri&valueList=value Query for the data mapped to the concept with the given URI. The response is filtered with the properties provided in the additional parameters.

5.4 SOS server and XCNF-DAO
[...] software, provides an SOS implementation based on Java. As 52°North develops the reference implementation for the OGC SOS specification, we chose their software (see [...]) as the basis for our proof-of-concept implementation.
We chose an early access version 4 (4.0.0 Beta2) of the software since it provides much better modularity than version 3. This new version offers a defined way of plugging your own data access into the server via so-called Data Access Objects (DAO). Out of the box, it retrieves its data from a relational database in a proprietary format, which did not fit our
needs since we wanted direct access to the data stored in an XCNF server for performance reasons.
The implemented XCNF-DAO plugs into the SOS server. It retrieves the data from the XCNF REST Service by utilizing the SOS ontology annotations. The retrieved data is handed over to the SOS server, which handles the syntax formatting and encoding (see Figure 2).

6. DISCUSSION
6.1 Distribution of Concept Properties over several Views
Analyzing the mapping example described in section 5.1.1, one can see that the properties of one concept often need to be mapped to attributes which belong to several different XCNF Views. Here is an example:
The #hasObservedProperty property of an #Observation can be found in the XCNF View GEW_MESSWERT_GUETE, while the property #relatesToFeatureOfInterest is contained in the XCNF View GEW_PROBE.
Requesting the #Observation concept via the integrated XCNF View filtering option with #hasObservedProperty=A, or requesting it with #relatesToFeatureOfInterest=B, will in both cases lead to too many results, because the respective other filter is missing.
To support the filtering mechanism of the XCNF REST Service /xcnfrestservice/data/filter/?uri=uri&propertyList=uri&valueList=value, the implementation must provide an additional filtering operation before returning the results via the URI.

6.2 Reducing the Amount of Data to be published
Often only subsets of the data in the database are intended for publishing. Therefore, we need a mechanism for defining which subsets of the data in the database can be delivered via the XCNF REST Service.
XCNF provides the possibility to create so-called BDOs (“Benutzerdefiniertes Objekt”), which are user-defined objects. It is possible to create a BDO which reduces the amount of data in the database to the subset which shall be published, e.g. by defining specific measurement points, a specific time range or specific phenomena.
Currently we consider implementing the following mechanism:
1. The #Observation concept in the ontology needs to be extended with a new property #hasBDO.
2. The owner of the database needs to define a specific BDO for the data subset to be published.
3. This BDO needs to be annotated with #hasBDO.
4. The implementation of the XCNF REST Service /xcnfrestservice/data?uri=uri and its filter mechanism need to be extended with an additional filter (propertyList: #hasBDO, valueList: #8289) which is not visible from outside the XCNF REST Service.

6.3 Ideas for Integrating Linked Open Data
The possible support of Linked Open Data was another idea we had. Therefore, the architecture foresees the possibility to support several interfaces. Supporting LOD would additionally require that we provide our data in RDF or OWL format. For our current implementation the following two possibilities exist:
• Either the XCNF REST Service would need to map its responses to RDF or OWL, or
• we use our extended SOS implementation and map the resulting XML Observation Collection to RDF or OWL.
The first approach will be faster because it saves one mapping step. However, it would be a proprietary solution, while the second approach can use existing geospatial standards and might reuse mechanisms described in [10] and [11].

6.4 Adapting the approach to other systems
In this paper, we used the XCNF framework as an example to demonstrate our approach, but it can be adapted to other systems as well. To make that possible, the following steps need to be taken:
1. Enable annotation of your relational data (could be done with a standard relational mapper).
2. Support multiple mappings (ontologies).
3. Add the possibility for the user to dynamically change the mapping.
4. Provide the means to publish only selections of the data (done by XCNF views in our approach).

7. CONCLUSION
In this paper, we presented a concept for dynamically mapping data models of domain expert systems to different interface standards by annotating the model with resources from an ontology. In contrast to static approaches like D2R shown in the related work section, this allows the domain expert to adapt more quickly to new requirements.
The described implementation shows that the concept is applicable to a real-world scenario. In the future, we will work on removing the discussed drawbacks and on improving the user interface for executing the mapping. For example, ontology properties for an annotation could be suggested to the user depending on the data type and the selected ontology concept. Furthermore, since XCNF views already contain some metadata annotations, it is interesting to explore to what degree the mappings can be created automatically.

8. REFERENCES
[1] Bröring, A., Stasch, C., Echterhoff, J. (Ed.) 2012. OGC® Sensor Observation Service Interface Standard, Version: 2.0, Open Geospatial Consortium Inc., 12-006
[2] Cox, S. (Ed.) 2011. Observations and Measurements - XML Implementation, Version: 2.0, Open Geospatial Consortium, 10-025r1
[3] Vretanos, P. (Ed.) 2010. OpenGIS Web Feature Service 2.0 Interface Standard, Open Geospatial Consortium, OGC 09-025r1 and ISO/DIS 19142
[4] Bizer, C., Heath, T., Berners-Lee, T. 2009. Linked Data - The Story So Far, International Journal on Semantic Web and Information Systems 5 (3): 1–22. doi:10.4018/jswis.2009081901
[5] Bellahsene, Z., Bonifati, A., Rahm, E. 2011. Schema Matching and Mapping, Springer Verlag, doi:10.1007/978-3-642-16518-4
[6] Bernstein, P., Madhavan, J., Rahm, E. Generic Schema Matching, Ten Years Later, 37th International Conference
on Very Large Data Bases (Seattle, Washington, August 29th - September 3rd 2011)
[7] Gross, A., Hartung, M., Thor, A., Rahm, E. 2012. How do computed ontology mappings evolve?, Joint Workshop on Knowledge Evolution and Ontology Dynamics (ISWC 2012)
[8] Compton, M., et al. 2012. The SSN ontology of the W3C semantic sensor network incubator group, Web Semantics: Science, Services and Agents on the World Wide Web 17, pp. 25–32.
[9] Fielding, R., Taylor, R. 2002. Principled Design of the Modern Web Architecture, ACM Transactions on Internet Technology (TOIT) (New York: Association for Computing Machinery) 2 (2): 115–150, doi:10.1145/514183.514185, ISSN 1533-5399
[10] Page, K., De Roure, D., Martinez, K., Sadler, J., Kit, O. 2009. Linked Sensor Data: RESTfully Serving RDF and GML. In Proceedings of 2nd International Workshop on Semantic Sensor Networks (SSN09), in conjunction with the 8th International Semantic Web Conference, ISWC 2009, Washington, DC, USA, October 2009; CEUR-WS: Aachen, Germany, 2010; Volume 522, pp. 49–63.
[11] Probst, F., Gordon, A., Dornelas, I. 2006. Ontology-based Representation of the OGC Observations and Measurements Model, Open Geospatial Consortium
[12] Bizer, C., Cyganiak, R. 2006. D2R Server - Publishing Relational Databases on the Semantic Web. Poster at the 5th International Semantic Web Conference (ISWC2006)
[13] Moraru, A., Fortuna, C., Mladenic, D. 2011. A System for Publishing Sensor Data on the Semantic Web. Journal of Computing and Information Technology - CIT 19, 2011, 4, 239–245, doi:10.2498/cit.1002030
[14] Page, K., Frazer, A., Nagel, B., De Roure, D., Martinez, K. 2011. Semantic Access to Sensor Observations through Web APIs, Fifth IEEE International Conference on Semantic Computing