Automating the web publishing process of environmental data by using semantic annotations

Jürgen Moßgraber
Fraunhofer IOSB
Fraunhoferstraße 1
76131 Karlsruhe
juergen.mossgraber@iosb.fraunhofer.de

Désirée Hilbring
Fraunhofer IOSB
Fraunhoferstraße 1
76131 Karlsruhe
desiree.hilbring@iosb.fraunhofer.de

ABSTRACT
Large amounts of environmental data are still hidden away in databases that are accessible only to domain experts. There is a need to make these data available to other experts for further data fusion. Implementing standards like the Sensor Observation Service (SOS) requires huge efforts on the side of environmental agencies. At the same time, pressure arises to make these data available to the interested public in the form of Linked Open Data (LOD). This additional demand requires even more programming resources to fulfill the new requirements and interfaces. In this paper, we describe a system architecture that simplifies and automates the publishing of environmental data in different data models. Ontologies are applied to map the syntax and semantics of the different models. Additionally, we present a proof-of-concept implementation supporting both SOS and LOD interfaces.

Keywords
Linked Open Data, Semantic, Sensor Observation Service (SOS), Web Publishing, Software Architecture.

Copyright © by the paper's authors. Copying permitted only for private and academic purposes.
In: S. Vrochidis, K. Karatzas, A. Karpinnen, A. Joly (eds.): Proceedings of the International Workshop on Environmental Multimedia Retrieval (EMR2014), Glasgow, UK, April 1, 2014, published at http://ceur-ws.org

1. INTRODUCTION
Geographical data play an increasingly important role in many application fields. Especially in the environmental domain, large amounts of measurement data are stored in expert databases. However, these are not accessible to other public bodies or to citizens. One reason, among others, is that standards for accessing the data are not used.

The challenge is not to address one specific standard but the increasing number of standards that an environmental information system has to support. Examples are standards of the Open Geospatial Consortium (OGC) such as the Web Feature Service (WFS) and the Sensor Observation Service (SOS). At the same time, the pressure to make these data available to the interested public brings up the requirement to also support standards from the Linked Open Data (LOD) domain.

Supporting all of them would require huge efforts on the side of environmental agencies, far beyond the budgets of these institutions. Not only the plain programming work needs to be considered but also the mapping of the syntax and semantics of the different data models. The difficulty lies especially in the semantics, which require time-consuming discussions between domain and IT experts. Furthermore, the domain experts need to be in control of which data are published. Since this is daily business, no programming should be required.

In section 3, relevant standardized and proprietary service interfaces for environmental data and their data models are described. The challenges of mapping data models are explained in section 4. After that, we present a method to simplify the task of mapping the data models by using ontologies (section 5) and show a system architecture and an experimental implementation based on our Extensible Database Application Configurator (XCNF) framework.

2. RELATED WORK
A lot of research has been carried out in the area of mapping (data) models. In particular, the mapping of relational database schemas, which have been available for a long time, has been in focus. A good overview of the state of the art is given by [5]. More recent research focuses on XML and ontology models [7], of which the latter have the advantage of providing the semantics of the model as well. In addition, mapping between these different kinds of models has been researched. However, until now there is no fully automatic mapping algorithm that solves the problem completely [6]. Therefore, we center the following work on simplifying the manual mapping of models by using semantic annotations, which can be applied by a domain expert.

An overview of the state of the art in Linked Data is given in [4]. Tools such as "D2R Server" [12] are used to publish data stored in relational databases. The data publisher defines a mapping between the relational schema of the database and the target ontology vocabulary with a declarative mapping language. Due to this static nature, domain experts cannot apply changes easily. Exemplary works are described in [13] and [14].

3. RELEVANT INTERFACES AND DATA MODELS
The Open Geospatial Consortium (OGC) is concerned with the definition of standardized interfaces in the domain of geographical information and increasingly in the area of sensor data ("Sensor Web Enablement").

3.1 Sensor Observation Service (SOS)
The SOS specification [1] provides operations to retrieve sensor data and specifically "observation" data.

The observations themselves are defined by another OGC standard: the Observation and Measurement Model (O&M) [2]. Observations described by O&M can be seen directly as measurements from sensors, but they can also represent other data structures.

3.2 Web Feature Service (WFS)
The Web Feature Service (WFS) represents a change in the way geographic information is created, modified and exchanged on the Internet. Rather than sharing geographic information at the file level using File Transfer Protocol (FTP), for example, the WFS offers direct fine-grained access to geographic information at the feature and feature property level [3].

3.3 Linked Open Data (LOD)
In computing, linked data (often capitalized as Linked Data) describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried [4].

3.4 XCNF
XCNF (eXtensible database application CoNFigurator) is a Java based client/server framework by Fraunhofer IOSB for developing information systems for time series analysis. While the framework can be applied to any domain, we mainly apply it to the domains of water management and water quality. Most of the data are time series with spatial relationships.

XCNF uses a proprietary metadata model, which not only describes the data but also the layout of input forms and search masks. XCNF uses a concept called View. A View provides access to a part of one or more connected databases, quite similar to a database view. In contrast to a database view, it provides additional annotations to add semantics to its attributes and to link attributes to other views. As a consequence, every end user creates or extends their own data model by creating or modifying an XCNF View.

4. PUBLISHING AND MAPPING OF DATA MODELS
Figure 1 depicts the problem that needs to be solved. Several interface standards with their specific data models have to be mapped with respect to their syntax and semantics to a proprietary data model of an existing information system in the backend.

Figure 1. Required mapping for accessing time series with different standards

As noted above in 3.4, XCNF provides a metadata model to describe data models, which can change dynamically. This means that we cannot apply a once-only mapping of the models. Instead, the mapping needs to be adjusted whenever an end user makes a change and therefore needs to be dynamic too.

4.1 Concept
To publish data from XCNF, the existing features are used and extended by ontology annotations:
• An ontology is required for each interface that should be supported (SOS, WFS, etc.). The ontology must contain the specific concepts and properties to describe the model. Preferably, an existing ontology should be reused.
• All required concepts and their accompanying properties contained in the used ontology must be mapped to existing XCNF Views and their attributes. This is done by annotating them with the URIs of ontology resources. For example, if available datasets shall be published as SOS Observations, the appropriate XCNF View is annotated with #Observation (this is only the hash part of the URI for better readability). The attributes of the view need to be annotated with properties from the ontologies too, e.g. #hasValue, #hasTime, etc.
• Other interfaces (e.g. LOD) can be supported by annotating the views with URIs from the ontology used for the other interface.
• The specific publishing service (SOS, WFS, etc.) can now read all of the entries from the related XCNF Views, annotated by concepts of its ontology.
• Since the structure is given by the ontology, the service can relate multiple views which belong together.

5. ARCHITECTURE AND IMPLEMENTATION
The architecture and implementation of an SOS interface is described in the following. Other interfaces can be supported in the same way.

The following figure depicts the components of the system that will be described in the following sub-sections:

Figure 2. Architecture for SOS accessing XCNF

5.1 Ontology
Several translations to an ontology are available for the Observation and Measurement Model (O&M) [8]. Since they tend to be rather complex, we have extracted only those concepts and properties which were necessary for the mapping. The following concepts and their properties are used:
• Observation
    hasObservedProperty
    measuredByProcedure
    hasValue
    relatesToFeatureOfInterest
    hasTime
    hasUnit
• Phenomenon
    hasName
    hasID
• Procedure
    hasName
    hasID
• FeatureOfInterest
    hasName
    hasID
    hasNorthing
    hasEasting

5.1.1 Mapping Example
Our test data is taken from the Fachinformationssystem Gewässer Qualität (FISGeQua), which contains water quality data from all measurement stations of the German state Baden-Württemberg.

The following tables show how the XCNF-Views of FISGeQua have been annotated with resources from the SOS ontology to support the SOS interface:

Ontology Concepts and Properties    XCNF Views and Attributes
#Observation                        GEW_MESSWERT_GUETE, GEW_PROBE
#hasObservedProperty                GEW_MESSWERT_GUETE.PARAMETER_NR
#measuredByProcedure                GEW_MESSWERT_GUETE.MESSVERFAHREN_NR
#hasValue                           GEW_MESSWERT_GUETE.MESSWERT
#relatesToFeatureOfInterest         GEW_PROBE.PNST_NR
#hasUnit                            GEW_MESSWERT_GUETE.DIMENSION_NR
#Phenomenon                         GEW_PARAMETER
#hasID                              GEW_PARAMETER.BASIS_NR
#hasName                            GEW_PARAMETER.KURZNAME
#Procedure                          UIS_SL_MESSVERFAHREN
#hasName                            UIS_SL_MESSVERFAHREN.LANGNAME
#hasID                              UIS_SL_MESSVERFAHREN.MESSVERFAHREN_NR
#FeatureOfInterest                  GEW_PNST, GEW_MST, GEW_POSITION
#hasName                            GEW_MST.NAME
#hasID                              GEW_PNST.PNST_NR
#hasNorthing                        GEW_POSITION.HW
#hasEasting                         GEW_POSITION.RW

Since the example above contains German words and acronyms, here is a little glossary:
• MESSWERT: measurement
• PROBE: observation
• GUETE: quality
• MESSVERFAHREN: measurement procedure
• DATUM: date
• KURZNAME: short name
• LANGNAME: long name
• HW + RW: the geo location (northing and easting)

5.2 SOS Requests and Results
By using the above mapping, it is possible to retrieve data from the FISGeQua database and make it accessible via an SOS interface. A typical SOS request can be formulated in the following way:
• Give me all available data which match the following conditions:
    o The #Phenomenon shall be water temperature.
    o The #Procedure which has been used to determine the water temperature is electrometry.
    o The data have been measured in the time range of 2nd to 4th January 2005.
    o The #FeatureOfInterest, which defines the spatial region, is the measuring point with id 1051.

This request in the SOS XML notation looks like the following:

[SOS request XML; markup not preserved. Surviving element values: 10; 8289; TW; phenomenonTime; 2005-01-02T14:00:00.000+01:00; 2005-01-04T15:00:00.000+01:00; 1051.0; http://www.opengis.net/om/2.0]

As you can see, the request contains no FISGeQua-specific nomenclature. Here is the response to this request:

[SOS response XML; markup not preserved. Surviving element values: abc8dbd3-13ff-442a-9e23-80a9ec96881f; 2005-01-03T12:10:00.000+01:00; 6.5]

[...] and the annotations of the views. The service provides the following methods:
• /xcnfrestservice/capabilities
  Provides a list with all supported models/ontologies.
• /xcnfrestservice/capabilities/viewNames
  Get the names of all published XCNF views, e.g.:
  {"viewNames":["GEW_MESSWERT_GUETE","GEW_PROBE","GEW_PARAMETER","UIS_SL_MESSVERFAHREN","GEW_PNST","GEW_MST","GEW_POSITION"]}
• /xcnfrestservice/capabilities/mapping
  Get the mapping of the views to the ontology concepts and properties. Note that multiple annotations from different ontologies could be applied if the data should be available via different interfaces! The following shows the mapping part for #Phenomenon and #FeatureOfInterest, with URIs deleted:
  {"mappingStructure":{"viewMappingList":[{"viewName":"GEW_PARAMETER","conceptNames":["#Phenomenon"],"mappingList":[{"columnName":"KURZNAME","concept":"#hasName"}]},{"viewName":" [...]
• /xcnfrestservice/capabilities/model/?uri=uri
  Get the ontology with the given URI.
• /xcnfrestservice/data?uri=uri
  Get the data mapped to the concept with the given URI.
• /xcnfrestservice/data/filter/?uri=uri&propertyList=uri&valueList=value
  Query for the data mapped to the concept with the given URI. The response is filtered with the properties provided in the additional parameters.

5.4 SOS server and XCNF-DAO
[...] software, provides an SOS implementation based on Java. As 52°North develops the reference implementation for the OGC SOS specification, we chose their software (see [...]) as the basis for our proof-of-concept implementation. We chose an early access version 4 (4.0.0 Beta2) of the software since it provides much better modularity than version 3.
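The annotations of section 5.1.1 and the data/filter method of the REST service can be illustrated with a short Python sketch. It is not the XCNF implementation: the nested-dictionary layout, the function names `attribute_for` and `build_filter_url`, the host name, and the encoding of several filters as repeated propertyList/valueList pairs are assumptions made for this example. Only the view names, the annotations, and the URL path are taken from the text.

```python
from urllib.parse import urlencode

# Excerpt of the annotations from the mapping table in section 5.1.1.
# The nested-dictionary layout is an assumption made for this sketch.
MAPPING = {
    "#Observation": {
        "views": ["GEW_MESSWERT_GUETE", "GEW_PROBE"],
        "properties": {
            "#hasObservedProperty": "GEW_MESSWERT_GUETE.PARAMETER_NR",
            "#measuredByProcedure": "GEW_MESSWERT_GUETE.MESSVERFAHREN_NR",
            "#hasValue": "GEW_MESSWERT_GUETE.MESSWERT",
            "#relatesToFeatureOfInterest": "GEW_PROBE.PNST_NR",
        },
    },
    "#Phenomenon": {
        "views": ["GEW_PARAMETER"],
        "properties": {
            "#hasID": "GEW_PARAMETER.BASIS_NR",
            "#hasName": "GEW_PARAMETER.KURZNAME",
        },
    },
}


def attribute_for(concept, prop):
    """Return the XCNF view attribute annotated with the given property."""
    return MAPPING[concept]["properties"][prop]


def build_filter_url(base_url, concept_uri, filters):
    """Build a request URL for the data/filter method of the REST service.

    Assumption: several filters are sent as repeated
    propertyList/valueList pairs; the paper only shows a single pair.
    """
    params = [("uri", concept_uri)]
    for prop, value in filters:
        params.append(("propertyList", prop))
        params.append(("valueList", value))
    return base_url + "/xcnfrestservice/data/filter/?" + urlencode(params)


if __name__ == "__main__":
    # The host name and the filter value are placeholders.
    print(attribute_for("#Observation", "#hasValue"))
    print(build_filter_url("http://localhost:8080", "#Observation",
                           [("#hasObservedProperty", "TW")]))
```

Because the service resolves concepts and properties through the annotations, a client such as the XCNF-DAO never has to name a FISGeQua table or column in its request.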
Version 4 now offers a defined way of plugging your own data access into the server via so-called Data Access Objects (DAO). Out of the box, it retrieves its data from a relational database in a proprietary format, which did not fit our needs since we wanted direct access to the data stored in an XCNF server for performance reasons.

The implemented XCNF-DAO plugs into the SOS server. It retrieves the data from the XCNF-REST service by utilizing the SOS ontology annotations. The retrieved data is handed over to the SOS server, which handles the syntax formatting and encoding (see Figure 2). For the example request in section 5.2, the response says that a water temperature (TW) of 6.5°C has been measured at measuring point 1051 on January 3rd.

6. DISCUSSION

6.1 Distribution of Concept Properties over several Views
Analyzing the mapping example described in section 5.1.1, one can see that the properties of one concept often need to be mapped to attributes that belong to several different XCNF Views. Here is an example:

The #hasObservedProperty property of an #Observation can be found in the XCNF View GEW_MESSWERT_GUETE, while the property #relatesToFeatureOfInterest is contained in the XCNF View GEW_PROBE.

Requesting the #Observation concept via the integrated XCNF View filtering option filtered with #hasObservedProperty=A, or requesting the #Observation concept filtered with #relatesToFeatureOfInterest=B, will in both cases lead to too many results if the second filter option is missing.

To support the filtering mechanism of the XCNF REST Service /xcnfrestservice/data/filter/?uri=uri&propertyList=uri&valueList=value, the implementation must provide an additional filtering operation before returning the results via the URI.

6.2 Reducing the Amount of Data to be Published
Often only subsets of the data in the database are intended for publishing. Therefore, we need a mechanism for defining which subsets of the data in the database can be delivered via the XCNF REST Service.

XCNF offers the possibility to create so-called BDOs ("Benutzerdefiniertes Objekt"), which are user-defined objects. It is possible to create a BDO which reduces the data in the database to the subset which shall be published, e.g. by defining specific measurement points, a specific time range or specific phenomena.

Currently we consider implementing the following mechanism:
1. The #Observation concept in the ontology needs to be extended with a new property #hasBDO.
2. The owner of the database needs to define a specific BDO for the data subset to be published.
3. This BDO needs to be annotated with #hasBDO.
4. The implementation of the XCNF REST Service /xcnfrestservice/data?uri=uri and its filter mechanism need to be extended with an additional filter (propertyList: #hasBDO, valueList: #8289) which is not visible from outside the XCNF REST Service.

6.3 Ideas for Integrating Linked Open Data
The possible support of Linked Open Data was another idea we had. Therefore, the architecture foresees the possibility to support several interfaces. The additional support of LOD would require that we provide our data in RDF or OWL format. For our current implementation the following two possibilities exist:
• Either the XCNF REST Service would need to map its responses to RDF or OWL, or
• we use our extended SOS implementation and map the resulting XML Observation Collection to RDF or OWL.

The first approach will be faster, because it saves one mapping step. However, it will contain a proprietary solution, while the second approach can use existing geospatial standards and might reuse mechanisms described in [10] and [11].

6.4 Adapting the approach to other systems
In this paper, we used the XCNF framework as an example to demonstrate our approach, but it can be adapted to other systems as well. To facilitate that, the following steps need to be taken:
1. Enable annotation of your relational data (could be done with a standard relational mapper).
2. Support multiple mappings (ontologies).
3. Add the possibility for the user to dynamically change the mapping.
4. Provide the means to publish only selections of the data (done by XCNF views in our approach).

7. CONCLUSION
In this paper, we presented a concept for dynamically mapping data models of domain expert systems to different interface standards by annotating the model with resources from an ontology. In contrast to static approaches like D2R shown in the related work section, this allows for quicker adaptations to new requirements by the domain expert.

The described implementation shows that the concept is applicable to a real world scenario. In the future, we will work on removing the discussed drawbacks and on improving the user interface for executing the mapping. For example, ontology properties for an annotation could be suggested to the user depending on the data type and the selected ontology concept. Furthermore, since XCNF views already contain some metadata annotations, it is interesting to explore to what degree the mappings can be created automatically.

8. REFERENCES
[1] Bröring, A., Stasch, C., Echterhoff, J. (Eds.) 2012. OGC® Sensor Observation Service Interface Standard, Version 2.0, Open Geospatial Consortium Inc., 12-006
[2] Cox, S. (Ed.) 2011. Observations and Measurements - XML Implementation, Version 2.0, Open Geospatial Consortium, 10-025r1
[3] Vretanos, P. (Ed.) 2010. OpenGIS Web Feature Service 2.0 Interface Standard, Open Geospatial Consortium, OGC 09-025r1 and ISO/DIS 19142
[4] Bizer, C., Heath, T., Berners-Lee, T. 2009. Linked Data - The Story So Far, International Journal on Semantic Web and Information Systems 5 (3): 1–22. doi:10.4018/jswis.2009081901
[5] Bellahsene, Z., Bonifati, A., Rahm, E. 2011. Schema Matching and Mapping, Springer Verlag, doi:10.1007/978-3-642-16518-4
[6] Bernstein, P., Madhavan, J., Rahm, E. 2011. Generic Schema Matching, Ten Years Later, 37th International Conference on Very Large Data Bases (Seattle, Washington, August 29th - September 3rd 2011)
[7] Gross, A., Hartung, M., Thor, A., Rahm, E. 2012. How do computed ontology mappings evolve?, Joint Workshop on Knowledge Evolution and Ontology Dynamics (ISWC 2012)
[8] Compton, M., et al. 2012. The SSN ontology of the W3C semantic sensor network incubator group, Web Semantics: Science, Services and Agents on the World Wide Web 17, pp. 25–32.
[9] Fielding, R., Taylor, R. 2002. Principled Design of the Modern Web Architecture, ACM Transactions on Internet Technology (TOIT) (New York: Association for Computing Machinery) 2 (2): 115–150, doi:10.1145/514183.514185, ISSN 1533-5399
[10] Page, K., De Roure, D., Martinez, K., Sadler, J., Kit, O. 2009. Linked Sensor Data: RESTfully Serving RDF and GML. In Proceedings of the 2nd International Workshop on Semantic Sensor Networks (SSN09), in conjunction with the 8th International Semantic Web Conference, ISWC 2009, Washington, DC, USA, October 2009; CEUR-WS: Aachen, Germany, 2010; Volume 522, pp. 49–63.
[11] Probst, F., Gordon, A., Dornelas, I. 2006. Ontology-based Representation of the OGC Observations and Measurements Model, Open Geospatial Consortium
[12] Bizer, C., Cyganiak, R. 2006. D2R Server - Publishing Relational Databases on the Semantic Web. Poster at the 5th International Semantic Web Conference (ISWC2006)
[13] Moraru, A., Fortuna, C., Mladenic, D. 2011. A System for Publishing Sensor Data on the Semantic Web. Journal of Computing and Information Technology - CIT 19, 2011, 4, 239–245, doi:10.2498/cit.1002030
[14] Page, K., Frazer, A., Nagel, B., De Roure, D., Martinez, K. 2011. Semantic Access to Sensor Observations through Web APIs, Fifth IEEE International Conference on Semantic Computing
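The additional filtering operation discussed in section 6.1 can be illustrated with a small Python sketch. It joins the rows of the two views that carry the properties of #Observation and only then applies all filters together. The join column `PROBE_NR`, the function name, and all row values are hypothetical and chosen for the example; the paper does not state how the views are linked.

```python
def filter_observations(measurements, samples, filters, join_key="PROBE_NR"):
    """Join both views row by row, then apply every filter to the joined row."""
    samples_by_key = {row[join_key]: row for row in samples}
    matches = []
    for measurement in measurements:
        sample = samples_by_key.get(measurement[join_key])
        if sample is None:
            continue  # measurement without a matching sample row
        joined = {**sample, **measurement}
        if all(joined.get(attr) == value for attr, value in filters.items()):
            matches.append(joined)
    return matches


# Illustrative rows; PROBE_NR is a hypothetical join column, values are invented.
GEW_MESSWERT_GUETE = [
    {"PROBE_NR": 1, "PARAMETER_NR": "A", "MESSWERT": 6.5},
    {"PROBE_NR": 2, "PARAMETER_NR": "A", "MESSWERT": 7.1},
    {"PROBE_NR": 1, "PARAMETER_NR": "C", "MESSWERT": 8.0},
]
GEW_PROBE = [
    {"PROBE_NR": 1, "PNST_NR": "B"},
    {"PROBE_NR": 2, "PNST_NR": "D"},
]

if __name__ == "__main__":
    # Filtering on one view only returns too many rows ...
    print(len(filter_observations(GEW_MESSWERT_GUETE, GEW_PROBE,
                                  {"PARAMETER_NR": "A"})))
    # ... while both filters together select the intended observation.
    print(filter_observations(GEW_MESSWERT_GUETE, GEW_PROBE,
                              {"PARAMETER_NR": "A", "PNST_NR": "B"}))
```

In this toy data set, the filter on PARAMETER_NR alone matches two measurements, whereas combining it with the filter on PNST_NR leaves exactly one.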
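The hidden filter proposed in step 4 of section 6.2 could look roughly like the following Python sketch. The function name and the decision to discard any #hasBDO filter supplied by a caller are assumptions of this example; the paper only states that the filter is added inside the service and is not visible from outside.

```python
HAS_BDO = "#hasBDO"


def apply_publication_filter(external_filters, published_bdo):
    """Return the filter list that the service evaluates internally.

    external_filters: (property, value) pairs taken from the request.
    published_bdo: identifier of the BDO that defines the published subset.
    """
    # Assumption: a #hasBDO filter sent by a caller is ignored, so that
    # callers cannot widen the published subset.
    visible = [(prop, value) for prop, value in external_filters
               if prop != HAS_BDO]
    return visible + [(HAS_BDO, published_bdo)]


if __name__ == "__main__":
    print(apply_publication_filter([("#hasObservedProperty", "TW")], "#8289"))
```

The caller's filters pass through unchanged, and the BDO restriction is always appended last.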
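To make the LOD idea of section 6.3 more concrete, the following Python sketch serializes one observation as RDF in N-Triples syntax, using the property names of the ontology in section 5.1. The namespace `http://example.org/sos#`, the URI patterns, and the function name are hypothetical; neither of the two approaches discussed in section 6.3 is implemented here.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/sos#"  # hypothetical namespace for this sketch


def observation_to_ntriples(obs_id, value, time, phenomenon, feature):
    """Serialize one observation as a list of N-Triples lines."""
    subject = f"<{BASE}observation-{obs_id}>"
    triples = [
        (f"<{RDF_TYPE}>", f"<{BASE}Observation>"),
        (f"<{BASE}hasValue>", f'"{value}"^^<{XSD}double>'),
        (f"<{BASE}hasTime>", f'"{time}"^^<{XSD}dateTime>'),
        (f"<{BASE}hasObservedProperty>", f"<{BASE}phenomenon-{phenomenon}>"),
        (f"<{BASE}relatesToFeatureOfInterest>", f"<{BASE}feature-{feature}>"),
    ]
    return [f"{subject} {predicate} {obj} ." for predicate, obj in triples]


if __name__ == "__main__":
    # Values taken from the example response in section 5.2; the id "1" is invented.
    for line in observation_to_ntriples(
            "1", 6.5, "2005-01-03T12:10:00.000+01:00", "TW", 1051):
        print(line)
```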