Semantically enhancing SensorML with Controlled Vocabularies in the Marine Domain

Alexandra Kokkinaki, Louise Darroch and Justin Buck
The British Oceanographic Data Centre (BODC), National Oceanography Centre (NOC)
Liverpool, UK
alexk@bodc.ac.uk

Simon Jirka and the Marine Profiles for OGC Sensor Web Enablement Standards Team
52°North GmbH
Muenster, Germany
jirka@52north.org

Abstract: During the last decade, data produced by sensors have increased exponentially in the environmental domain. Standardization is necessary in order to integrate data originating from disparate sensor networks. Over the last few years, marine organizations and communities have been working towards the standardization of sensors by implementing OGC SWE (Sensor Web Enablement) standards, i.e. Sensor Model Language (SensorML) to describe sensor metadata, Observations and Measurements (O&M) to describe sensor data and Sensor Observation Service (SOS) to serve them to the world. In addition, many European and US projects such as AtlantOS, SenseOCEAN, BRIDGES, X-DOMES, FixO3, PANGEA etc. have been implementing OGC SWE standards to achieve machine-to-machine communication and interoperability with other sensor networks.

SensorML is an XML-based language that was purposely defined to offer many degrees of flexibility, so that it can describe sensors with different requirements across different domains. SensorML is lenient enough to allow user-generated terms to be encompassed in its syntax. As convenient as it sounds, this flexibility can result in many different variations of sensor descriptions, which reduce interoperability and discoverability via the Web. To resolve this, it is important to bring together potential user communities, identify lists of required terms, define them and then use controlled vocabularies to publish them according to standards.

In this paper, we describe the ongoing work done by the marine community, through the Marine SWE Profiles collaboration, to create a more restrictive, semantically richer subset of SensorML by identifying, formalizing and publishing on the Web the required terms and their definitions according to W3C standards.

Keywords: Controlled vocabularies; XML; soft typing; SensorML standardization; NVS2.0

I. INTRODUCTION

The marine domain has started to implement Open Geospatial Consortium’s (OGC) Sensor Web Enablement (SWE) standards in order to make all types of sensors, transducers and sensor data repositories discoverable, accessible and useable via the Web. OGC SWE standards were deliberately designed to be generic enough to encompass a wide variety of domains and disciplines that utilize sensors to make observations. Such flexibility increases the risk of slightly different implementations of the standards, which prevent sensors from becoming fully interoperable and discoverable via the Web.

The Open Geospatial Consortium’s SWE activities aim at enabling “Sensor Webs”, through which applications and services will be able to access sensors of all types over networks, such as the internet, with the same standard technologies and protocols that enable the Web. These initiatives have defined, prototyped and tested several foundational components needed for a Sensor Web, namely: the Sensor Model Language (SensorML), Observations and Measurements (O&M), Sensor Planning Service (SPS), Transducer Markup Language (TML), Sensor Alert Service (SAS), Sensor Observation Service (SOS) and Web Notification Service (WNS).

In this paper we concentrate on Sensor Model Language (SensorML), which primarily focuses on providing a robust and semantically-tied means of defining processes and processing components associated with the measurement and post-measurement transformation of observations. The main objective of SensorML, which is an XML-based language, is to enable interoperability, first at the syntactic level and later at the semantic level (by using ontologies and semantic mediation), so that sensors and processes can be better understood by machines, utilized automatically in complex workflows, and be easily shared between intelligent Sensor Web nodes [1]. Syntactic interoperability, which is the ability of two or more systems to communicate with each other, is achieved solely by the wide adoption of the standard by the communities that wish to communicate. Semantic interoperability, which is the ability to automatically interpret the information exchanged meaningfully and accurately, requires that both sides defer to a common information exchange reference model. The content of the information exchange requests is unambiguously defined: what is sent is the same as what is understood [2].

SensorML was designed to describe different types of domain- and discipline-independent processes. As its creators state, “In order to achieve interoperability within and between various sensor communities, implementation of SensorML will require the definition of community specific semantics (within online dictionaries or ontologies) that can be utilized within the framework” [3]. This is true since most properties in SensorML utilize the concept of "soft-typing". That is, rather than trying to pre-define in the schema every possible property that might be used to describe a particular sensor or might be measured by a sensor, SensorML allows property types to be defined outside of the SensorML schema (typically within an online ontology) and then be used within SensorML as a value to the definition attribute. The value of the definition attribute must be a resolvable URL that references an online property definition or single entry within an ontology [4].

The list of properties that can be used to describe a sensor can grow long, especially if more than one sensor domain is considered. The following example demonstrates the difficulties encountered when using SensorML to describe a sensor. In this example, User A classified “fluorometer X” under the category of “Fluorometers”. As shown in Figure 1, User A defined a property named “Instrument Type” and published it in an ontology. He then added the free text value “Fluorometer” as the value of the instrument type. User B, following the same pattern (Figure 2), chose the term “Sensor Category” to classify his sensor and assigned the wrongly spelled term “active flurometer” as a value to the property. Software client X failed to discover all available fluorometers, as it used different term definitions and term values from those used by User A and User B.

[Fig. 1: SensorML markup lost in extraction. Recoverable values: term “Instrument Type”, value “Fluorometer”.]

Fig. 1. User A SensorML description

[Fig. 2: SensorML markup lost in extraction. Recoverable values: term “Sensor Category”, value “Active flurometer”.]

Fig. 2. User B SensorML description

SensorML creators have identified the benefit of ontologies since the publication of the standards by stating that: “Sensor ontologies are becoming increasingly important for creating standard dictionaries of sensor-related terminology and for mapping relationships between these terms.” Many sensor technologies, including the Sensor Web Enablement (SWE) encodings and Web services, depend on and benefit greatly from the existence of online, resolvable ontologies of terms related to sensors. SensorML creators have created the SensorML ontology to list these terms, through the Marine Metadata Interoperability (MMI) project. It hosts an Ontology Registry and Repository hosting a number of small project-specific controlled vocabularies [5]. Since different communities require different terminologies, the ontology can fulfill only a subset of the required concepts.

The marine community has been using controlled vocabularies, i.e. standardized sets of terms, in tagging metadata and labeling data in order to solve the problem of ambiguities associated with data markup and to enable records that can be interpreted by computers. Controlled vocabularies for the marine community are served by a number of servers including the NERC Vocabulary Server version 2.0 (NVS2.0). This server provides access to lists of standardized terms that cover a broad spectrum of disciplines relevant to the oceanographic and wider community. NVS2.0 makes use of the World Wide Web Consortium's Simple Knowledge Organization System (SKOS) to represent knowledge in a format understandable by computers. In SKOS, vocabularies are modeled as collections and terms are modeled as concepts. Collections and concepts have unique URIs that are resolvable through a RESTful interface to either HTML or RDF documents through content negotiation. Collections are also accessible through SOAP Web Services and a SPARQL endpoint (http://vocab.nerc.ac.uk/sparql/).

In this paper we present an initiative from a collaboration within the marine community to create and maintain several controlled vocabularies, to semantically enhance SensorML and bring semantic interoperability amongst environmental sensor networks.

Our work to semantically enhance SensorML for the marine domain comprises four distinct steps. Step one is the formalization of the concepts and definitions used to describe sensors in the marine domain and their organization in collections. The next step is the publication of these concepts and collections using unique URIs. Step three is the definition of internal mappings between concepts and other NVS2.0 concepts sharing the same meaning. The last step is the definition of external mappings with overlapping concepts from the SensorML ontology and other well-known vocabularies, e.g. DBpedia, thus making sensors more accessible and discoverable via the Web.

A. Marine SWE Profiles

To avoid interoperability issues in OGC SWE implementations by different organizations and users, an agreement was needed on how to apply SWE concepts and how to use vocabularies in a common way that would be shared by different projects, implementations, and users.

Partners from several projects and initiatives (AODN, BRIDGES, ENVRI+, EUROFLEETS/EUROFLEETS2, FixO3, FRAM, IOOS, Jerico/Jerico-Next, NeXOS, ODIP/ODIP II, RITMARE, SeaDataNet, SenseOCEAN, X-DOMES) created the Marine SWE Profiles group as a solution to the need mentioned above. They joined forces to develop common marine profiles of OGC SWE standards that can be used in multiple projects and organizations [6].

Marine SWE Profile members interact and communicate through the use of a mailing list and a wiki website. The wiki helps to collect and discuss different approaches to how OGC Sensor Web Enablement (SWE) standards (Sensor Observation Service (SOS), Observations and Measurements (O&M) and SensorML) are used in different projects and systems. It is currently structured in the following subsections, which can be edited by its members after logging in:

• SweExamples: Examples of SensorML, O&M and SOS usage
• SweVocabularies: Vocabularies for the Marine SWE Profiles
• SweProfile: Structure and proposed content of the Marine SWE Profiles
• SosInventory: Inventory of SOS Servers

The Marine SWE Profiles mailing list is essentially a discussion list. Members are allowed to post their own items, which are broadcast to all of the other mailing list members. The group has also taken on the responsibility to act as the SensorML vocabulary content governance, which is important in order to stay up-to-date and in sync with ongoing developments.

The publication of SensorML implementations by different projects revealed the lack of published vocabularies for term and property definitions and the need for common vocabularies to refer to the same terms coherently in the marine domain.

B. NVS2.0

The NERC Vocabulary Server version 2.0 (NVS2.0) provides access to lists of standardized terms that cover a broad spectrum of disciplines relevant to the oceanographic and wider community.

NVS2.0 is based on the Simple Knowledge Organization System (SKOS) model. SKOS is based on the "concept", which it defines as a "unit of thought", that is an idea or notion. In NVS2.0, each vocabulary is a collection and owns a unique URI that resolves, after content negotiation, to a self-descriptive RDF document if a machine requests it or to an HTML page if a human requests it [7].

NVS2.0 URIs are published using the following pattern: http://vocab.nerc.ac.uk/collection/XXX/current/ for collections and http://vocab.nerc.ac.uk/collection/XXX/current/YYYYY for concepts, where XXX is a three-character code referring to a vocabulary collection and YYYYY is a variable-length code uniquely identifying each concept in the collection, e.g. http://vocab.nerc.ac.uk/collection/P07/current/3AKCHY57/.

Each controlled vocabulary delivered by NVS2.0 contains the following information:

• Key: a compact permanent identifier for the collection, designed for computer storage rather than human readability
• Title: a text string representing the title of the vocabulary in human-readable form
• Abbreviation: a concise text string representing the title in human-readable form where space is limited
• Date: latest publication date
• Definition: full description of what the vocabulary describes
• Creator: the organization that created the vocabulary
• Owner: the organization that owns the vocabulary
• Manager: the organization that manages the vocabulary
• Publisher: the organization that publishes the vocabulary

The RDF snippet in Figure 3 demonstrates the information originating from the W05 vocabulary (http://vocab.nerc.ac.uk/collection/W05/current/), which contains SensorML characteristic terms.

[Fig. 3: RDF markup lost in extraction. Recoverable values: “SensorML Characteristic Section Terms”; “SensorML Characteristics”; “Terms used in SensorML to describe properties of an observation system that do not further qualify or quantify its output values.”; “2016-09-15 02:00:04.0”; “2”; “British Oceanographic Data Centre”; “Natural Environment Research Council”; “Sensor Web Enablement Marine Profiles”; “Governance for vocabularies created for use in SWE Marine Profiles.”]

Fig. 3. RDF code for NVS2.0 vocabularies

Additionally, vocabularies contain lists of terms classified as SKOS concepts, each one having a unique URI resolving to an RDF or HTML document, as for collections. The controlled vocabularies delivered by NVS2.0 contain the following information for each term:

• Key: a compact permanent identifier for the term, designed for computer storage rather than human readability
• Term: a text string representing the term in human-readable form
• Abbreviation: a concise text string representing the term in human-readable form where space is limited
• Definition: a full description of what is meant by the term

All of the vocabularies are fully versioned and a permanent record is kept of all changes made. NVS2.0 can be accessed in three different ways: through a SOAP service, a RESTful interface and a SPARQL endpoint.

NVS2.0 was chosen by the Marine SWE Profiles community to publish SensorML terms, as it and its predecessors have successfully served the marine community for more than ten years. The use of NVS2.0 within the European Union SeaDataNet project is outlined in [8]. In the wider arena, the Ocean Data Interoperability Platform (ODIP) is an international collaboration of data management organizations which includes SeaDataNet. They are fostering best practices and common standards. In addition, they are creating prototypes to enable the transfer of technologies. NVS2.0 has been utilized within ODIP prototype 2 to underpin interoperability by linking EU, US and Australian research cruise programs and providing cruise information at an international level.

III. RESULTS

This work, which is based on standards, aims to semantically enhance SensorML in the marine domain according to W3C standards. Thus, it allows computers not only to communicate, but also to seamlessly understand the communicated information.

There are essentially two places in SensorML that benefit from the use of vocabularies: the term definition and the term value, as shown in Figure 4.

For “term values”, Marine SWE Profiles members agreed to use existing concepts in NVS2.0. The following collections were identified to adequately serve term values:

• Observable property: NVS2.0 Collections P01, P07
• Instrument Type: NVS2.0 Collection L05
• Platform Type: NVS2.0 Collection L06
• Roles: NVS2.0 Collections G04, C86
• Feature of Interest: NVS2.0 Collection C19
• Manufacturer: NVS2.0 Collections L35, C75

NVS2.0 Collection P01, which lists terms used to describe individual measured phenomena, and P07, which is a list of the Climate and Forecast standard names, have been nominated to serve SensorML observable properties. The L05 collection, which lists device categories, is used for the classification of instruments and procedures. L06, in the same respect, provides a list of platform categories to be used for classifying platforms. G04 and C86 list roles and populate SensorML’s role property. C19, which is the Salt and Fresh Water Body Gazetteer, can be used to create a rich list of features of interest. L35 and C75 can both be used to populate the manufacturer property, since they refer to organizations and manufacturers respectively.

[Fig. 4: SensorML markup lost in extraction. Recoverable values: term “Instrument Type”, value http://vocab.nerc.ac.uk/collection/L05/current/113/.]

Fig. 4. SensorML code snippet

The absence of a standard list of term definitions initiated the SWE Marine members’ collaboration and agreement. The SWE Examples wiki subsection was used to collate the various SensorML descriptions posted by the members. Subsequently, the group identified the common terms under each section and provided a common name for terms that were semantically the same but differently named. The terms were also enhanced with definitions and alternative labels and were published on the SWE Vocabularies wiki page for review and final approval. The list was then submitted to the Vocabulary Management Group at BODC, which performed final checks on integrity and conformity before accepting the list for publication.

For new terms and vocabularies, a new process has been established. Members are encouraged to post the desired set of terms on the wiki, complementing it with a title and a definition. The set is then checked by the Vocabulary Group in BODC and, if any changes are applied, it is posted again on the wiki. The group needs to approve the changes before the set is finally published on the Web. Disagreements are discussed in the mailing list.

SensorML consists of sections which include several terms. In NVS2.0, each section is modeled as a new vocabulary, holding a unique URI and listing a set of domain-relevant terms. Following the NVS2.0 URL pattern, SensorML vocabularies are all grouped under the ‘W0X’ notation as shown in Table 1, although there is no semantic relevance between the vocabulary’s subject and the notation. Each vocabulary is self-documented and refers to the Marine SWE Profiles group as its creator and owner. BODC is the manager and moderator and NERC is the publisher.

The XML code snippet in Figure 4 displays the standardized version of the examples shown in Figure 1 and Figure 2. The different term definitions and values were merged under unique URIs, which are accommodated in the SensorML code.

A. Mappings

As stated previously, links from NVS2.0 concepts to other data sources can only benefit metadata tagged with these concepts, as they become more discoverable on the Web. The mapping process is still ongoing and the objective is to initially use the owl:sameAs property for stating that another data source also provides information about a specific NVS2.0 concept. The RDF links will be set manually for the mappings of NVS2.0 to MMI and to other NVS2.0 concepts.

Table 1. URIs and titles of the published SensorML collections

URI                                                Title
http://vocab.nerc.ac.uk/collection/W03/current/    SensorML History Event Types
http://vocab.nerc.ac.uk/collection/W04/current/    SensorML Capability Section Terms
http://vocab.nerc.ac.uk/collection/W05/current/    SensorML Characteristic Section Terms
http://vocab.nerc.ac.uk/collection/W06/current/    SensorML Classification Section Terms
http://vocab.nerc.ac.uk/collection/W07/current/    SensorML Identification Section Terms
http://vocab.nerc.ac.uk/collection/W08/current/    SensorML Contact Section Terms

B. Applications

Enhancing SensorML with standardized lists of terms ensures interoperability between different implementations of OGC SWE sensor descriptions. Providing these vocabularies as allowed values through drop-down lists for “term values” in SensorML editors leads to interoperable SensorML descriptions. A worthy example is the EDI Metadata Editor, which is a template-driven metadata authoring tool that can be easily customized to any XML-based metadata format (e.g. SensorML) and to a specific workgroup, institute, or project. It also connects to the NVS2.0 SPARQL endpoint to provide lists of allowed terms for property values [9].

Additionally, SOS clients can leverage the standardization of SensorML to easily discover sensors based on their characteristics. For example, client software searching for Instrument Types (as defined in NVS2.0 concept http://vocab.nerc.ac.uk/collection/W06/current/CLSS0002/) being fluorometers (as defined in http://vocab.nerc.ac.uk/collection/L05/current/113/) will be able to locate all relevant sensor descriptions that have been described with the vocabularies mentioned above. As a consequence, sensors described in SensorML become standardized, more discoverable and usable via the Web.

IV. DISCUSSION AND CONCLUSIONS

The need for controlled and defined vocabularies in SensorML has been clear since its creation and became evident as its use matured. In the marine community, the exposure of different SensorML implementations under the collaborative environment of the Marine SWE Profiles wiki and the success of NVS2.0 were the two key factors that resulted in the creation of the SensorML vocabularies. As stated in [10], vocabularies should be published by a trusted group and they should be accessible for a long period. NVS2.0 fully meets these conditions. A critical element in this work was vocabulary governance by the SWE Marine Profile group, a community of specialized users rather than a single authority, which resulted in trust in and acceptance of the new vocabularies.

Although the SensorML ontology published under the MMI project offers several terms and definitions, it does not capture all of the marine domain. As a result, the SWE Marine Profile community chose NVS2.0 to host new domain-specific vocabularies. As stated in [10], mappings will be established between vocabularies where there are overlapping terms, to enhance the discoverability of sensor metadata and to inform users how terms relate to each other.

The creation of the SensorML vocabularies draws the required boundaries for the uniform use of the language in the marine domain, but it also enhances SensorML semantically.

V. CONCLUSIONS

SensorML’s flexibility, specifically the soft typing characteristic, causes variability in published sensor descriptions, thereby reducing interoperability and discovery via the Web. To address this issue, the Marine SWE Profiles group decided to formalize the required terms and publish them in the form of controlled vocabularies served by the NVS2.0 vocabulary server. The collections and terms are governed by the group and maintained by BODC, so they are assured and accepted by the community. The work is ongoing and includes mappings between terms that share common meaning in NVS2.0, the SensorML ontology and other existing vocabularies.

This work, which is highly collaborative, shows what can be achieved when people reuse existing well-functioning infrastructures and join forces to handle interoperability issues.

ACKNOWLEDGMENT

This work is funded by the European projects SenseOCEAN and BRIDGES and supported by the Natural Environment Research Council (NERC). SenseOCEAN is a Collaborative Project funded by the European Union Seventh Framework Programme (FP7/2007–2013) under grant agreement No. 61414. The BRIDGES project has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 635359. The NVS2.0 server is supported by NERC National Capability (NC) funding for NC-services, facilities and data (NC-SFD).

REFERENCES

[1] "Sensor Model Language (SensorML) | OGC", Opengeospatial.org, 2016. [Online]. Available: http://www.opengeospatial.org/standards/sensorml.
[Accessed: 31-Jul-2016]
[2] "Semantic interoperability of health information", En13606.org, 2016. [Online]. Available: http://www.en13606.org/the-ceniso-en13606-standard/semantic-interoperability. [Accessed: 31-Jul-2016]
[3] Portal.opengeospatial.org, 2016. [Online]. Available: http://portal.opengeospatial.org/files/?artifact_id=21273. [Accessed: 31-Jul-2016]
[4] "SensorML 2.0 Metadata - Identifiers and Classifiers", Sensorml.com, 2016. [Online]. Available: http://www.sensorml.com/sensorML-2.0/examples/identifiers.html. [Accessed: 31-Jul-2016]
[5] J. Graybeal, A. Isenor and C. Rueda, "Semantic mediation of vocabularies for ocean observing systems", Computers & Geosciences, vol. 40, pp. 120-131, 2012.
[6] S. Jirka, "Marine Profiles for OGC Sensor Web Enablement Standard", in EGU General Assembly, Vienna, 2016, p. 14690.
[7] A. Leadbetter, The Semantic Web in Earth and Space Science. Current Status and Future Directions Part II Chapter 2 Linked Ocean Data, 1st ed. 2016, pp. 11-31 [Online]. Available: https://books.google.co.uk/books?isbn=161499501X. [Accessed: 31-Jul-2016]
[8] D. Schaap and R. Lowry, "SeaDataNet – Pan-European infrastructure for marine and ocean data management: unified access to distributed data sets", International Journal of Digital Earth, vol. 3, no. 1, pp. 50-69, 2010.
[9] C. Fugazza, A. Oggioni, M. Pepe, F. Pavesi, P. Carrara. DATA 2014 - 3rd International Conference on Data Management Technologies and Applications. DOI: 10.5220/0004997603490356
[10] "Best Practices for Publishing Linked Data", W3.org, 2014. [Online]. Available: https://www.w3.org/TR/ld-bp/. [Accessed: 31-Jul-2016]
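
The discovery scenario from the Introduction and from Section III.B (Applications) can be illustrated with a minimal Python sketch. It is not the implementation of any SOS client or of NVS2.0. The Classifier record, the functions parse_nvs_uri and find_sensors, and the sensor identifiers "sensor-A" and "sensor-B" are hypothetical names introduced only for illustration, and the regular expression assumes that collection codes consist of three upper-case letters or digits, as in the examples quoted in the paper. The two NVS2.0 URIs are the ones given in Section III.B. The sketch shows that a lookup over free-text classifiers finds only one of the two fluorometers, whereas the same lookup over URI-based classifiers finds both.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# URI pattern quoted in Section B (NVS2.0): .../collection/XXX/current/ for
# collections and .../collection/XXX/current/YYYYY/ for concepts.
# Assumption: XXX is made of three upper-case letters or digits.
NVS_URI = re.compile(
    r"^http://vocab\.nerc\.ac\.uk/collection/([A-Z0-9]{3})/current/(?:([^/]+)/?)?$"
)


def parse_nvs_uri(uri: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (collection, concept) for an NVS2.0 URI, or None if it does not match.

    The concept part is None when the URI identifies a whole collection.
    """
    match = NVS_URI.match(uri)
    return (match.group(1), match.group(2)) if match else None


@dataclass(frozen=True)
class Classifier:
    """Hypothetical, simplified stand-in for one SensorML classifier term."""
    sensor: str      # identifier of the described sensor
    definition: str  # term definition (free text or URI)
    value: str       # term value (free text or URI)


# URIs taken from Section III.B of the paper.
INSTRUMENT_TYPE = "http://vocab.nerc.ac.uk/collection/W06/current/CLSS0002/"
FLUOROMETERS = "http://vocab.nerc.ac.uk/collection/L05/current/113/"


def find_sensors(classifiers: List[Classifier], definition: str, value: str) -> List[str]:
    """Return the sorted identifiers of sensors whose classifier matches exactly."""
    return sorted(
        {c.sensor for c in classifiers if c.definition == definition and c.value == value}
    )


# Free-text descriptions in the spirit of Fig. 1 and Fig. 2 (User A and User B).
free_text = [
    Classifier("sensor-A", "Instrument Type", "Fluorometer"),
    Classifier("sensor-B", "Sensor Category", "Active flurometer"),
]

# The same two sensors described with controlled vocabulary URIs (Fig. 4).
standardized = [
    Classifier("sensor-A", INSTRUMENT_TYPE, FLUOROMETERS),
    Classifier("sensor-B", INSTRUMENT_TYPE, FLUOROMETERS),
]

if __name__ == "__main__":
    print(find_sensors(free_text, "Instrument Type", "Fluorometer"))
    print(find_sensors(standardized, INSTRUMENT_TYPE, FLUOROMETERS))
    print(parse_nvs_uri(FLUOROMETERS))
```

Exact string comparison is used deliberately: once both the term definition and the term value are URIs from a controlled vocabulary, a client needs no spelling tolerance or synonym handling to locate all matching sensor descriptions.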