Semantic Model Extensibility in Interoperable IoT Data Marketplaces: Methods, Tools, Automation Aspects

Yulia Svetashova [0000-0003-1807-107X]

Robert Bosch GmbH, Corporate Sector Research and Advance Engineering, Robert-Bosch-Campus 1, 71272 Renningen, Germany
yulia.svetashova@de.bosch.com
Karlsruhe Institute of Technology, AIFB, Kaiserstr. 89, 76133 Karlsruhe, Germany

Abstract. The proposed research is concerned with the problem of extending semantic models used in IoT data-sharing contexts with new elements. New elements are suggested by domain experts at run time and then validated by ontology engineers. The process of term suggestion is organized as a modeling dialogue: dynamically generated user interface forms with labels in natural language assist the domain experts in the creation of structured proposals. We hypothesize that this communication procedure will increase the efficiency and the perceived usability of performing change operations at different stages of the ontology evolution process. We then suggest a series of experiments to evaluate our approach.

Keywords: Semantic Interoperability · Internet of Things · Data Marketplace · Ontology Evolution · Ontology-Enhanced User Interface

1 Context and Motivation

The International Data Corporation study “Data Age 2025” estimates a 10-fold growth of annually created data by 2025; a quarter of this data is predicted to be real-time in nature, and 95% of it is expected to originate from the Internet of Things (IoT) [17]. According to Manyika et al. [13], despite this constant increase in the amount of IoT data, the lack of interoperability leads to it being used mostly for anomaly detection and current operations, thereby leaving its prediction and optimization potential untapped.
On the other hand, data-intensive AI-based algorithms and applications, which are intended to transform the everyday lives of people and businesses, depend critically on high-quality data that is difficult for them to obtain. Sharing this IoT data can thus be beneficial for both data owners and potential data consumers. To satisfy this increasing demand, a new type of IoT data-sharing platform, the data marketplace, has started gaining popularity in recent years [19].¹

¹ Examples of these marketplaces include DatabrokerDAO, IOTA data marketplace, AppNexus, Trimble, Datum, MDM (traffic data) and Wibson (personal data).

The IoT data marketplaces trade data streams and/or aggregated sensor data and services. These marketplaces aim at becoming a “central point of discoverability” for IoT data. They are also required to ensure interoperability by defining “metaformats and abstractions” [12] for cross-standard, cross-platform and cross-domain use cases. These “metaformats” (marketplace standards, in our case ontology-based semantic models) serve a twofold purpose: 1) categorization of data offerings: related or similar offerings are placed closer together and can be retrieved with queries of different levels of generality; 2) data integration: input and output parameters of different offerings are expressed in terms of one mediated schema, making heterogeneous data jointly usable in the consumer application context.

2 Problem Statement

The creation and use of semantic models for automated IoT resource or service discovery and subsequent data integration at run time presents two main challenges. The first challenge is that these models need to provide unified semantics to capture not only raw sensor data but also context information [16], which becomes critically important when relevant data is filtered and/or heterogeneous sources are combined.
The second challenge is that these models require constant adaptation based on change requests coming from varying users who want to provide data with very different underlying schemata and real-world conceptualizations.

An early study of semantic interoperability in decentralized distributed environments by Vetere and Lenzerini [22] stresses that a “suitable coverage of primitive concepts with respect to the business domain (completeness) that entails the possibility to progressively enhance conceptual schemas (extensibility)” is the key success factor in the task of data integration with which we are faced. The majority of current interoperability efforts in IoT reflect this requirement. Nonetheless, methods for enhancing schemas by including new model elements at run time, in the absence of a centralized semantic authority, have not yet been examined in a controlled setting. The current dissertation will address these issues. The results of this study will be the basis of a partially automated algorithm for model extension at run time.

3 Related Work

The present work takes into account the results of current European initiatives in the field of IoT semantic interoperability, in particular the experience accumulated in the BIG IoT² project and the semantic modeling solutions tested in the IoT-Lite [2], FIESTA-IoT [1] and Inter-IoT [7] projects. The BIG IoT project follows the approach of reusing existing ontologies by combining them within the data-sharing platform.

² “Bridging the Interoperability Gap in the Internet of Things”, see the description at the project website: http://big-iot.eu; the marketplace Web portal: https://market.big-iot.org.

The idea of thematic templates (see below in Sec.
6) supporting the model enrichment process extends various concepts of patterns circulating in the Semantic Web community: “ontology design patterns” [6], design patterns translated into templates [11], observation-driven “geo-ontology patterns” [10], and “ontology alignment design patterns” [18]. The thematic template of a model element is its initial broad categorization together with a coherent set of prototypical relations in the enclosing model that characterize elements of this category.

Another relevant research domain for the present study is ontology evolution, defined as “timely adaptation of an ontology to the arisen changes and the consistent propagation of these changes to dependent artefacts” [20]. Flouris et al. [4] distinguish six ontology evolution sub-tasks: 1) capturing required changes, 2) change representation using a formal language, 3) testing effects of the changes, resolving conflicts and forming a complete change request, 4) change implementation and verification, 5) change propagation to dependent data and affected applications, and finally, 6) change validation by the ontology engineer. The current research is planned to contribute mostly to sub-tasks (1) and (2), but the effects of our approach on the other evolution phases will also be investigated.

The proposed ontology-enhanced user interface [15] reflects the conception of the modeling process “as a dialogue between participants”, “rooted in the sharing, translation, negotiation, argumentation, and imposing of (participant-based) knowledge representations” [9].³ We adapt this approach to the new concept suggestion procedure.

The research on distributed collaborative tagging systems and, in particular, the investigation of the underlying cognitive mechanisms formalized in the semantic imitation model of social tag choices by Fu et al. [5] has an additional influence on our work.
4 Research Questions

We understand the general approach to Semantic Web technologies-based information sharing to be “attaching semantic meta-data to information items, and (...) relating these metadata to each other through background knowledge in the form of ontologies” [21]. On this basis, this study will address the following research questions related specifically to the model enrichment process:

– RQ1 How can we simplify the process of semantic model extension at run time for domain experts who propose new model elements?⁴
– RQ2 How can we simplify the process of semantic model extension for ontology engineers who validate new element proposals created by the domain experts at run time?
– RQ3 Which aspects of the new element inclusion and validation processes can be automated and to which degree?

³ See also an approach to knowledge authoring as dialogue in Parvizi et al. [14].
⁴ Particularly, we concentrate on those domain experts who have only limited exposure to the model and who are not knowledge modeling experts.

5 Hypotheses

The hypotheses corresponding to our research questions are the following:

H1: The organization of the semantic model extension process as a modeling dialogue (coupled with the data annotation process), using an ontology-enhanced user interface where modeling primitives (basic concepts), related model fragments and similar offerings are made accessible, can simplify the process of suggesting new concepts by domain experts.

H2: The resulting structured proposals, where concepts are linked to the corresponding super-classes and relations are established with other relevant concepts, can simplify the process of the validation and approval of new elements by ontology engineers.

H3: The proposal validation process can be, at least partially, automated by a machine learning classifier.
Validation can be modeled as a single-label multiclass classification task where the set of categories {“Accept”, “UseExisting”, “ModifyAndAccept”} represents the decisions.⁵ The features for model training are derived from the model element characteristics (structured annotations) proposed by the domain expert, the context of a suggestion and the computed statistics of all marketplace offerings.

⁵ I owe the idea of modeling the subject expert decision as a classification task to Dmitry Mironov (private conversation).

6 Approach and Preliminary Results

Empirical and conceptual findings of the proposed research as well as the resulting software artifacts will be an extension of the BIG IoT project. We first outline the relevant model design and system architecture considerations of the project and then indicate the novel components we propose.

There are three groups of semantic modeling artefacts in BIG IoT. The Core ontology represents the concepts relevant for the marketplace operation (core:OfferingQuery, core:Provider, core:Consumer, core:Offering, etc.). Two domain ontologies, Mobility and Environment, provide terms to characterize the content of the data/service offering (mobility:ParkingSiteStatus, environment:PollutionIndicator, mobility:TrafficSpeed, etc.). Finally, the Application ontology is used for organizing navigation through the model elements in the marketplace Web portal.⁶

The marketplace Web portal is used to assist data providers in describing their data and data consumers in formulating queries (see Fig. 1). The user starts by selecting a category and a subcategory, and based on this initial choice, the set of available annotations for the output data is shown in dropdown lists. If users do not find a matching semantic term, they can propose a new one by typing the concept name.
⁶ The models follow Linked Data principles and are published as schema.org custom extensions; the example terms are dereferenceable provided that the namespace prefixes are expanded as follows: core – http://schema.big-iot.org/core/, mobility – http://schema.big-iot.org/mobility/, environment – http://schema.big-iot.org/environment/. Other prefixes in the paper are used as declared on prefix.cc.

The new term is assigned the namespace prefix “proposed” and stored as a metadata annotation for the current offering and as a potential model element. The proposal is later reviewed for approval by an ontology engineer responsible for the maintenance of the marketplace models.

Fig. 1. Schematic representation of the data description process (current state). Given (1) the output sample to be described as a data offering and (2) the ontology model to be used, the marketplace user interface (3) provides a set of expected annotations depending on the selected category. The resulting description in RDF format is stored in the triple store (4) as a part of the offerings metadata graph.

Fig. 2. Schematic representation of the proposed data description and model extension process. Components (5)–(8), marked red, point to the main modifications in comparison with the current state of the system.

Addressing RQ1, we suggest the following modifications of the new model element inclusion process (they are summarized in Fig. 2):

(1) All marketplace domain ontologies are aligned with the SSN/SOSA ontologies [8]; the categorization of the offerings is coupled with the concept of sosa:FeatureOfInterest (component 5 in Fig. 2).

(2) Data-driven thematic templates (we start with temporal, spatial and sensor measurement templates) are derived from a representative set of data matching the marketplace scope and stored as a collection of annotation graphs in a triple store (the colored layering in the sample output (1) in Fig.
2 reflects thematically different data).

(3) When proposing a new model element, the user first selects a thematic template, which provides a high-level characterization of the proposed concept. The system, in turn, sends a request to the back-end module (6), whose main function is to retrieve the corresponding annotation graph from the triple store and to dynamically generate a series of web forms (7) based on the structure of the annotation graph (the process is exemplified in Table 1).

Table 1. Modeling as dialogue: triples from the sensor measurement annotation graph are transformed into questions in natural language which become labels of the corresponding user interface elements. The annotation triples are basic graph patterns with variables in Turtle syntax.

Annotation Triple(s) | UI Label | UI Element
sosa:Observation sosa:hasFeatureOfInterest ?FeatureOfInterest . | “Is the observation related to Car?” (actual sosa:FeatureOfInterest selected) | Toggle switch
sosa:Observation sosa:observedProperty ?ObservableProperty . | “Which property of Car is observed?” | Text field
sosa:Observation sosa:hasResult sosa:Result [ rdf:type qudt:QuantityValue [ qudt:unit ?Unit ]]. | “Which unit of measurement is used?” | Dropdown list with units of measurement

(4) By answering the questions displayed in the web forms, the user creates a structured semantic annotation of the new concept (8). The annotation is persisted in the triple store as a new model element with the namespace prefix “proposed”.

To test H2, which corresponds to RQ2, we will build a user interface for the ontology engineers responsible for the maintenance of the marketplace models. This interface will allow interactive exploration of 1) structured user proposals; 2) their contexts (the offering descriptions); 3) thematic templates and existing descriptions based on them.

In our setting, we are interested in incorporating complex user-initiated changes. These changes involve the addition of new model elements, the establishing of an is-a relation with existing concepts, as well as the introduction of the non-taxonomic property relations specified in the user annotation.

An ontology engineer may: 1) accept the proposed concept and the annotation without any modification, 2) decline a change request and use the existing model element instead, or 3) accept a concept but change the proposed annotation. In the cases when a change is accepted (1 or 3), it becomes a part of the model, and the effects of this change are propagated to all dependent/related offerings and taken into account by the matching queries. When the proposed concept is declined (2), the offering metadata is corrected accordingly.

Our preliminary results in the BIG IoT project include the alignment of the domain models with the SSN/SOSA framework, the data-driven investigation of archetypal thematic templates and the translation of their prototypical representation into basic annotation graphs. The first version of the system architecture extension (the dynamic UI form generation and the corresponding back-end) has been implemented.

7 Evaluation Plan

The proposed approach and the associated hypotheses will be evaluated in a series of experiments with groups of users representing the two roles in the change management procedure: the domain experts group proposing changes and the ontology engineers group validating the proposed changes.

We have started with the construction of a gold standard dataset: 60 data samples in JSON format were annotated by two ontology engineers with a category and a data type for each key-value pair in the sample. Next, a selection of samples will be formed based on the following criteria: the data samples should 1) be typical for the intended marketplace scope; 2) be balanced in terms of domains and sosa:FeatureOfInterest instances; 3) represent various thematic templates.
Then we will intentionally exclude a subset of elements representing hapax legomena (i.e., terms which occur only once in our gold standard corpus) from the existing domain models. The task formulated for a group of domain experts is to annotate the given samples using the published domain models in the BIG IoT marketplace Web portal. If an annotation term is not found in the available domain models, the user is expected to extend the model.

Experiment 1 is designed to test H1, namely, to explore a causal relationship between the ease of suggesting a new concept for data providers (dependent variable) and their exposure to the “modeling as dialogue” approach (causal variable). The control group will use the BIG IoT marketplace Web portal described in Fig. 1. The experimental group will use the implementation of the proposed approach illustrated in Fig. 2. The groups will then be compared in terms of 1) the time spent on performing an annotation task (overall and per sample) and 2) the number of times the model websites are consulted (overall and per sample). Additionally, we will look at 3) the agreement with the gold standard annotation for the terms present in the model; 4) the inter-rater agreement within the groups; 5) the naming strategies used for labeling the new concepts.

We will also collect reflections of the group members on how the usability of proposing a new concept is perceived, and compare them in a structured way. For the experimental group, the average agreement with the gold standard annotation for the new terms will be quantified.

Experiment 2 is designed to test H2, namely, to examine a causal relationship between the ease of validating a new concept proposal by the ontology engineer (dependent variable) and the prior usage of the “modeling as dialogue” approach by domain experts (causal variable). Ontology engineers will validate proposals from the control group and structured proposals from the experimental group.
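Inter-rater agreement is used as a comparison measure in the experiments, but the text does not fix a particular statistic. As a minimal sketch, assuming exactly two raters and nominal labels, Cohen's kappa could be computed as follows. The function name and the example labels are illustrative assumptions, and other statistics may be more appropriate when more than two raters are involved.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items (nominal labels)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each rater's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example: two raters judging four proposals (labels are hypothetical data).
rater_1 = ["Accept", "Accept", "UseExisting", "ModifyAndAccept"]
rater_2 = ["Accept", "UseExisting", "UseExisting", "ModifyAndAccept"]
kappa = cohen_kappa(rater_1, rater_2)
```

In this toy example the raters agree on three of four items, and the chance-corrected value is 7/11, roughly 0.64.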
The validation process will be compared for the two groups of proposals in terms of 1) the time spent on performing a validation task (overall and per sample), 2) the steps needed to integrate a proposed change into the corresponding domain model (per change), 3) the steps needed for change propagation (per change), and 4) the inter-rater agreement between ontology engineers. Questionnaires based on the System Usability Scale [3] will also be used to capture the perceived difficulty/ease of integrating changes, specifically when dealing with structured proposals.

Based on the validation results with high inter-rater agreement, a new dataset that models the decision-making task will be compiled to test H3. In a machine learning experiment which aims at predicting an ontology engineer's decision (dependent variable) based on the features sketched in Section 5, the standard evaluation measures of precision, recall and their harmonic mean, the F1 score, will be used to assess the model performance on a test subset.

8 Reflections

The proposed research introduces the problem of run-time semantic model extension by a domain expert, mediated by the ontology-enhanced user interface and further validated by an ontology engineer. Its main contribution is intended to be the experimental and theoretical basis for extending semantic models. These models will serve both as descriptive metadata for the resources exposed on an IoT data marketplace and as a lingua franca (shared model) in the infrastructure where data sharing takes place.

Through the exploration of various scenarios of on-the-fly model extension performed by data providers, we will develop methods and tools to optimize the new model element inclusion process, paying special attention to the aspects which can contribute to the partially automated workflow.
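To make the evaluation measures named for the H3 experiment in Section 7 concrete, the following Python sketch computes per-class precision, recall and F1 for the three decision categories. The function names, the macro-averaging step and the toy labels are assumptions made for illustration; the text above does not prescribe an averaging scheme.

```python
DECISIONS = ("Accept", "UseExisting", "ModifyAndAccept")


def per_class_scores(y_true, y_pred, labels=DECISIONS):
    """Return {label: (precision, recall, f1)} for a single-label multiclass task."""
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 values (one possible aggregate)."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy data: engineer decisions versus classifier predictions (hypothetical).
y_true = ["Accept", "Accept", "UseExisting", "ModifyAndAccept", "UseExisting"]
y_pred = ["Accept", "UseExisting", "UseExisting", "ModifyAndAccept", "UseExisting"]
scores = per_class_scores(y_true, y_pred)
overall = macro_f1(scores)
```

Reporting the per-class values alongside any aggregate is useful here, because the three decisions are unlikely to be equally frequent and an aggregate alone can hide weak performance on a rare decision.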
Although our research is shaped by and tailored to the context of IoT data marketplaces, we believe that the approach described above is generic and can be applied in many scenarios that require model usage and extension by non-experts.

Acknowledgements. This work has been developed in the project BIG IoT funded by the European Commission’s Horizon 2020 program under the grant agreement No 688038. The author would like to thank Prof. Dr. York Sure-Vetter (the thesis advisor), Prof. Dr. Andreas Harth, Dr. Stefan Schmid and three anonymous reviewers for their critical reading of the paper and their constructive comments and suggestions.

References

1. Agarwal, R. et al.: Unified IoT Ontology to Enable Interoperability and Federation of Testbeds. IEEE 3rd World Forum on Internet of Things (WF-IoT), 70–75 (2016).
2. Bermudez-Edo, M. et al.: IoT-Lite: a lightweight semantic model for the Internet of Things. Ubiquitous Intelligence and Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress, IEEE, 90–97 (2016).
3. Brooke, J.: SUS-A quick and dirty usability scale. Usability evaluation in industry, 189(194), 4–7 (1996).
4. Flouris, G. et al.: Ontology Change: Classification and Survey. Knowledge Engineering Review, 23, 117–152 (2008).
5. Fu, W. T. et al.: A semantic imitation model of social tag choices. In: Computational Science and Engineering. CSE’09. International Conference on. Vol. 4, 66–73 (2009).
6. Gangemi, A., Presutti, V.: Ontology design patterns. In: Handbook on ontologies, 221–243. Springer, Berlin, Heidelberg (2009).
7. Ganzha, M. et al.: Semantic interoperability in the Internet of Things: an overview from the INTER-IoT perspective. Journal of Network and Computer Applications, 81, 111–124 (2017).
8. Haller, A. et al.: Semantic Sensor Network Ontology. W3C Recommendation. W3C.
https://www.w3.org/TR/vocab-ssn/. Accessed: April 11 2018.
9. Hoppenbrouwers, S. et al.: A fundamental view on the process of conceptual modeling. In: International Conference on Conceptual Modeling, 128–143 (2005).
10. Janowicz, K.: Observation-driven geo-ontology engineering. Transactions in GIS 16(3), 351–374 (2012).
11. Jupp, S. et al.: Populous: a tool for building OWL ontologies from templates. BMC bioinformatics 13(1) (2012).
12. Deichmann, J. et al.: Creating a successful Internet of Things data marketplace. (2016). https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/creating-a-successful-internet-of-things-data-marketplace. Accessed: April 11 2018.
13. Manyika, J. et al.: The Internet of Things: Mapping the value beyond the hype. McKinsey Global Institute. (2015).
14. Parvizi, A. et al.: A pilot experiment in knowledge authoring as dialogue. Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013)–Short Papers (2013).
15. Paulheim, H., Probst, F.: Ontology-enhanced user interfaces: A survey. Semantic-Enabled Advancements on the Web: Applications Across Industries, 214 (2012).
16. Perera, Ch. et al.: Context-aware computing for the Internet of Things: A survey. IEEE communications surveys & tutorials 16(1), 414–454 (2014).
17. Reinsel, D. et al.: Data age 2025: The evolution of data to life-critical. (2017). https://www.seagate.com/files/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf. Accessed: April 11 2018.
18. Scharffe, F. et al.: Ontology alignment design patterns. Knowledge and information systems, 40(1), 1–28 (2014).
19. Stahl, F. et al.: A classification framework for data marketplaces. Vietnam Journal of Computer Science 3(3), 137–143 (2016).
20. Stojanovic, L.: Methods and Tools for Ontology Evolution. PhD Thesis, University of Karlsruhe, Germany, 2004.
21. Stuckenschmidt, H., Van Harmelen, F.: Information sharing on the semantic web. Springer Science & Business Media (2005).
22. Vetere, G., Lenzerini, M.: Models for semantic interoperability in service-oriented architectures. IBM Systems Journal 44(4), 887–903 (2005).