Cross-Fertilizing Data through Web of Things APIs with JSON-LD

Wenbin Li and Gilles Privat
Orange Labs, Grenoble, France
gilles.privat@orange.com, liwb1216@gmail.com

Abstract. Internet of Things (IoT) data are mostly cloistered in closed IoT infrastructures and vertically integrated applications, failing to leverage the potential interlinking of the corresponding APIs. We propose a data model based on JSON-LD, in which semantics are added to Web of Things (WoT) APIs, enhancing their interoperability and evolvability through the composition of nested contexts that factor out shared generic categories. Based on this model, we present the blueprint of a framework to automatically and iteratively enrich and interconnect WoT APIs through a process of resource crawling, syntax extraction, semantic annotation and context generation. Building on these propositions, we demonstrate the idea of cross-fertilizing data with a scenario involving three separate IoT infrastructures, showing how their data are linked and interoperated.

Keywords: Internet of Things, Web of Things, semantic interoperability, REST

1 Introduction

Many existing Internet of Things (IoT) infrastructures are little more than dedicated stovepipes that connect one set of sensor devices to a few applications, or watertight silos under the exclusive control of a single operator, stakeholder (e.g., metering infrastructures) or manufacturer (e.g., the newish breed of “connected devices”). The provision of IoT services through APIs relies on such infrastructures, which vary widely in scope, scale, genericity, and the level of abstraction of the data they provide access to.
We refer to IoT APIs as Web of Things (WoT) [1] APIs if they adhere to the principles of the REST [2] architectural style, especially by using dereferenceable URIs instead of arbitrary identifiers and by providing explicit interlinking between fine-grained resources such as devices, physical entities beyond devices (e.g., a room or a car), and the states and attributes of these.

A key challenge of WoT API development is the semantic interoperability of the data exposed by APIs, i.e., the capability not only to match data but also to interpret them automatically. This challenge stems from the heterogeneity of IoT “things” (from electronic devices to physical objects), the variety of their data models, the scarcity of explicit data descriptions, and the limited accessibility of data locked inside IoT infrastructures.

In response, we propose to cross-fertilize data through the WoT APIs exposed by existing infrastructures. By cross-fertilize, we mean “extract and propagate semantics to connect, exchange and explore data via semantic links, thus providing enhanced data understanding and uncovering additional knowledge”. We first propose an RDF-based data model with JSON-LD [3], in which semantics are divided into three parts, expressed as JSON-LD contexts, to support data interoperability, evolvability and semantics reuse. We then introduce the blueprint of a framework that extracts semantics from IoT APIs based on this model and iteratively propagates them. Finally, we present how data are cross-fertilized through a WoT-RDF infrastructure connecting data, ontologies and IoT infrastructures.

To illustrate our contribution, we present a smart building configuration containing a room and a corridor on the same floor, managed by different IoT infrastructures.
The room contains a presence sensor and an electrical door lock using FIWARE [4], as well as a thermostat using Netatmo® [5], while the corridor contains a presence sensor using openHAB [6]. Thus three distinct and non-interoperable IoT infrastructures coexist on the same floor. In addition, there is a mismatch between these infrastructures: FIWARE, like other high-level IoT infrastructures, describes physical entities at a level of abstraction above devices (a room in this example), while openHAB and vendor-provided APIs such as Netatmo® are lower-level infrastructures that give access to data only at the level of connected devices.

2 JSON-LD based Data Model

Following Linked Data principles [7], we propose an RDF data model based on JSON-LD. In our model, a JSON-LD description consists of three main parts [3]: 1) @context, a context description; 2) @id, an identifier; 3) a JSON description. We use JSON-LD rather than other RDF serialization formats such as RDF/XML for three main reasons:

Separation of lexical/syntactic and semantic levels. JSON-LD contexts [3] can either be defined within the JSON-LD document itself or be referenced through URIs of documents containing the context definitions. When contexts are referenced by URI, the syntax and the semantics of a JSON-LD description are separated into two documents, which ensures the evolvability of the model: whenever a semantic update is required, we simply update the context definition at its own URI, without making any change to the JSON-LD description.

Context composition. JSON-LD supports the composition of several contexts for one document. Context composition lets us specify different abstraction levels for the semantics included in a JSON-LD document and promotes the reuse of semantic definitions. Higher abstraction levels link data from different domains, while lower abstraction levels link data from the same subdomain. The complete JSON-LD context results from the composition of contexts at the different levels.
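The two properties above can be illustrated with a heavily simplified model of JSON-LD context processing, sketched below in Python. It assumes a hypothetical in-memory `CONTEXT_STORE` standing in for dereferencing context URIs, merges an array of contexts in order (so later term definitions override earlier ones, as in JSON-LD), and expands compact IRIs such as `m2m:hasOperationState`. It is an illustration only, not a conformant JSON-LD processor: `@vocab`, `@base`, type coercion, scoped contexts and remote loading are all ignored.

```python
from typing import Union

# Hypothetical in-memory stand-in for dereferencing context URIs. Each entry
# holds the object that would appear under "@context" in the remote document.
CONTEXT_STORE: dict[str, dict] = {
    "http://lab.wot-rdf.org/jsonld/GenericIoTContext": {
        "m2m": "http://www.onem2m.org/ontology/Base_Ontology/",
        "status": "m2m:hasOperationState",
    },
    "http://lab.wot-rdf.org/jsonld/NetatmoContext": {
        "m2m": "http://www.onem2m.org/ontology/Base_Ontology/",
        "module_name": "m2m:isPartOf",
    },
}

ContextSpec = Union[str, dict, list]


def resolve_context(spec: ContextSpec) -> dict[str, str]:
    """Merge a context given as a URI, an inline object, or an array of these.

    Array entries are processed in order, so a term defined in a later
    context overrides the same term defined in an earlier one.
    """
    if isinstance(spec, str):
        return resolve_context(CONTEXT_STORE[spec])  # "dereference" the URI
    if isinstance(spec, list):
        merged: dict[str, str] = {}
        for entry in spec:
            merged.update(resolve_context(entry))
        return merged
    return dict(spec)


def expand_term(term: str, ctx: dict[str, str]) -> str:
    """Expand a term, then a compact IRI "prefix:suffix", to a full IRI."""
    value = ctx.get(term, term)
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in ctx and not suffix.startswith("//"):
        return ctx[prefix] + suffix
    return value
```

For example, resolving the generic and Netatmo context URIs together and expanding `status` yields the oneM2M IRI ending in `hasOperationState`. Because a data document only holds the context URI, editing the entry stored under that URI changes the expansion for every document that references it, which is the evolvability argument made above.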
Compatibility with JSON. JSON-LD is fully compatible with plain JSON, which allows JSON-based IoT APIs to be upgraded directly to JSON-LD descriptions without any further change on the API side.

To take full advantage of JSON-LD, we specify three levels of abstraction for JSON-LD contexts. In particular, the value of “@context” is defined as the URI of a document whose context definition is composed from contexts at the different abstraction levels. Each abstraction level is introduced below.

Generic IoT Context: The generic IoT context defines common concepts (e.g., thing vs. device, location, time, name, attribute) shared by all IoT domains. A JSON-LD document refers to only one such context.
Objective: The generic IoT context is used to match and link data from different domains through the common concepts they use, if any.
Reference Ontologies: The generic IoT context adopts standard domain-independent ontologies such as the OneM2M ontology [8]. All IoT data adopt the same references in the generic IoT context.

Domain-specific Context: The domain-specific context defines vocabularies (e.g., SoilHumiditySensor) related to IoT subdomains (e.g., Smart Agriculture, Smart Cities and Smart Home). A JSON-LD context can combine one or more domain-specific contexts (e.g., smart energy and smart buildings), since it may correspond to devices and data from different subdomains.
Objective: The domain-specific context is used to match and link data from the same subdomain.
Reference Ontologies: The domain-specific context references standard domain-dependent ontologies such as SAREF [9] for the smart home domain or CityGML [10] for the smart city domain. All data of the same domain adopt the same references in the domain-specific context.
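A minimal sketch of how shared generic terms make documents from different infrastructures matchable: each document's top-level keys are expanded to full IRIs under its own context, and the intersection gives the concepts through which the documents can be linked. The openHAB-side document and its `state` mapping are hypothetical, invented only for this illustration; the `status` mapping follows the generic context of Listing 1. As in JSON-LD expansion, keys without a term definition are ignored.

```python
M2M = "http://www.onem2m.org/ontology/Base_Ontology/"


def expanded_keys(doc: dict, ctx: dict[str, str]) -> set[str]:
    """Full IRIs of the top-level keys of a JSON document under a context."""
    iris: set[str] = set()
    for key in doc:
        value = ctx.get(key)
        if key.startswith("@") or value is None:
            continue  # keywords and undefined terms carry no linkable semantics
        prefix, sep, suffix = value.partition(":")
        iris.add(ctx[prefix] + suffix if sep and prefix in ctx else value)
    return iris


# Generic IoT context term, as in Listing 1.
generic_ctx = {"m2m": M2M, "status": "m2m:hasOperationState"}
# Hypothetical openHAB-side context mapping its own key to the same concept.
openhab_ctx = {**generic_ctx, "state": "m2m:hasOperationState"}

netatmo_doc = {"status": "ok", "module_name": "Inside"}      # from Listing 1
openhab_doc = {"state": "ON", "label": "Corridor presence"}  # hypothetical

shared = expanded_keys(netatmo_doc, generic_ctx) & expanded_keys(openhab_doc, openhab_ctx)
```

Here `shared` contains only the oneM2M operation-state IRI: the two documents use different local keys, yet both resolve to the same generic concept and can therefore be linked through it.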
Vendor/Technology-specific Context: The vendor/technology-specific context provides references for specific terms related to device manufacturers (e.g., Philips® and Netatmo®), protocols (e.g., ZigBee and Z-Wave), or technologies (e.g., CoAP). A JSON-LD context can combine one or more vendor/technology-specific contexts, since a document may contain data from different vendors or technologies.
Objective: The vendor/technology-specific context maps vendor/technology-specific terms to generic IoT and domain-specific ontologies in order to link data.
Reference Ontologies: The vendor/technology-specific context first maps vendor/technology-specific vocabularies to generic IoT or domain-specific reference ontologies; if no mapping can be found, ontologies related to specific manufacturers, protocols or technologies (e.g., the Z-Wave Ontology [11]) are used; if no semantic reference exists for a given concept, ontologies are created by domain experts.

For illustration purposes, Listing 1 presents a simplified example of Netatmo® thermostat data from our scenario, in which the URIs point to our WoT-RDF infrastructure, introduced in Section 3. A detailed description of our scenario with all corresponding entities is provided in [12]. The listing gives the JSON-LD description of the thermostat based on our data model, whose “@context” references the building context, together with the context documents that illustrate the three genericity levels. By just adding two lines (i.e., @context and @id) to the basic JSON description provided by the Netatmo® API, we transform it into semantically meaningful JSON-LD. The complete context for the building, shared by all entities, is composed from three contexts, i.e., Netatmo, smart home, and generic IoT; when the context needs to be updated to take other concepts into account, we simply modify or add context definitions at the required level, without changing the building context itself.
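The two-line annotation described above, and the generic-to-specific ordering of the composed building context, can be sketched as follows. `compose_context` and `annotate` are hypothetical helper names; the URIs and the thermostat payload are those of Listing 1. Requiring at least one domain-specific context while allowing zero vendor contexts is an assumption of this sketch.

```python
import copy

BASE = "http://lab.wot-rdf.org/jsonld/"


def compose_context(generic: str, domains: list[str], vendors: list[str]) -> dict:
    """Build a composite context document ordered from generic to specific.

    Exactly one generic IoT context is used; at least one domain-specific
    context is required; vendor/technology contexts are optional here
    (an assumption of this sketch). Listing the most specific contexts last
    lets them override more generic term definitions.
    """
    if not domains:
        raise ValueError("at least one domain-specific context is required")
    return {"@context": [generic, *domains, *vendors]}


def annotate(api_json: dict, context_uri: str, resource_uri: str) -> dict:
    """Turn a plain JSON API payload into JSON-LD by adding @context and @id."""
    doc = {"@context": context_uri, "@id": resource_uri}
    doc.update(copy.deepcopy(api_json))  # leave the caller's payload untouched
    return doc


building = compose_context(
    BASE + "GenericIoTContext",
    [BASE + "SmartHomeContext"],
    [BASE + "NetatmoContext"],
)

# Payload as returned by the thermostat API in Listing 1.
netatmo_payload = {
    "status": "ok",
    "body": {"temperature": 21.7, "unit": "Celsius"},
    "module_name": "Inside",
    "rf_status": 161,
}
thermostat = annotate(netatmo_payload, BASE + "context/building", BASE + "thermostat1")
```

Because `annotate` only adds keys, a plain JSON client that ignores `@context` and `@id` still reads the same payload as before, which is the backward compatibility argued for in Section 2.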
URI: http://lab.wot-rdf.org/jsonld/thermostat1
    {
      "@context": "http://lab.wot-rdf.org/jsonld/context/building",
      "@id": "http://lab.wot-rdf.org/jsonld/thermostat1",
      "status": "ok",
      "body": {
        "temperature": 21.7,
        "unit": "Celsius"
      },
      "module_name": "Inside",
      "rf_status": 161
    }

URI: http://lab.wot-rdf.org/jsonld/GenericIoTContext
    {"@context": {
      "m2m": "http://www.onem2m.org/ontology/Base_Ontology/",
      "status": "m2m:hasOperationState",
      "body": "m2m:hasOutput",
      "unit": "m2m:concerns"
    }}

URI: http://lab.wot-rdf.org/jsonld/SmartHomeContext
    {"@context": {
      "saref": "http://ontology.tno.nl/saref#",
      "biopax": "http://www.biopax.org/release/biopax-level3.owl#",
      "temperature": "biopax:temperature"
    }}

URI: http://lab.wot-rdf.org/jsonld/NetatmoContext
    {"@context": {
      "schema": "http://schema.org/",
      "m2m": "http://www.onem2m.org/ontology/Base_Ontology/",
      "module_name": "m2m:isPartOf",
      "rf_status": "netatmo:radioStatus"
    }}

URI: http://lab.wot-rdf.org/jsonld/context/building
    {"@context": [
      "http://lab.wot-rdf.org/jsonld/GenericIoTContext",
      "http://lab.wot-rdf.org/jsonld/SmartHomeContext",
      "http://lab.wot-rdf.org/jsonld/NetatmoContext"
    ]}

Listing 1. Thermostat example

3 Semantics Extraction and Propagation Framework

This section presents the semantics extraction and propagation framework, which is designed to generate semantic data conforming to the above model from WoT APIs. The framework is illustrated in Fig. 1.

[Figure: blocks labelled IoT Infrastructures, REST Entrypoint(s), REST Crawler, URI Graphs, Syntax Extractor, Syntax Graphs, Semantic Annotator (Matching, Reasoning, Classifying, Clustering), Reference Ontologies, Context Generator, Formalized JSON-LD, General RDF, TripleStore, SPARQL Endpoint(s) and Users, within a "Semantics Extraction and Propagation Framework" frame]
Fig. 1.
Semantics extraction and propagation framework

Starting from the REST entry points of different IoT infrastructures, the framework uses four modules, i.e., REST Crawler, Syntax Extractor, Semantic Annotator and Context Generator, to generate JSON-LD documents. Finally, the RDF triples represented in JSON-LD are stored in a triplestore that provides a SPARQL endpoint for user and application queries.

3.1 REST Crawler

Most IoT infrastructures expose REST APIs for user queries. In order to fully explore these APIs, a REST crawler is expected to discover REST resources automatically by following links from the REST entry point(s). According to the HATEOAS design principle of REST, resource descriptions must expose links to related resources in well-defined ways [2]. A number of formats define relations between resources to support HATEOAS in REST design, and their discovery strategies are discussed in [13].

In our framework, the REST crawler applies a recursive process of identifying the format of resources, extracting relationships from resource descriptions and generating an RDF graph connecting the different resources identified by URIs. In our scenario, all three IoT infrastructures use REST as their architectural style. Fig. 2 presents the entry points and the URI graphs constructed by the REST crawler.

[Figure: entry points http://my.openhab.org/rest/items (openHAB), http://lab.fiware.org/room1/ (FIWARE) and https://api.netatmo.com/api/devicelist/ (Netatmo®), with discovered resources /smartbulb1, /door1, /presencesensor1 (two instances), /thermostat1 and /dooractuator1]
Fig. 2. URI graphs from three IoT infrastructures

3.2 Syntax Extractor

The syntax extractor extracts information from the discovered resource descriptions and creates from it a stub RDF graph for subsequent semantic annotation.
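A minimal Python sketch of the crawl of Section 3.1 and of steps 1) to 3) of the extraction detailed next. Everything about the resources is hypothetical: `FAKE_API` and the `example.org` host replace real HTTP requests, and links are assumed to sit under a `links` key (real APIs expose links in various formats [13]). The crawl is iterative (breadth-first with a visited set) rather than recursive, arrays are simplified so that each element becomes an object of the enclosing key instead of following the `rdf:predicate` convention of step 3), and redundancy elimination (step 4) is omitted.

```python
import itertools
from collections import deque
from typing import Callable, Optional

API = "http://example.org/api/"  # hypothetical host

# Hypothetical offline stand-in for GET requests; "links" is an assumed format.
FAKE_API: dict[str, dict] = {
    API + "room1": {"name": "room1", "links": [API + "door1", API + "presencesensor1"]},
    API + "door1": {"locked": True, "links": [API + "dooractuator1"]},
    API + "presencesensor1": {"presence": False, "links": []},
    API + "dooractuator1": {"position": {"open": False}, "links": [API + "room1"]},
}


def crawl(entry_point: str,
          fetch: Callable[[str], Optional[dict]] = FAKE_API.get) -> set[tuple[str, str]]:
    """Follow links breadth-first from an entry point; return URI-graph edges."""
    edges: set[tuple[str, str]] = set()
    seen = {entry_point}
    queue = deque([entry_point])
    while queue:
        uri = queue.popleft()
        description = fetch(uri) or {}
        for target in description.get("links", []):
            edges.add((uri, target))
            if target not in seen:  # the visited set guards against cycles
                seen.add(target)
                queue.append(target)
    return edges


def json_to_triples(subject: str, obj: dict, blank=None) -> list[tuple]:
    """Stub RDF triples: keys become predicates, values become objects."""
    blank = blank if blank is not None else itertools.count()
    triples: list[tuple] = []
    for key, value in obj.items():
        if key == "links":
            continue  # links are already captured in the crawler's URI graph
        # Simplification: each array element becomes an object of the same key.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                node = f"_:b{next(blank)}"  # anonymous node for a nested object
                triples.append((subject, key, node))
                triples.extend(json_to_triples(node, item, blank))
            else:
                triples.append((subject, key, item))
    return triples


uri_graph = crawl(API + "room1")
stub = {uri: json_to_triples(uri, FAKE_API[uri]) for uri in FAKE_API}
```

Note that the back-link from `dooractuator1` to `room1` is recorded as an edge but does not cause the crawl to loop, and the nested `position` object becomes a blank node `_:b0` whose own key/value pairs hang off it, as in step 2).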
There is more than one way to transform hierarchical serialization formats into graphs; here we present a recursive method to generate an RDF graph from a JSON-based resource description. Other serialization formats are first transformed into JSON and then processed by our framework. JSON is built on two base structures: a collection of key/value pairs (an object) or an ordered list of values (an array). The extraction process is as follows: 1) a first subject is generated from the URI of the JSON-based resource; 2) for a collection of key/value pairs, the keys are used to generate predicates in the RDF graph, while the values are regarded as objects; when the value of a key k is another JSON object obj instead of a simple data type, an anonymous node anode is created as the object of k, and anode becomes the subject of the key/value pairs of obj; 3) for an array of values, the predicate is taken to be “rdf:predicate”, while the elements of the array are RDF objects; 4) the redundancy elimination algorithm introduced in [14] is applied to reduce the redundancy of the RDF graph.

3.3 Semantic Annotator

To provide semantics for WoT APIs, the semantic annotator updates the stub RDF graphs produced by the syntax extractor by associating RDF elements, i.e., nodes and arcs, with the reference ontologies of the three context levels. The semantic annotator proceeds iteratively through the following four steps.

Keyword Matching. The descriptions of RDF elements are matched against ontology elements, i.e., classes and properties. We adopt the semantic matching algorithm introduced in [15] because of its performance in matching heterogeneous and inconsistent terms. The ontology alignment algorithm of [16] is applied to handle multiple matches between one graph element and several ontology elements.

Semantic Reasoning.
This step first infers the classes and properties in RDF graphs from the domain and range of ontology properties: for an RDF statement whose predicate between two RDF nodes has been identified, the classes of the subject and object can be inferred from the domain and range of that predicate; conversely, when the classes of the subject and object are known, the predicate can be inferred from the properties with a matching domain and range. Second, this step infers further RDF graph elements by applying semantic rules derived from the axioms in the ontology definitions.

Classifying. This step analyzes the connections of graph elements with other elements and uses a pre-trained classifier to refine RDF elements. We adopt the ontology classification algorithm introduced in [17] to build the classifier, based on a Maximum Entropy Markov Model, because of its performance for class induction.

Interlinking. This step uses clustering algorithms to discover RDF graph elements that represent the same concept, and refines the RDF graph elements within the same cluster. We use the approach presented in [18] to calculate the distance between RDF graph elements and carry out the interlinking step.

The four steps run sequentially. At the end of each cycle, the generated RDF graph is sent back to the first step to start a new cycle, because graph elements deduced by later steps may be used by earlier steps to identify further graph elements. Semantics are thus propagated through the RDF graph automatically and incrementally, and the iteration stops when the graph has not changed since the previous cycle.

Fig. 3 presents the simplified output graph of the semantic annotator; a detailed graph is presented in [12]. The ontologies used are the OneM2M ontology [8] and DogOnt [19]. Through the SPARQL endpoints, we are able to obtain additional information such as the presence state of the whole floor.
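The domain/range part of the semantic reasoning step, wrapped in the "repeat until the graph no longer changes" loop, can be sketched as follows. The ontology excerpt `PROPERTIES` and the `ex:` names are hypothetical stand-ins, not DogOnt or oneM2M axioms; requiring a unique candidate before renaming an unidentified predicate is an assumption of this sketch. Matching, classifying and interlinking are omitted.

```python
RDF_TYPE = "rdf:type"

# Hypothetical ontology excerpt: property -> (domain class, range class).
PROPERTIES: dict[str, tuple[str, str]] = {
    "ex:contains": ("ex:Floor", "ex:Room"),
    "ex:hasSensor": ("ex:Room", "ex:Sensor"),
}

Triple = tuple[str, str, str]


def types_of(graph: set[Triple], node: str) -> set[str]:
    """Classes currently asserted for a node."""
    return {o for s, p, o in graph if s == node and p == RDF_TYPE}


def reasoning_pass(graph: set[Triple]) -> set[Triple]:
    """One pass of domain/range inference over (subject, predicate, object) triples."""
    new = set(graph)
    for s, p, o in graph:
        if p in PROPERTIES:
            # Identified predicate: infer subject and object classes.
            domain, range_ = PROPERTIES[p]
            new.add((s, RDF_TYPE, domain))
            new.add((o, RDF_TYPE, range_))
        elif p != RDF_TYPE:
            # Unidentified predicate: look for a unique property whose
            # domain and range match the classes already known for s and o.
            candidates = [q for q, (d, r) in PROPERTIES.items()
                          if d in types_of(graph, s) and r in types_of(graph, o)]
            if len(candidates) == 1:
                new.discard((s, p, o))
                new.add((s, candidates[0], o))
    return new


def propagate(graph: set[Triple], max_cycles: int = 100) -> set[Triple]:
    """Repeat passes until a cycle leaves the graph unchanged."""
    for _ in range(max_cycles):
        updated = reasoning_pass(graph)
        if updated == graph:
            return graph
        graph = updated
    raise RuntimeError("graph did not stabilize")


# Stub graph: "rooms" is a raw JSON key not yet matched to a property.
stub_graph: set[Triple] = {
    ("floor1", RDF_TYPE, "ex:Floor"),
    ("floor1", "rooms", "room1"),
    ("room1", "ex:hasSensor", "presencesensor1"),
}
annotated = propagate(stub_graph)
```

The `rooms` link can only be identified in the second cycle, once the first cycle has typed `room1` from the domain of `ex:hasSensor`; this is the kind of feedback between steps that motivates re-running them until the graph stabilizes.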
[Figure: resource nodes items, room1, smartbulb1, thermostat1, door1, presencesensor1 (two instances) and dooractuator1, typed with DogOnt classes (dogont:Floor, dogont:Room, dogont:Lighting, dogont:Thermostat, dogont:Door, dogont:PresenceSensor, dogont:DoorActuator) and linked by dogont:contains, dogont:hasSensor, dogont:hasActuator and m2m:hasThingRelation]
Fig. 3. Output RDF graph

3.4 Context Generator

The context generator transforms the RDF graphs produced by the semantic annotator into formalized JSON-LD documents whose contexts follow the three abstraction levels defined in Section 2. Since tools exist to convert RDF between different serialization formats [20], we only present here the transformation of RDF semantics into the three levels of JSON-LD context. The generation is a top-down matching process: for an RDF graph element with a semantic label, if a match exists between the label and the reference ontologies of the generic IoT context, the label is stored in the generic IoT context part; otherwise, the context generator checks whether a match exists between the label and the reference ontologies of the domain-specific context. If so, the label is stored in the domain-specific context part; otherwise, it is stored in the vendor/technology-specific context part. This process repeats until all semantics in the RDF graph have been transformed into JSON-LD context.

After generation, IoT data from different sources are connected through the generic IoT context, and IoT data from the same subdomain are further connected through the domain-specific context.

Fig. 4 summarizes the idea of cross-fertilizing IoT data based on our propositions. The semantics extraction and propagation framework connects the WoT-RDF infrastructure and the IoT infrastructures by constructing RDF graphs from IoT REST interfaces and generating JSON-LD descriptions based on our data model.
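The top-down matching performed by the context generator (Section 3.4) can be sketched as follows. Testing whether a label's IRI falls in a level's namespaces stands in for the ontology matching test, which is an assumption of this sketch, and the namespace lists are examples taken from Listing 1 rather than a complete set of reference ontologies.

```python
M2M = "http://www.onem2m.org/ontology/Base_Ontology/"
BIOPAX = "http://www.biopax.org/release/biopax-level3.owl#"
SAREF = "http://ontology.tno.nl/saref#"

# Namespaces of the reference ontologies at each level (examples only).
GENERIC_NS = (M2M,)
DOMAIN_NS = (SAREF, BIOPAX)


def generate_contexts(labels: dict[str, str]) -> dict[str, dict[str, str]]:
    """Assign each term -> IRI label to one of the three context levels.

    Labels are tested top-down: generic IoT first, then domain-specific;
    anything matching neither falls through to the vendor/technology level.
    Namespace membership stands in for the ontology matching test.
    """
    contexts: dict[str, dict[str, str]] = {"generic": {}, "domain": {}, "vendor": {}}
    for term, iri in labels.items():
        if iri.startswith(GENERIC_NS):
            level = "generic"
        elif iri.startswith(DOMAIN_NS):
            level = "domain"
        else:
            level = "vendor"
        contexts[level][term] = iri
    return contexts


# Expanded labels of thermostat terms from Listing 1.
thermostat_labels = {
    "status": M2M + "hasOperationState",
    "body": M2M + "hasOutput",
    "unit": M2M + "concerns",
    "temperature": BIOPAX + "temperature",
    "rf_status": "netatmo:radioStatus",
}
levels = generate_contexts(thermostat_labels)
```

For these terms the sketch reproduces the split of Listing 1: the oneM2M terms land in the generic context, `temperature` in the smart-home context, and `rf_status` in the Netatmo context.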
The WoT-RDF infrastructure provides two views of the linked IoT data: general RDF graphs for SPARQL queries, and JSON-LD documents to connect IoT infrastructures. Three sets of ontologies provide semantics at the three abstraction levels, respectively.

Fig. 4. Cross-fertilizing IoT data with JSON-LD

Regarding queries, users can not only perform local queries through the REST APIs of individual IoT infrastructures, but also query the SPARQL endpoints of the WoT-RDF infrastructure to obtain global information spanning several IoT infrastructures and deduce explicit knowledge. Moreover, JSON-LD documents are also stored in the WoT-RDF infrastructure and are used to update infrastructure-side descriptions (for open IoT infrastructures such as FIWARE). Through such updates, users can also perform queries locally via individual infrastructure APIs and obtain additional semantics and interconnected information.

4 Conclusion

JSON-LD is a promising evolutionary solution for cross-fertilizing data through existing WoT APIs, maintaining full backward compatibility if needed. We have presented a framework that generates, by iterative propagation, JSON-LD data conforming to our model from WoT interfaces, and introduced a WoT-RDF “superstructure” that makes existing WoT infrastructures interoperate at the semantic level. This idea has been proposed as a solution for further opening and “semanticizing” the APIs of FIWARE, itself a high-level open infrastructure, but, as we have shown, it can also be taken up as a minimal, pragmatic interoperability solution between existing platforms or infrastructures, taken as they are, without the need to subsume, replace or even alter them.

References

1. Guinard, D.D., Trifa, V.M.: Building the Web of Things. Manning Publications (2015)
2. Richardson, L., Ruby, S.: RESTful Web Services. O’Reilly, Farnham (2007)
3. JSON-LD 1.0, https://www.w3.org/TR/json-ld/
4.
Internet of Things (IoT) Services Enablement Architecture, https://forge.fiware.org/plugins/mediawiki/wiki/fiware/index.php/Internet_of_Things_%28IoT%29_Services_Enablement_Architecture
5. Netatmo Developers, https://dev.netatmo.com/doc
6. openHAB, http://www.openhab.org/
7. Bizer, C., Heath, T., Berners-Lee, T.: Linked Data - The Story So Far. Int. J. Semant. Web Inf. Syst. (IJSWIS) 5(3), 1–22 (2009)
8. OneM2M Base Ontology, http://www.onem2m.org/ontology/Base_Ontology
9. SAREF Ontology Documentation, http://ontology.tno.nl/saref/
10. CityGML, http://www.citygml.org
11. Z-Wave ontology - Smart Appliances Project, https://sites.google.com/site/smartappliancesproject/ontologies/z-wave-ontology
12. Interoperability with JSON-LD, https://github.com/WoTRDF/InteroperabilityWithJSON-LD
13. Mayer, S., Guinard, D.: An Extensible Discovery Service for Smart Things. In: Proceedings of the Second International Workshop on Web of Things, No. 7. ACM (2011)
14. Pichler, R., Polleres, A., Skritek, S., Woltran, S.: Redundancy Elimination on RDF Graphs in the Presence of Rules, Constraints, and Queries. In: Web Reasoning and Rule Systems, pp. 133–148. Springer (2010)
15. Khan, S., Safyan, M.: Semantic Matching in Hierarchical Ontologies. Journal of King Saud University - Computer and Information Sciences 26, 247–257 (2014)
16. Cheatham, M., Hitzler, P.: String Similarity Metrics for Ontology Alignment. In: The Semantic Web – ISWC 2013, pp. 294–309. Springer (2013)
17. Gandon, F., Cabrio, E., Stankovic, M., Zimmermann, A. (eds.): Semantic Web Evaluation Challenges. Springer International Publishing, Cham (2015)
18. Ferrara, A., Genta, L., Montanelli, S.: Linked Data Classification: A Feature-based Approach. In: Proceedings of the Joint EDBT/ICDT 2013 Workshops, pp. 75–82. ACM (2013)
19. DogOnt, http://elite.polito.it/ontologies/dogont/dogont.html
20. JSON-LD implementation for Java, https://github.com/jsonld-java/jsonld-java