
Enabling Contextualized Knowledge Description with Non-symbolic Integration

Elton Soares

Raphael Thiago

raphaeltg@br.ibm.com

Wallas Santos

Rodrigo Santos

rodrigo.costag@ibm.com

Marcio Moreno

IBM Research, Brazil, Av Pasteur 146, Rio de Janeiro - RJ, Brazil

Formal knowledge descriptions have been one of the main tools for enabling knowledge representation and reasoning in AI. The W3C Semantic Web initiative proposed a set of standards, including a common data interchange format, that enabled greater interoperability and integration between knowledge descriptions generated by different organizations in academia and industry. Nonetheless, these standards have been unable to provide contextualized integration of knowledge descriptions effectively, as they do not provide constructs to explicitly define contextualized n-dimensional relationships (e.g., time and space, n-ary and hierarchical facts) between concepts from different descriptions. This leads to inefficiencies in reasoning and query processing over decontextualized relationships, and to inconsistencies in the modeling approaches used to deal with this limitation. The main goal of this demo is to present how a hybrid knowledge representation model and description language, namely Hyperknowledge, allows contextualized integration of knowledge descriptions using constructs that enable the explicit definition of rich contextual relationships. This demonstration will be performed using the Knowledge Explorer System (KES), which supports the visualization and management of the effective contextualization afforded by Hyperknowledge's expressivity w.r.t. n-dimensional relationships between concepts from multiple descriptions, and between those concepts and their corresponding non-symbolic content.

Keywords: Hyperknowledge · Hybrid Knowledge Representation · Knowledge Management · Knowledge Visualization · Knowledge Curation

As the AI industry and research community grow, the number and size of knowledge descriptions produced and shared using the W3C Semantic Web Standards 2 (WSWS) tend to grow at a similar pace.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 https://www.w3.org/standards/semanticweb/

Meanwhile, it has become clear that knowledge descriptions are not always absolute, but instead are assumed to hold under certain circumstances that define a specific context [7]. Descriptions may need to be contextualized, for example, with regard to dimensions such as time and space, as a fact that holds in a certain period, or in a specific location of the world, might not be true in another period or location.

Existing alternatives based on the WSWS, such as RDF graphs/datasets 3, the OWL Time Ontology 4, or Ontology Patterns for N-ary relations 5, can be used to work around the expressivity limitations of the underlying conceptual model, but each fails to provide a general approach. Respectively: RDF graphs/datasets admit multiple valid interpretations, depending on the assumptions made for graph naming and meaning; the OWL Time Ontology addresses only one specific contextual dimension, using the same constructs that are used to represent the knowledge descriptions associated with it; and the n-ary relation patterns propose multiple ways of representing the same n-ary relations using binary relations, as the underlying conceptual model is unable to represent them as first-class constructs.

Therefore, as the WSWS does not prescribe a standard approach to contextualize knowledge descriptions, the context of those descriptions is often expressed in their identifiers or using textual labels and annotations, which is both inefficient w.r.t. reasoning and query processing and hard to standardize across multiple organizations.
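To make the problem concrete, the following minimal sketch contrasts the two workarounds mentioned above with a representation in which the context is an explicit element of each fact. It is only an illustration: plain Python tuples stand in for RDF statements, and every identifier (`ex:RuaA`, `ctx:2019`, `facts_in_context`, and so on) is hypothetical rather than taken from any of the cited standards or from Hyperknowledge.

```python
# Illustrative only: tuples stand in for RDF statements; all names are hypothetical.

# Workaround 1: the context (street and year) is hidden inside the identifier.
in_identifier = [
    ("ex:speedLimit_RuaA_2019", "ex:value", 60),
]

# Workaround 2: the context is stated in a free-text annotation.
in_annotation = [
    ("ex:speedLimit1", "ex:value", 60),
    ("ex:speedLimit1", "rdfs:comment", "valid for Rua A during 2019"),
]

# Explicit alternative: a fourth element names the context in which the fact holds.
explicit = [
    ("ex:RuaA", "ex:speedLimit", 60, "ctx:2019"),
    ("ex:RuaA", "ex:speedLimit", 50, "ctx:2020"),
]


def facts_in_context(quads, context):
    """Return the (subject, predicate, object) facts that hold in `context`."""
    return [(s, p, o) for (s, p, o, c) in quads if c == context]


print(facts_in_context(explicit, "ctx:2020"))
```

With the first two encodings, answering "what was the speed limit in 2020?" requires parsing identifiers or comment strings, whereas the explicit form reduces it to a simple filter.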

Several theories of context have been investigated in seminal works in the fields of AI and knowledge representation, for example, the formalization of contexts as first-class constructs [1] and the use of a "box" metaphor to properly represent contexts [7]. Both of these theories served as inspiration for the approach presented in this work, which addresses not only the problem of contextualized knowledge description integration, but also its integration with non-symbolic content (i.e., multimodal content such as machine learning (ML) models, images, video, text, audio, etc.) in a given context.

The integration of knowledge and non-symbolic content has been deeply influenced by the semantic gap [8] between what the content means and its knowledge representation, which remained an open issue until the recent proposal of a hybrid conceptual model, namely Hyperknowledge [3], capable of representing relationships between conceptual descriptions and non-symbolic content.

Previous solutions from the non-symbolic AI community, such as metadata standards and models, attempted to bridge this gap by allowing the specification of predefined fields to describe low-level aspects of non-symbolic content (e.g., MPEG-7 6 and Dublin Core 7). Meanwhile, solutions from the symbolic AI community, such as the WSWS, focused on formally describing abstract concepts and semantic relationships between them (e.g., RDF 8, OWL 9).

3 https://www.w3.org/TR/rdf11-datasets
4 https://www.w3.org/TR/owl-time/
5 https://www.w3.org/2001/sw/BestPractices/OEP/n-aryRelations-20060323
6 https://mpeg.chiariglione.org/standards/mpeg-7
7 https://dublincore.org/specifications/dublin-core/

Hyperknowledge fills the gap left by previous solutions by providing first-class constructs that can be used to explicitly describe relationships between concepts and n-dimensional fragments of non-symbolic content [5]. It also provides constructs that enable the contextualization of those relationships, therefore enabling the description of rich relationships involving concepts from multiple ontologies in a given context.

2 The Hyperknowledge Conceptual Model

The Hyperknowledge conceptual model is composed of three main groups of entities: terminal nodes, composite nodes, and links [2].

A terminal node is composed of a collection of information units. The exact notion of what constitutes an information unit is part of the node definition and depends on its specialization. A context node is a composite node (or a set) that may contain links, terminal nodes, and composite nodes. Links define relationships among nodes' anchors. There are different types of links, such as SPO (subject-predicate-object) links, causal links, and constraint links.

Every node, either composite or terminal, can be connected with other nodes through anchors, which represent portions or fragments of a node. Anchors can be used to represent n-dimensional fragments of a node (e.g., along time and space dimensions), enabling the creation of links between either concept or content nodes within those dimensions. Also, as every entity in the Hyperknowledge model can be contained in a composite node, it is possible to create links that represent the relationships between nodes in a given context.
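The entities described above can be pictured with a small data-structure sketch. This is a deliberate simplification under our own assumptions, not the Hyperknowledge specification or the HKBase implementation: the class names (`Anchor`, `Node`, `Link`, `Context`), their fields, and the example identifiers are all hypothetical, and only SPO-style links are modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Anchor:
    """A fragment of a node, delimited along zero or more named dimensions."""
    name: str
    # Dimension name -> (start, end), e.g. {"x": (120, 260), "t": (0.0, 2.5)}.
    region: Dict[str, Tuple[float, float]] = field(default_factory=dict)


@dataclass
class Node:
    """Terminal node (concept or content) identified by `id`."""
    id: str
    anchors: Dict[str, Anchor] = field(default_factory=dict)

    def add_anchor(self, anchor: Anchor) -> Anchor:
        self.anchors[anchor.name] = anchor
        return anchor


@dataclass
class Link:
    """SPO-style link; each endpoint is a (node id, anchor name) pair."""
    subject: Tuple[str, str]
    predicate: str
    obj: Tuple[str, str]


@dataclass
class Context(Node):
    """Composite node that may contain nodes, other contexts, and links."""
    members: List[Union[Node, Link]] = field(default_factory=list)

    def links(self) -> List[Link]:
        return [m for m in self.members if isinstance(m, Link)]


# A content node (an image) with a two-dimensional spatial anchor.
image = Node("frame_001.png")
image.add_anchor(Anchor("car_region", {"x": (120, 260), "y": (80, 170)}))

# A concept node with an anchor covering the whole node.
car = Node("Car")
car.add_anchor(Anchor("whole"))

# The link between the image fragment and the concept lives inside a context.
scene = Context("TrafficScene")
scene.members += [
    image,
    car,
    Link(("frame_001.png", "car_region"), "depicts", ("Car", "whole")),
]
```

Because the link is a member of `scene`, the statement that the image region depicts a car is asserted only within that context, and the anchor's `region` records which fragment of the image is involved.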

Therefore, the n-dimensional anchors and the context nodes can be used to explicitly contextualize n-dimensional relationships between concepts from multiple knowledge descriptions and corresponding non-symbolic content, which are represented as content nodes.

3 Knowledge Description and Non-Symbolic Integration

To enable the integration of knowledge descriptions and non-symbolic content using Hyperknowledge, we devised an architecture that enables both human users (e.g., knowledge engineers, domain experts, and developers) and other systems (e.g., AI assistants and tools) to interact with the same Hyperknowledge Base (HKBase).

An overview of this architecture is depicted in Figure 1, where we highlight the process of ingestion of knowledge descriptions, expressed using WSWS (e.g., RDF and OWL), and non-symbolic content (e.g., ML models and images). Human users can perform this process using KES, while other systems can communicate directly with the HKBase using a RESTful API [4]. During the ingestion of the knowledge descriptions, HKBase converts those descriptions to the Hyperknowledge representation and stores the original descriptions and the corresponding Hyperknowledge descriptions using one of its storage options.

8 https://www.w3.org/RDF/
9 https://www.w3.org/OWL/
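The conversion step can be pictured with the following hypothetical sketch, which maps a set of subject-predicate-object statements to concept nodes and SPO links grouped under one context. The actual conversion rules used by HKBase are not described in this paper, so the function name `triples_to_entities`, the dictionary layout, and the example identifiers are assumptions made purely for illustration.

```python
def triples_to_entities(triples, context_id):
    """Map (s, p, o) statements to nodes and SPO links inside one context.

    Hypothetical layout: every entity records the context that contains it
    in its "parent" field.
    """
    nodes = {}
    links = []
    for s, p, o in triples:
        # Each distinct subject or object becomes a single concept node.
        for term in (s, o):
            nodes.setdefault(
                term, {"id": term, "type": "concept", "parent": context_id}
            )
        links.append(
            {"type": "spo", "subject": s, "predicate": p, "object": o,
             "parent": context_id}
        )
    return {
        "context": {"id": context_id, "type": "context"},
        "nodes": list(nodes.values()),
        "links": links,
    }


result = triples_to_entities(
    [
        ("ex:Car", "rdfs:subClassOf", "ex:Vehicle"),
        ("ex:Car", "ex:drivesOn", "ex:Road"),
    ],
    context_id="ex:TrafficOntology",
)
```

Grouping the converted entities under a context that corresponds to the source description is one way to keep concepts from different ontologies distinguishable after ingestion.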

For this demonstration, the main storage option used for the Hyperknowledge descriptions is a triple store, as it enables efficient ingestion and querying of knowledge descriptions based on WSWS, which makes it easier to integrate applications that were originally implemented using WSWS. As illustrated by the dashed arrows and boxes in Figure 1, the Hyperknowledge descriptions could be stored and queried using a document or graph database as well.

For the ingestion of non-symbolic content, we currently make use of object storage, but future implementations will make use of optimized storage options for each type of content.

Our main object storage option is IBM Cloud Object Storage 10, but our system can be used with any AWS S3 11 compatible object storage. Similarly, all Hyperknowledge features are currently supported using either MongoDB 12 or JanusGraph 13, but the extended RDF compatibility and SPARQL support are optimized when using a triple store, which in the current implementation is Apache Jena 14.

[Figure 1: overview of the architecture. Labels: Datasets (Knowledge Descriptions, Non-Symbolic Content); Knowledge Engineers, Domain Experts & Developers; Knowledge Explorer System (KES); AI Assistants & Tools; Hyperknowledge Base (Hyperknowledge Descriptions, Non-Symbolic Content); Storage (Triple Store, Document Database, Graph Database, Object Storage)]

10 https://www.ibm.com/cloud/object-storage
11 https://docs.aws.amazon.com/AmazonS3
12 https://www.mongodb.com/
13 https://janusgraph.org/
14 https://jena.apache.org/

The main goal of this demo is to show Hyperknowledge features that support the contextualized integration of multiple knowledge descriptions and non-symbolic content through KES. To this end, we defined a dataset containing traffic simulation images extracted from a visual perception benchmark [6], concepts extracted from these images, and several ontologies used in Intelligent Transportation Systems, to show how KES can support the contextualized specification of rich relationships between concepts from multiple knowledge descriptions for the Smart Mobility domain. The demo shows the collaborative capabilities of the system by illustrating different users having a synchronized view of the same knowledge description, while also being able to explore it individually by expanding nodes of interest or visualizing the result of queries. The images are used to demonstrate KES's capability of contextualized integration of knowledge descriptions with non-symbolic content, while the ontologies illustrate the support for contextualized knowledge description integration. Finally, queries performed using KES will demonstrate its capability to answer contextualized spatial queries over non-symbolic content and knowledge descriptions.

Demo video 1: Collaborative ingestion and contextualized integration of traffic knowledge descriptions and non-symbolic content. https://ibm.box.com/v/iswc2020-kes-demo1-video1

Demo video 2: Individualized exploration of the knowledge descriptions. https://ibm.box.com/v/iswc2020-kes-demo1-video2

Demo video 3: Visualization of contextualized spatial queries over traffic images and knowledge descriptions. https://ibm.box.com/v/iswc2020-kes-demo1-video3
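As a rough illustration of what a contextualized spatial query over image annotations involves, the sketch below filters annotations by context and by overlap with a query region. It does not reflect the KES query language or the HKBase API; the record layout, the `overlaps` and `spatial_query` helpers, and the sample data are all hypothetical.

```python
from typing import Dict, List, Tuple

# Axis-aligned bounding box: (x_min, y_min, x_max, y_max), in pixels.
Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


# Hypothetical annotations: a concept linked to an image region in a context.
annotations: List[Dict] = [
    {"image": "frame_001", "concept": "Car",
     "context": "Intersection", "box": (120, 80, 260, 170)},
    {"image": "frame_001", "concept": "Pedestrian",
     "context": "Intersection", "box": (300, 90, 330, 180)},
    {"image": "frame_002", "concept": "Car",
     "context": "Highway", "box": (50, 60, 200, 150)},
]


def spatial_query(records: List[Dict], context: str, region: Box) -> List[str]:
    """Concepts whose image region overlaps `region` within `context`."""
    return [
        r["concept"]
        for r in records
        if r["context"] == context and overlaps(r["box"], region)
    ]


print(spatial_query(annotations, "Intersection", (100, 50, 280, 200)))
# prints ['Car']
```

The context filter excludes the car annotated in the `Highway` context, and the overlap test excludes the pedestrian, whose region lies outside the queried area.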

1. McCarthy, J.: Notes on formalizing context. In: IJCAI (1993)

2. Moreno, M., Brandão, R., Santos, R., Sousa, W., Cerqueira, R.: Representing experts' interpretive trails with hyperknowledge specifications. In: Proceedings of the 5th International Conference on Frontiers of Educational Technologies (2019)

3. Moreno, M.F., Brandão, R., Cerqueira, R.: Extending hypermedia conceptual models to support hyperknowledge specifications. International Journal of Semantic Computing 11(01) (2017)

4. Moreno, M.F., Santos, R.C., dos Santos, W.H., Cerqueira, R.: KES: The knowledge explorer system. In: International Semantic Web Conference (P&D/Industry/BlueSky) (2018)

5. Moreno, M.F., Santos, R.C.M., Santos, W.H.S.d., Fiorini, S.R., Silva, R.M.d.G.: Multimedia search and temporal reasoning. arXiv preprint arXiv:1911.08225 (2019)

6. Richter, S.R., Hayder, Z., Koltun, V.: Playing for benchmarks. In: IEEE International Conference on Computer Vision, ICCV 2017. pp. 2232–2241 (2017)

7. Serafini, L., Homola, M.: Contextualized knowledge repositories for the semantic web. Journal of Web Semantics 12 (2012)

8. Smeulders, A.W., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(12) (2000)