The Whyis Knowledge Graph Framework in Action James P. McCusker, Sabbir M. Rashid, Nkechinyere Agu, Kristin P. Bennett, and Deborah L. McGuinness Rensselaer Polytechnic Institute, Troy, NY 12180, USA 1 Introduction We will demonstrate a reusable framework for developing knowledge graphs that supports general, open-ended development of knowledge curation, interaction, and inference. Knowledge graphs need to be easily maintainable and usable in sometimes complex application settings. Often, scaling knowledge graph updates can require developing a knowledge curation pipeline that either replaces the graph wholesale whenever updates are made, or requires detailed tracking of knowledge provenance across multiple data sources. Fig. 1 shows how Whyis provides a semantic analysis ecosystem: an environ- ment that supports research and development of semantic analytics for which we previously had to build custom applications [3,4]. Users interact through a suite of knowledge graph views driven by the node type and view requested in the URL. Knowledge curation methods include Semantic ETL, external linked data mapping,and Natural Language Processing (NLP). Autonomous inference agents expand the available knowledge using traditional deductive reasoning as well as inductive methods that can include predictive models, statistical reason- ers, and machine learning. Whyis is used in a number of areas today, including nanopolymers, spectrum policy, and health informatics. We demonstrate Whyis by creating and deploying an example Biological Knowledge Graph (BioKG), using data from DrugBank and Uniprot1 , and briefly discuss benefits of using our approach over a conventional knowledge graph pipeline. 2 Architecture Whyis uses nanopublications to encapsulate every piece of knowledge introduced into the knowledge graphs it manages. A nanopublication is composed of three named RDF graphs: Assertion, Provenance, and Publication Info [2]. We see knowledge graphs with the level of granularity supported by nanopublications as essential to fine-grained management of knowledge graphs that are curated and inferred from diverse sources and can change on an ongoing basis. The use of nanopublications as a fundamental unit of knowledge in Whyis has enabled the systematic inclusion of provenance in ways that support knowledge revision 1 http://drugbank.ca, http://uniprot.org, respectively Users Visualization, Answers, Questions Explanations Analysis Knowledge Semantic Semantic Cognitive Interaction, Annotators Browsers Agents Creation, Exploration Literature Contributed Results NLP, Predictive Statistical Knowledge Knowledge Hypotheses Machine Reading Modelers Reasoners and Data Databases Knowledge Semantic ETL, Machine Deductive Inferred/Expanded SDDs Public Knowledge Ontologies Learning Reasoners Linked Datasets Knowledge Data Mapping Open and Data Data Knowledge Knowledge Inference Curation Fig. 1. The semantic ecosystem enabled by the Whyis framework for knowledge cura- tion, interaction, and inference. and truth maintenance of inferred knowledge as underlying knowledge changes. Whyis is written in Python using the Flask framework, and uses a number of existing infrastructure tools to work, as shown in Fig. 2. Whyis inference is handled by a suite of “Agents”, each performing as the analogue to a single rule in traditional deductive inferencing. An agent is com- posed of a SPARQL query that serves as a “body” and a python function that serves has the “head”. The agent is invoked when new nanopublications are added to the knowledge graph that match the SPARQL query defined by the agent. The agent superclass assigns some basic provenance related to the given infer- ence activity, which developers can customize in their implementations. Included inference agent types include entity extraction and resolution against existing knowledge graph nodes, deductive reasoning agents that can be configured with custom rules, as well as many available pre-configured OWL 2 rules. 3 Related Work Some existing frameworks support some of Whyis’ capabilities. Stardog2 includes OWL reasoning, mapping of data silos into RDF, and custom rules. Ontowiki provides a user interface on top of an RDF database that tracks history, allows users to browse and edit knowledge, and supports user interface extensions 3 . Callimachus, a “Semantic Content Manager,” lets developers create UIs by object type using RDFa [1]. Virtuoso Openlink Data Spaces is a linked data publishing tool that provides a set of pre-defined data import tools and a fixed set of views 2 A case study: https://www.stardog.com/blog/nasas-knowledge-graph/ 3 http://ontowiki.net Browser Ontology Custom Literature Knowledge Search Browser Views Browser Viewer/Editor Server DBPedia Importer DOI Importer Ontology Importer LOD crawler Knowledge Entity View Manager Linked Data Mapping Curation Extraction/ (by class and view type) (SETLr) Resolution Whyis Knowledge REST API Knowledge Expansion Agents Web Storage/Query Stack RDF Nanopublication File Database Fig. 2. The Whyis technology stack. Nanopublications are stored in the RDF database, while the entire history is stored in the nanopublication file archive using File Depot. Celery invokes and manages autonomic inference agents by listening for graph changes. on the linked data it creates.4 Vitro5 supports the creation of new ontology classes and instances, but does not allow users to create custom interfaces. 4 Demonstration We demonstrate Whyis using our Biology knowledge graph at http://bit.ly/ whyis-demo. All user views are built-in views in Whyis. Nothing has been cus- tomized for the biology domain except for queries to find biological interactions. The BioKG main page allows users to view knowledge graph along with the most recent changes and the graph neighborhood of the most recently changed entity. Users can search for entities and either view search results or select from one of the resolved entities. Every entity in the knowledge graph gets its own page, which can be customized by knowledge graph developers by the entity type. Users can also explore the knowledge graph beyond the current node using the knowledge explorer (Figure 3), a refinement of the user interface developed in [3]. 5 Conclusions We believe Whyis is the first provenance-aware framework for knowledge graph development that enables curation, interaction, and inference within a unified 4 https://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/Ods 5 Available: https://github.com/vivo-project/Vitro Fig. 3. The knowledge explorer lets users expand the current view by expanding node connections (upper right) or searching for new entities (search box in upper left). ecosystem. We demonstrate these features in a BioKG setting, exploring drug- protein-disease interactions, and providing semi-automated support for semantic queries previously custom developed [3]. Whyis is published under the Apache 2.0 License on Github6 with documentation on how to develop custom knowledge graphs. Acknowledgements: This work was funded by NIEHS Award 0255-0236- 4609 / 1U2CES026555-01, NSF Award OAC-1640840 IBM Research AI Horizons Network, and by the Gates Foundation through HBGDki. References 1. Battle, S., Wood, D., Leigh, J., Ruth, L.: The callimachus project: Rdfa as a web template language. In: Proceedings of the Third International Conference on Con- suming Linked Data-Volume 905. pp. 1–14. CEUR-WS. org (2012) 2. Groth, P., Gibson, A., Velterop, J.: The anatomy of a nanopublication. Information Services and Use 30(1), 51–56 (2010), http://dx.doi.org/10.3233/ISU-2010-0613 3. McCusker, J.P., Dumontier, M., Yan, R., He, S., Dordick, J.S., McGuinness, D.L.: Finding melanoma drugs through a probabilistic knowledge graph. PeerJ Computer Science 3, e106 (Feb 2017), https://doi.org/10.7717/peerj-cs.106 4. McGuinness, D.L., Bennett, K.: Integrating semantics and numerics: Case study on enhancing genomic and disease data using linked data technologies. Proceedings of SmartData pp. 18–20 (2015) 6 https://tetherless-world.github.io/whyis