<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Demonstration of KGNet: a Cognitive Knowledge Graph Platform</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Concordia University</institution>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Knowledge graph (KG) engines lack support for cognitive queries based on semantic affinity and classification models. Semantic search on KG engines would enable users to quickly derive deep, hidden insights from their KG and enrich it. Various vectorized representation (embedding) techniques have been proposed to encode the semantics of words, images, graph nodes, and edges. User Defined Functions (UDFs) can be used to calculate the semantic affinity between two entities by measuring the distance between their embeddings. We demonstrate KGNet, a system that supports cognitive queries by transparently optimizing the queries, estimating UDF cost, automatically selecting a suitable embedding technique, executing the optimized query, and finally explaining the semantic results. During the demo, the audience will experience KGNet through four use cases based on real datasets and various embedding techniques in real applications. KGNet is a step towards enabling advanced AI capabilities in KG engines. A demo video is available online<sup>1</sup>.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Knowledge graphs (KGs) are adopted across various application domains to
integrate heterogeneous datasets via semantic information extraction from CSV files,
text, images, and videos. RDF engines are widely used to store KGs due to
RDF’s simplicity, its powerful query language (SPARQL), and inferencing support
on top of RDF Schema and the Web Ontology Language. Thus, numerous
applications create RDF-based KGs and provide access via an online service (endpoint)
that receives SPARQL queries via HTTP requests. RDF engines support SPARQL
queries that apply logical, arithmetic, and set operators, such as union and
join. For example, a query can find dogs of a certain weight, or list the companies
located in a given place. Cognitive queries are a new class of queries that use
AI technologies to extract relevant information based on semantic similarity and
classification models. RDF engines lack support for cognitive queries based on
semantic affinity and classification models, e.g., a query retrieving information
about dogs that semantically match a given dog’s image, or a list of companies
whose financial growth is similar to that of a particular company.
</p>
      <p><sup>1</sup> https://rebrand.ly/KGNET01</p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>[Figure 1: the three-layer KGNet architecture. Labels recovered from the figure: KGNET Interface (Query Parser, Get Results, Model Explanation); Embeddings As A Service (Get Vector, Get Semantics Similarity, Opt Embedding Technique, Store, On the fly); Cognitive Query Engine (UDF Manager, Semantic Operations, Data Augmentation, Query Optimizer, Statistics and Cost Prediction, Data Sampling); RDF Engines.]</p>
      <p>
        Numerous embedding techniques can encode the semantics of heterogeneous
datasets ranging from text and images to graph nodes and edges. GloVe [<xref ref-type="bibr" rid="ref3">3</xref>] is a
word embedding technique that captures the semantic similarity of words. VGG16 [<xref ref-type="bibr" rid="ref5">5</xref>] is
a deep CNN image classification model that can be used to generate image embeddings. FaceNet
[<xref ref-type="bibr" rid="ref4">4</xref>] obtains vector representations of face images. Moreover, HolE, TransE, and
others [<xref ref-type="bibr" rid="ref6">6</xref>] are KG embedding techniques that can embed graph nodes and edges.
      </p>
      <p>The semantic similarity between two entities can be measured by
calculating the distance between the entities’ embeddings using measures such as
Euclidean distance, cosine, and Jaccard. The accuracy of these embedding
techniques and distance measures varies from one dataset to another. Moreover, the space
and time complexity of generating the embeddings varies, too. Thus, it is a
time-consuming task, even for technical users, to build AI pipelines that extract semantic
information from KGs based on embedding similarity and classification models.</p>
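      <p>As a minimal sketch of the distance computation described above, the following Python snippet implements three common measures over two embedding vectors. The function names and the toy vectors are illustrative assumptions and are not part of KGNet; Jaccard is shown in its weighted (min/max) form, which is only one of several ways to apply it to non-negative real-valued vectors.</p>
      <preformat><![CDATA[
```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def weighted_jaccard_similarity(u, v):
    """Sum of element-wise minima over sum of maxima (non-negative inputs)."""
    if any(x < 0 for x in list(u) + list(v)):
        raise ValueError("weighted Jaccard expects non-negative components")
    denominator = sum(max(a, b) for a, b in zip(u, v))
    if denominator == 0:
        return 1.0  # two all-zero vectors are treated as identical
    return sum(min(a, b) for a, b in zip(u, v)) / denominator


# Hypothetical 3-dimensional embeddings of two entities.
dog_a = [1.0, 0.0, 2.0]
dog_b = [1.0, 2.0, 2.0]

distance = euclidean_distance(dog_a, dog_b)
cosine = cosine_similarity(dog_a, dog_b)
jaccard = weighted_jaccard_similarity(dog_a, dog_b)
```
]]></preformat>
      <p>Euclidean distance is a dissimilarity (smaller means closer), whereas the cosine and Jaccard measures above are similarities (larger means closer), so a ranking step has to fix one convention before ordering results.</p>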
      <p>
        We can use the User Defined Functions (UDFs) supported by RDF engines to
extend a SPARQL query with semantic affinity operators and classification
models. However, the query engine must address several research challenges, such as
estimating UDF cost, automatically selecting a suitable embedding technique,
and explaining the semantic results. Cognitive queries in relational
databases were introduced in [<xref ref-type="bibr" rid="ref1">1</xref>], which focused on learning representations for
database tokens, i.e., words. Cognitive KG queries are more challenging due to
the wide variety of embedding techniques and heterogeneous datasets.
      </p>
      <p>In this demo, we propose KGNet, a cognitive KG platform that supports
numerous embedding techniques, different semantic similarity measures, and data
augmentation for training classification models. KGNet addresses the above
research challenges to allow efficient semantic exploration of KGs with extensible
AI capabilities. Section 2 outlines the KGNet architecture. Section 3 gives a
glimpse of the demo scenarios. Section 4 concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>The KGNet Platform</title>
      <p>
        KGNet is organized into three layers, as illustrated in Figure 1. The
interface layer enables users to post their cognitive queries through our GUI for easy
exploration of their KG, or through our web service endpoint for easy
integration into data science pipelines via Jupyter Notebooks and Google Colab. The
KGNet interface layer is integrated with the SHAP tool [<xref ref-type="bibr" rid="ref2">2</xref>] to provide visual
explanations of results based on classification models. The middle layer is composed
of two main components, namely Embedding As A Service and the Cognitive
Query Engine. Embedding As A Service acts as an embedding store that
maintains linked-data or graph embeddings and builds a catalogue of embedding
techniques. It provides three main services: (i) getting the embedding vector of an entity,
either by choosing an embedding model from the catalogue or by providing a custom-trained
embedding model; (ii) getting the similarity score between two entities based on a
pre-defined similarity metric; and (iii) getting a near-optimal embedding
technique by choosing among different embedding techniques for the same
dataset [<xref ref-type="bibr" rid="ref6">6</xref>]. Hence, it is important to select the near-optimal model for a cognitive query,
which is done by sampling query results that satisfy the query conditions.
      </p>
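      <p>The first two services can be pictured with a toy in-memory store. This is only a sketch under stated assumptions: the class name, method names, model name, entity identifiers, and vectors are all hypothetical and do not reflect the actual KGNet service interface, and the similarity shown is a simple transformation of Euclidean distance chosen for brevity.</p>
      <preformat><![CDATA[
```python
import math


class EmbeddingStore:
    """Toy stand-in for an embedding service (hypothetical interface)."""

    def __init__(self):
        # model name -> {entity identifier -> embedding vector}
        self._catalogue = {}
        # metric name -> function mapping two vectors to a similarity score
        self._metrics = {
            "euclidean": lambda u, v: 1.0 / (1.0 + math.dist(u, v)),
        }

    def register_model(self, name, vectors):
        """Add a pre-trained or custom-trained model to the catalogue."""
        self._catalogue[name] = dict(vectors)

    def list_models(self):
        return sorted(self._catalogue)

    def get_vector(self, entity, model):
        """Service 1: look up the embedding of an entity in a chosen model."""
        return self._catalogue[model][entity]

    def get_similarity(self, entity_a, entity_b, model, metric="euclidean"):
        """Service 2: similarity score under a pre-defined metric."""
        u = self.get_vector(entity_a, model)
        v = self.get_vector(entity_b, model)
        return self._metrics[metric](u, v)


store = EmbeddingStore()
store.register_model(
    "toy-model",  # hypothetical catalogue entry
    {"ex:Acme": [1.0, 0.0], "ex:Globex": [1.0, 1.0]},
)
score = store.get_similarity("ex:Acme", "ex:Globex", "toy-model")
```
]]></preformat>
      <p>Keeping the metric behind a name makes it possible to add further measures to the registry without changing the callers.</p>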
      <p>The Cognitive Query Engine is responsible for adding semantic operators
to SPARQL queries and for augmenting graph data. In data
augmentation, we enrich graph nodes with several features to generate more accurate
embeddings. KGNet maintains a catalogue of pre-defined UDFs covering
primitive semantic similarity measures for different applications. KGNet makes it
easy to write a cognitive query using the UDFs, semi-automates the execution
pipeline, and collects statistics to support query optimization.</p>
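      <p>One simple way to turn collected statistics into a cost estimate is to keep a running mean of observed UDF runtimes and multiply it by the expected number of rows. The sketch below illustrates that idea only; the class, its methods, and the timings are assumptions made for illustration and are not KGNet's cost model.</p>
      <preformat><![CDATA[
```python
class UdfStatistics:
    """Running per-UDF runtime statistics (illustrative, not KGNet's model)."""

    def __init__(self):
        self._total_seconds = {}
        self._calls = {}

    def record(self, udf_name, seconds):
        """Store one observed execution time for a UDF call."""
        self._total_seconds[udf_name] = self._total_seconds.get(udf_name, 0.0) + seconds
        self._calls[udf_name] = self._calls.get(udf_name, 0) + 1

    def mean_cost(self, udf_name):
        """Average observed seconds per call."""
        if udf_name not in self._calls:
            raise KeyError(f"no statistics recorded for {udf_name!r}")
        return self._total_seconds[udf_name] / self._calls[udf_name]

    def estimate(self, udf_name, expected_rows):
        """Predicted total seconds if the UDF runs once per result row."""
        return self.mean_cost(udf_name) * expected_rows


stats = UdfStatistics()
# Made-up timings for two earlier calls of the same UDF.
stats.record("getDogSimilarityScore", 0.5)
stats.record("getDogSimilarityScore", 0.25)
predicted = stats.estimate("getDogSimilarityScore", expected_rows=8)
```
]]></preformat>
      <p>An optimizer could compare such estimates across alternative plans, for example applying cheap filters before the UDF so that it runs on fewer rows.</p>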
      <p>KGNet automates the selection of the near-optimal embedding technique
using our data sampling. The underlying layer of KGNet is an RDF engine. In
KGNet, the RDF engine must support UDFs and communicate with external
endpoints through HTTP GET/POST requests. KGNet is currently tested with
Virtuoso and Apache Jena, and supports UDFs in PL/C++ or Java, respectively.
The KGNet design provides modular components that interact to support
scalability. Integration with KGNet is simple: users only have to provide the
KG data and use a pre-defined embedding service and UDF.</p>
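      <p>From a notebook, a query can be posted to a SPARQL endpoint using the standard SPARQL 1.1 Protocol, which accepts the query as a form-encoded <monospace>query</monospace> parameter. The sketch below only builds the HTTP request and does not send it; the endpoint URL is a placeholder, the helper name is hypothetical, and the actual KGNet web service may expose a different interface.</p>
      <preformat><![CDATA[
```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint_url, query):
    """Build (but do not send) a SPARQL 1.1 Protocol POST request."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


# Placeholder endpoint; replace with the address of the target RDF engine.
request = build_sparql_request(
    "http://localhost:8890/sparql",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5",
)
# Sending would be: urllib.request.urlopen(request), then parsing the JSON body.
```
]]></preformat>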
    </sec>
    <sec id="sec-3">
      <title>Demonstration Overview</title>
      <p>
        We developed four use cases based on real datasets and various
embedding techniques. Due to lack of space, this section highlights three use cases,
as summarized in Table 1. The first use case, Dog Breeds, demonstrates semantic
search in a KG with image contents. We collected real datasets from Kaggle and
Data World for dog breed classification. Moreover, we augmented the generated
KG with features, such as breed overview, intelligence level, breed health issues,
and recommended for, collected from bowwowinsurance.com. The constructed
and augmented KG is serialized as a Turtle file and loaded into a Virtuoso
server. KGNet uses a pre-trained VGG16 [<xref ref-type="bibr" rid="ref5">5</xref>] deep learning (DL) model to
classify these breeds and generate embeddings, and then links them with the KG through a
UDF called getDogSimilarityScore (see Figure 2, line 3) that returns the
similarity score between two images. VGG16 pre-processes the input image to generate
its embedding on the fly, so the UDF consumes more time. Hence, a
simple in-memory caching mechanism for the embedding vectors of external URLs is used
to improve cognitive query response time. Building an effective cost-estimation
model for these UDFs is mandatory to optimize user queries and predict the
shortest execution pipeline.
      </p>
      <fig>
        <label>Figure 2</label>
        <preformat>1 prefix ns1:&lt;https://www.dog_breeds.com/&gt;
2 SELECT ?dog_image ?breed_class ?breed_overview
3 (sql:getDogSimilarityScore(?dog_image, ?external_image_url) AS ?Score)
4 WHERE {?s ns1:img_folder_name ?breed_class. ?s ns1:img1 ?dog_image.
5 optional {?s ns1:breed_overview ?breed_overview} }
6 ORDER BY DESC(xsd:float(?Score))</preformat>
      </fig>
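      <p>An in-memory cache of this kind can be sketched with Python's standard <monospace>functools.lru_cache</monospace>. The embedding function below is a deliberately fake placeholder for the expensive download and VGG16 forward pass, and the function names, URL, and cache size are assumptions for illustration rather than KGNet's implementation.</p>
      <preformat><![CDATA[
```python
from functools import lru_cache

compute_calls = 0  # counts how often the expensive step actually runs


def compute_image_embedding(image_url):
    """Placeholder for the expensive step (image download + model inference)."""
    global compute_calls
    compute_calls += 1
    raw = image_url.encode("utf-8")
    # Fake, deterministic 4-dimensional "embedding" derived from the URL text.
    return tuple(float(sum(raw[i::4]) % 97) for i in range(4))


@lru_cache(maxsize=1024)
def cached_image_embedding(image_url):
    """Return the embedding for a URL, computing it only on the first request."""
    return compute_image_embedding(image_url)


url = "https://example.org/dog.jpg"  # hypothetical external image URL
first = cached_image_embedding(url)
second = cached_image_embedding(url)  # served from memory, no recomputation
```
]]></preformat>
      <p>The cache key is the URL string, so an image that changes behind the same URL would keep returning the old vector until the entry is evicted or the cache is cleared.</p>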
      <p>
        The second use case, Economics KG, demonstrates semantic search based on
the KG structure and the interconnection among KG nodes. In this use case,
we use structural KG embedding techniques to group similar companies in the
Forbes-2013 dataset based on company attributes and neighbourhood
location in the graph. We used off-the-shelf KG embedding techniques, such as
RDF2Vec, HolE, TransE, and DistMult [<xref ref-type="bibr" rid="ref6">6</xref>], to generate embeddings, and evaluated
their accuracy using random forest ML models. This dataset contains a list of
attributes, such as company name and market value. To augment this graph, we
converted attributes, such as market value, profit, and rank, into three categorical
classes (low, medium, and high) based on the quantile values of these features.
      </p>
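      <p>The conversion of a numeric attribute into three classes can be sketched as follows. The paper does not state which quantiles were used; this example assumes tertiles (two cut points) computed with Python's standard <monospace>statistics.quantiles</monospace>, and the market values are made up rather than taken from the Forbes-2013 dataset.</p>
      <preformat><![CDATA[
```python
import statistics


def to_three_classes(values):
    """Map each number to 'low', 'medium' or 'high' using tertile cut points."""
    low_cut, high_cut = statistics.quantiles(values, n=3)
    labels = []
    for value in values:
        if value <= low_cut:
            labels.append("low")
        elif value <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


# Made-up market values for nine hypothetical companies.
market_values = [10, 20, 30, 40, 50, 60, 70, 80, 90]
classes = to_three_classes(market_values)
```
]]></preformat>
      <p>Each resulting class can then be attached to the company node as an additional categorical attribute before the embeddings are trained.</p>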
      <p>
        The final use case is related to KG Question-Answering. In this use case,
we use text embedding techniques to obtain a list of keywords similar to a given
predicate and use this list to improve QA results. We query the DBpedia KG
for a subject using words similar to the predicate, which are retrieved from a
pre-trained GloVe [<xref ref-type="bibr" rid="ref3">3</xref>] model and linked with the KG subject.
      </p>
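      <p>The keyword expansion step amounts to a nearest-neighbour search in the word embedding space. The following sketch ranks candidate words by cosine similarity to a predicate; the two-dimensional vectors are invented for illustration and are not GloVe vectors, and the function name is hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def most_similar(word, vectors, top_k):
    """Return the top_k words closest to `word` by cosine similarity."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.hypot(*u) * math.hypot(*v))

    target = vectors[word]
    ranked = sorted(
        (other for other in vectors if other != word),
        key=lambda other: cosine(target, vectors[other]),
        reverse=True,
    )
    return ranked[:top_k]


# Invented toy vectors; a real setup would load pre-trained word embeddings.
toy_vectors = {
    "spouse": [1.0, 0.0],
    "partner": [0.9, 0.1],
    "married": [0.8, 0.6],
    "population": [0.0, 1.0],
}
keywords = most_similar("spouse", toy_vectors, top_k=2)
```
]]></preformat>
      <p>The returned keywords could then be matched against predicate labels in the KG to widen the set of candidate predicates for a question.</p>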
      <p>It is a challenging task to select a near-optimal embedding technique for a
specific dataset. It is tedious, even for expert users, to decide which embedding technique
to use, especially for different dependent attributes of the dataset. Figure 3 shows
the accuracy of three different KG embedding techniques in predicting the
market value attribute using a random forest classifier. The DistMult embedding
technique achieved the best prediction accuracy, with a score of 0.978. KGNet finds
a near-optimal embedding technique for a specific cognitive query by sampling
graph data and applying an ML prediction task to the sample. KGNet uses this mechanism
to decide on the best embedding technique on the fly and generate the results with
the highest accuracy.</p>
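      <p>The sample-then-evaluate mechanism can be sketched as below. To stay dependency-free, this example swaps the random forest for a leave-one-out 1-nearest-neighbour classifier, so it illustrates the selection loop rather than the authors' exact procedure; the technique names, entities, labels, and vectors are all synthetic.</p>
      <preformat><![CDATA[
```python
import math
import random


def loo_1nn_accuracy(vectors, labels, entities):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for target in entities:
        others = [e for e in entities if e != target]
        nearest = min(others, key=lambda e: math.dist(vectors[target], vectors[e]))
        correct += labels[nearest] == labels[target]
    return correct / len(entities)


def pick_embedding_technique(candidates, labels, sample_size, seed=0):
    """Score every candidate on one random sample and return the best one."""
    sample = random.Random(seed).sample(sorted(labels), sample_size)
    scores = {
        name: loo_1nn_accuracy(vectors, labels, sample)
        for name, vectors in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Synthetic data: eight entities with alternating class labels.
entities = [f"ex:company{i}" for i in range(8)]
labels = {e: ("high" if i % 2 == 0 else "low") for i, e in enumerate(entities)}
candidates = {
    # Separates the two classes into two tight, distant clusters.
    "technique_a": {
        e: ([1.0, 0.01 * i] if i % 2 == 0 else [0.0, 1.0 + 0.01 * i])
        for i, e in enumerate(entities)
    },
    # Places entities on a line where neighbours alternate between classes.
    "technique_b": {e: [float(i)] for i, e in enumerate(entities)},
}
best, scores = pick_embedding_technique(candidates, labels, sample_size=6)
```
]]></preformat>
      <p>Scoring on a sample keeps the selection cheap, at the price that a small sample may not rank the techniques the same way the full dataset would.</p>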
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>KGNet bridges the gap between KG engines and AI pipelines to provide built-in
cognitive query support. Thus, there is no need to reformat and migrate KG data
from RDF engines to data science or AI platforms. Our proposed cognitive query
extension enables the invocation of a user-defined semantic function based on
different embedding techniques. KGNet optimizes the query execution pipeline,
estimates UDF cost, and automatically opts for the near-optimal embedding
technique. In KGNet, a user only needs to provide data and either customize a UDF for
semantic search or use an existing one. This enables semantic discovery on KGs
for users without prior knowledge of embedding techniques and AI pipelines.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bordawekar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shmueli</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Enabling cognitive intelligence queries in relational databases using low-dimensional word embeddings</article-title>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Lundberg</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.I.</given-names>
          </string-name>
          :
          <article-title>A unified approach to interpreting model predictions</article-title>
          . In:
          <string-name><surname>Guyon</surname>, <given-names>I.</given-names></string-name>
          ,
          <string-name><surname>Luxburg</surname>, <given-names>U.V.</given-names></string-name>
          ,
          <string-name><surname>Bengio</surname>, <given-names>S.</given-names></string-name>
          ,
          <string-name><surname>Wallach</surname>, <given-names>H.</given-names></string-name>
          ,
          <string-name><surname>Fergus</surname>, <given-names>R.</given-names></string-name>
          ,
          <string-name><surname>Vishwanathan</surname>, <given-names>S.</given-names></string-name>
          ,
          <string-name><surname>Garnett</surname>, <given-names>R.</given-names></string-name>
          (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          , pp.
          <fpage>4765</fpage>
          -
          <lpage>4774</lpage>
          . Curran Associates, Inc. (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name><surname>Manning</surname>, <given-names>C.D.</given-names></string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          . In:
          <source>Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Schroff</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalenichenko</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name><surname>Philbin</surname>, <given-names>J.</given-names></string-name>
          :
          <article-title>FaceNet: A unified embedding for face recognition and clustering</article-title>
          . In:
          <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          . pp.
          <fpage>815</fpage>
          -
          <lpage>823</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          . In:
          <string-name><surname>Bengio</surname>, <given-names>Y.</given-names></string-name>
          ,
          <string-name><surname>LeCun</surname>, <given-names>Y.</given-names></string-name>
          (eds.)
          <source>3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Knowledge graph embedding: A survey of approaches and applications</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>29</volume>
          (
          <issue>12</issue>
          ),
          <fpage>2724</fpage>
          -
          <lpage>2743</lpage>
          (
          <year>2017</year>
          ). https://doi.org/10.1109/TKDE.2017.2754499
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>