<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Visual Relationship Detection using Knowledge Graphs for Neural-Symbolic AI</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dave Herron</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>City, University of London</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Momentum is surging behind the consensus that neural-symbolic AI is the right road for AI to take today. We propose to travel this road using Semantic Web technologies to represent the symbolic AI tradition. Our objective is to investigate and compare the efficacy of a variety of strategies for combining the capabilities of deep neural networks for statistical learning from data with those of OWL ontologies and knowledge graphs for symbolic knowledge representation and reasoning. Our application area is visual relationship detection within images. Deep learning is data hungry and struggles to generalise to examples outside the training distribution. We seek to show that combining Semantic Web domain knowledge and reasoning with deep learning can deliver superior performance, can substitute for plentiful training data, and can deliver robust generalisation in few-shot/zero-shot learning scenarios.</p>
      </abstract>
      <kwd-group>
        <kwd>neural-symbolic</kwd>
        <kwd>AI</kwd>
        <kwd>semantic web</kwd>
        <kwd>knowledge graphs</kwd>
        <kwd>CEUR-WS</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Problem Statement</title>
      <p>
        At the micro-level, our problem space is the application area of visual relationship detection
within images, using a small (5,000 images) dataset prepared for this purpose in 2016: the VRD
dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The dataset originators used crowd-sourcing to annotate each image with some
number of visual relationships (VRs). A VR is a (subject, predicate, object) triple,
where the subject and object are individual objects (represented by bounding boxes and class
labels) and the predicate expresses some relationship between them. For example, (person,
ride, horse) and (horse, on, grass) might be two VRs for a given image. The VR
annotations refer to 100 common, everyday classes of object that broadly but sparsely span the
material world: types of vehicle, furniture, appliance, device, clothing, sporting good, animal,
plant, landscape feature, etc. The 70 predicates (relationships) referred to in the VR annotations
are primarily common spatial relations (above, below, behind, beside, on, in, ...) and common
verbs (wear, hold, use, carry, drive, ride, eat, touch, kick, has, ...). The breadth and variety of
object classes and predicates permitted us to design an ontology with rich class and property
hierarchies for describing (what we call) this VRD-world domain. Our VRD-world ontology
currently contains 239 classes and makes extensive use of object property characteristics such
as domain/range restrictions, subPropertyOf, property equivalence, symmetry, inverses, etc.
The VR annotations also map nicely to RDF triples for populating a KG with facts (ABox data).
      </p>
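      <p>To illustrate how a VR annotation maps to RDF triples, the following is a minimal sketch that serialises annotations as N-Triples strings. The namespace IRI, the individual naming scheme and the helper name vr_to_ntriples are assumptions made for illustration; they are not taken from the VRD-world ontology.</p>
      <preformat><![CDATA[
```python
# Hypothetical namespace; the real VRD-world ontology IRI is not given in the text.
VRD = "http://example.org/vrd-world#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def vr_to_ntriples(image_id, index, vr):
    """Turn one (subject, predicate, object) annotation into N-Triples lines."""
    subj_cls, predicate, obj_cls = vr
    # Mint one individual per annotated object (illustrative naming scheme).
    # A fuller version would reuse one individual per bounding box.
    subj = f"{VRD}{image_id}_vr{index}_subject"
    obj = f"{VRD}{image_id}_vr{index}_object"
    return [
        f"<{subj}> <{RDF_TYPE}> <{VRD}{subj_cls.capitalize()}> .",
        f"<{obj}> <{RDF_TYPE}> <{VRD}{obj_cls.capitalize()}> .",
        f"<{subj}> <{VRD}{predicate}> <{obj}> .",
    ]


annotations = [("person", "ride", "horse"), ("horse", "on", "grass")]
triples = []
for i, vr in enumerate(annotations):
    triples.extend(vr_to_ntriples("img001", i, vr))
print("\n".join(triples))
```
]]></preformat>
      <p>Each annotation yields two class-membership triples and one relationship triple, which is the ABox data that would populate the KG.</p>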
      <p>Deep learning is known to depend on big data for good performance, and to struggle to
generalise and extrapolate to examples that lie outside the distribution of data seen during
training. The small size of the VRD image dataset, and the long tail on the highly skewed
frequency distribution of the VR annotations of the images, are likely to provoke these limitations
of deep learning. Our research considers how to leverage the symbolic knowledge representation
and reasoning capabilities of a KG hosting our OWL VRD-world ontology so as to overcome
the limitations of deep learning and deliver superior performance at detecting VRs in images.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Importance</title>
      <p>
        Our research has relevance and benefits for several groups: the AI community, industry, society
generally and the SW community. Prominent voices from the AI community such as those
of Marcus [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Chollet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Kautz [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] corroborate one another in arguing that, due to its
limitations (like those just mentioned), deep learning alone, despite its spectacular achievements,
will not lead to human-level, artificial general intelligence (AGI). Marcus [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] warns that the
ill-advised hype around deep learning could lead to a 3rd AI winter. Kautz [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] speculates that
the (current) 3rd AI summer may avoid succumbing to a 3rd winter, but only because of the
momentum that now exists behind neural-symbolic integration. A consensus has emerged that
the neural-symbolic road is the right one for AI today [
        <xref ref-type="bibr" rid="ref5 ref6 ref7 ref8">5, 6, 7, 8</xref>
        ]. By helping to show how
symbolic background knowledge and reasoning can reduce deep learning’s dependence on
big data whilst boosting its ability to generalise, and by advancing understanding of
neural-symbolic AI generally, our research directly contributes to taking AI further along the road
toward human-level AGI and to helping it avoid another AI winter.
      </p>
      <p>AI has transformed many aspects of everyday life, in industry and for society generally,
and in ways we all now take for granted. We all have expectations of continuing positive
innovations and transformative effects from AI. Hence, as neural-symbolic AI research advances
AI along the road to AGI, industry and society generally will be impacted directly and benefit
proportionately.</p>
      <p>
        Finally, our research has the potential to show that OWL ontologies and KGs can be used
to integrate neural, statistical learning with symbolic background knowledge and reasoning
in concrete, tangible ways. In doing so, it can demonstrate that these SW technologies are
exemplars of the types of symbol manipulating tools and abstraction and reasoning modules
that Marcus [
        <xref ref-type="bibr" rid="ref5 ref8">5, 8</xref>
        ] and Chollet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], respectively, call to be incorporated into hybrid,
neural-symbolic systems in order for AI to advance along the road to AGI. Such a demonstration may
shine a spotlight on SW technologies that helps to place them, and the SW community, at the
centre of attention of neural-symbolic AI.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <p>
        Work relating to neural-symbolic AI in general can be reviewed in surveys such as [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref9">9, 10, 11, 12</xref>
        ].
One prominent example is Logic Tensor Networks (LTN) [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]. LTN is a fuzzy logic-based
framework for training conventional NNs to satisfy logical constraints expressed as background
knowledge over training data.
      </p>
      <p>
        Work relating to neural-symbolic AI that uses SW technologies to represent the domain
of symbolism is rapidly accumulating. Myklebust et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] create a composite KG from
disparate sources, use it to generate KG embeddings (via various models), and then use the KG
embeddings to train a NN binary classifier to predict whether or not a property representing
mortality risk should be present in the composite KG to link certain individual chemicals with
certain individual species of organism. The theme of ‘deep deductive reasoning’ (training NNs
to reason over SW knowledge bases and KGs) is progressively developed in [
        <xref ref-type="bibr" rid="ref16 ref17 ref18">16, 17, 18</xref>
        ]. The
theme of using KGs to compensate for the lack of plentiful samples with which to train robust
deep learning-based systems (in so-called few-shot and zero-shot learning scenarios) is studied
in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and surveyed in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Similarly, in [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], Wang et al. demonstrate that the
structure of the class hierarchy of a cell ontology can be leveraged (as an undirected graph) to
significantly improve the accuracy of deep learning-driven cell classification for cells whose
types were unseen during training.
      </p>
      <p>
        Work relating to using neural-symbolic AI for VR detection on the VRD dataset also exists.
The original (2016) VRD paper by Lu et al., [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], does not mention neural-symbolic integration
(which reflects how little traction this area of AI had just some years ago), and neither is it
mentioned in the comprehensive survey of the area in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. But it should absolutely now be
recognised as an early and innovative form of neural-symbolic AI because their system is a
hybrid that includes a ‘language module’ trained on word embeddings of the (symbolic) VRD
object class names. Donadello &amp; Serafini [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] enumerate LTN negative domain/range knowledge
constraints to train NNs to detect VRs in VRD images. Their approach, however, exposes a
scalability limitation of LTN that we hope to show can be elegantly overcome by using KGs.
Daniele &amp; Serafini [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] test their KENN system on the VRD dataset.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Research Questions</title>
      <p>RQ1: How can we combine learning and reasoning to get the best of both worlds? Here,
‘learning’ refers to the statistical ‘learning from data’ capabilities of deep NNs, ‘reasoning’
refers to the symbolic knowledge representation and reasoning capabilities of OWL
ontologies and KGs, and ‘best of both worlds’ refers to improved VR predictive performance. We
hypothesise that each of the several distinct NN-KG integration (NN-KG-I) strategies that we
have conceived (and which we describe shortly) will deliver VR predictive performance that is
superior to whatever baseline performance our deep NNs are able to deliver by themselves. We
aim to experiment with each of our NN-KG-I strategies individually, rank them, and explain
their relative efficacy by analysing the nature of the interactions between deep learning and
symbolic knowledge representation and reasoning that they exercise.</p>
      <p>RQ2: Some of our NN-KG-I strategies will be compatible with one another. How will they
perform when used in different combinations? We hypothesise that we will find the contributions
to improved VR predictive performance (beyond the baseline) that they make when used
individually are additive when used in certain combinations, but that when used in other
combinations there are interesting interaction effects between the integration strategies which
either amplify or diminish their collective effect on VR predictive performance. Analysis of
the results of these experiments is expected to yield further insights into the nature of the
interactions between deep learning and symbolic knowledge representation and reasoning.</p>
      <p>RQ3: How best and to what extent can the symbolic knowledge representation and reasoning
capabilities of OWL ontologies and KGs be leveraged by deep NNs to substitute for plentiful
training data and enable robust generalisation on out-of-distribution examples? Within the
VRD image annotations, many VR types have just a few training instances, and many have test
instances but zero training instances. Hence, the VRD dataset lends itself to the examination
of the sort of small dataset and few-shot/zero-shot learning scenarios in which deep learning
alone struggles to perform well. We hypothesise that our NN-KG-I strategies will show that SW
technologies such as our OWL VRD-world ontology, together with the reasoning capabilities of
a KG, can be used to construct hybrid, neural-symbolic systems that out-perform deep learning
alone in such settings.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Methods</title>
      <p>
        The architecture of our baseline hybrid, neural-symbolic system consists of a NN pipeline and a
KG populated with our VRD-world ontology. The NN pipeline consists of an object detection
NN followed by a multi-label predicate prediction NN that takes ordered pairs of detected
objects, plus geometric features derived from their bounding boxes (an idea borrowed from
[
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]), as input. Experimentation with our several NN-KG-I strategies for combining these neural
and symbolic components will drive the exploration of our research questions. We describe two
of our NN-KG-I strategies, denoted S1 and S2, in some detail and mention others briefly.
      </p>
      <p>[Figure 1 (diagram not reproduced; original caption not recoverable). Labels: an Object Detection Neural Network feeds a Predicate Prediction Neural Network, which emits candidate VR predictions such as &lt;person, ride, bike&gt;, &lt;bike, carry, person&gt;, &lt;person, hold, phone&gt; and &lt;person, wear, jacket&gt;. Step 3, ‘insert candidate VR prediction in KG’, sends triples such as &lt;subjectX, rdf:type, Person&gt;, &lt;objectY, rdf:type, Phone&gt;, &lt;subjectX, wear, objectY&gt; to the Knowledge Graph with the VRD-world ontology. Step 4, ‘process KG feedback’, returns a message such as &lt;error: KG state inconsistent&gt;.]</p>
      <p>S1: Strategy S1 is about leveraging a SW KG’s ability to automatically enforce the semantic
rules of an OWL ontology. The only predicted VRs that stand a chance of matching with
annotated (ground truth) VRs are those that are semantically valid according to our VRD-world
ontology. So it makes sense to help our predicate prediction NN learn the relevant semantic
rules of our VRD-world ontology. This is analogous to a checkers-playing system learning
the legal checkers moves. It does not fully solve the problem of finding the best move, but
by narrowing the search space to the legal moves, learning to find the best move becomes
significantly easier. One way to utilise strategy S1 is to take the VR predictions emitted by the
predicate prediction NN during training, insert them into a KG populated with the VRD-world
ontology, and then communicate any feedback from the KG regarding invalid VRs back to the
NN (e.g. by penalising its loss function). This scenario is depicted in Figure 1.</p>
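      <p>The training-time use of strategy S1 can be sketched as follows. This is a toy illustration under stated assumptions: a small dictionary of domain/range restrictions stands in for the KG and its reasoner, the restriction contents and the names kg_accepts and penalised_loss are hypothetical, and the penalty form (confidence mass placed on invalid VRs) is one possible choice rather than the method used in this research.</p>
      <preformat><![CDATA[
```python
# Toy stand-in for the KG: per-predicate (domain, range) restrictions.
# The class names and restrictions are illustrative, not taken from the ontology.
RESTRICTIONS = {
    "wear": ({"person"}, {"jacket", "hat", "shoes"}),
    "ride": ({"person"}, {"horse", "bike"}),
}


def kg_accepts(vr):
    """Return False when the VR violates a known domain/range restriction."""
    subj_cls, predicate, obj_cls = vr
    if predicate not in RESTRICTIONS:
        return True  # no restriction recorded, so nothing to violate
    domain, range_ = RESTRICTIONS[predicate]
    return subj_cls in domain and obj_cls in range_


def penalised_loss(base_loss, scored_vrs, penalty_weight=0.5):
    """Add a penalty proportional to the confidence placed on invalid VRs."""
    if not scored_vrs:
        return base_loss
    invalid_mass = sum(score for vr, score in scored_vrs if not kg_accepts(vr))
    return base_loss + penalty_weight * invalid_mass / len(scored_vrs)


batch = [
    (("person", "ride", "bike"), 0.9),
    (("bike", "carry", "person"), 0.6),
    (("person", "wear", "phone"), 0.8),   # invalid: phone is outside the range of wear
    (("person", "wear", "jacket"), 0.7),
]
loss = penalised_loss(1.0, batch)
print(loss)
```
]]></preformat>
      <p>Tying the penalty to the predicted confidence scores, rather than to a simple count of invalid VRs, keeps it sensitive to how strongly the NN commits to each invalid prediction.</p>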
      <p>There is also potential to utilise strategy S1 during inference. Despite having trained our
predicate prediction NN as best we can to only predict semantically valid VRs, it may still
sometimes predict semantically invalid VRs during inference on test set images. But the KG
is an active agent capable of participating in determining the final predictions of the hybrid,
neural-symbolic system. Specifically, we can use it as a final filter to suppress bad VR predictions.
Rather than submit the VR predictions of the predicate prediction NN directly (on behalf of the
hybrid, neural-symbolic system), we may first insert them into the KG and then submit only
those that the KG does not flag as being semantically invalid.</p>
      <p>
        Our strategy S1 is related conceptually to the approach taken in [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] (mentioned earlier)
of enumerating (in code) LTN negative domain/range knowledge constraints to train NNs to
detect VRs in VRD images. But by using a KG and ontology rather than hand-coded knowledge
axioms, our strategy S1 will be shown to be more general, more scalable and more flexible than
the LTN neural-symbolic framework. It is more general because a KG automatically enforces
all semantic rules of an ontology, not just one particular category of domain rule. It is more
scalable because a KG scales effortlessly to handle domains with any number of classes and
properties, whereas the approach taken with LTN in [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] proved intractable given only the
limited diversity of VRD object classes and predicates. Finally, it is more flexible because, as we
have explained, a KG used for strategy S1 can participate not just in NN training but during NN
inference as well.
      </p>
      <p>S2: Strategy S2 involves using common-sense, Datalog-like rules to leverage and augment the
reasoning capabilities of our KG. For example, a rule expressing the plausibility of VR pattern
(X, wear, Y) can be described as follows:</p>
      <p>wear(X,Y) :- Person(X), WearableThing(Y), ir(Y,X) ~ 1</p>
      <p>
        Triples asserting the predicted classes of the detected objects X and Y will have been inserted
into our KG before the rule is evaluated: (X, rdf:type, C_X), (Y, rdf:type, C_Y), where C_X and C_Y denote the predicted classes.
Determining if the first goal of the body of this rule is satisfied is accomplished by a KG query
to check whether X is a member of class Person. Determining if the second goal is satisfied is
accomplished similarly but depends on the KG having deduced whether Y is a WearableThing
(which is not a VRD object class). The third goal represents a novel reuse of a bounding box
geometric feature function (per [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]), the inclusion ratio. This goal is satisfied if the bounding
box for Y is mostly enclosed within the bounding box for X (as would generally be the case
when a person can plausibly be said to be wearing something).
      </p>
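      <p>A minimal Python sketch of how the wear rule might be evaluated is given below. It rests on several assumptions: the inclusion ratio is taken to be the area of the intersection divided by the area of the inner box, the goal ‘ir(Y,X) ~ 1’ is approximated by a threshold of 0.9, and a small dictionary of superclasses stands in for class memberships deduced by the KG. All function names are hypothetical.</p>
      <preformat><![CDATA[
```python
def inclusion_ratio(inner, outer):
    """Fraction of box `inner` lying inside box `outer`.

    Boxes are (xmin, ymin, xmax, ymax). This definition is an assumption.
    """
    ix = max(0.0, min(inner[2], outer[2]) - max(inner[0], outer[0]))
    iy = max(0.0, min(inner[3], outer[3]) - max(inner[1], outer[1]))
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return (ix * iy) / area if area > 0 else 0.0


# Stand-in for class membership deduced by the KG (illustrative hierarchy).
SUPERCLASSES = {
    "person": {"Person"},
    "jacket": {"Jacket", "Clothing", "WearableThing"},
    "phone": {"Phone", "Device"},
}


def is_member(obj_class, target):
    """Emulate a KG query asking whether an object belongs to class `target`."""
    return target in SUPERCLASSES.get(obj_class, set())


def wear_is_plausible(x, y, threshold=0.9):
    """wear(X,Y) :- Person(X), WearableThing(Y), ir(Y,X) ~ 1"""
    return (
        is_member(x["cls"], "Person")
        and is_member(y["cls"], "WearableThing")
        and inclusion_ratio(y["box"], x["box"]) >= threshold
    )


person = {"cls": "person", "box": (10, 10, 110, 310)}
jacket = {"cls": "jacket", "box": (20, 60, 100, 180)}
phone = {"cls": "phone", "box": (90, 100, 105, 130)}
print(wear_is_plausible(person, jacket))
print(wear_is_plausible(person, phone))
```
]]></preformat>
      <p>The jacket satisfies all three goals, whereas the phone fails the second goal because it is not a WearableThing in the stand-in hierarchy.</p>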
      <p>We will define a collection (base) of such rules using the knowledge of the VRD-world domain
gained from having analysed the VR annotations so as to design the ontology. The impact of
strategy S2 will likely be proportional to the comprehensiveness of this rule base, but on a
diminishing scale. Analysis of the frequency distribution of the VR annotation types will help
us identify those rules likely to have the greatest impact. We can minimise the number of rules
required to deliver a measurable performance effect by focusing on high-impact rules.</p>
      <p>We have identified two approaches for implementing our common-sense rules. Approach
S2-A is to use Python to build custom rules and a simple (non-recursive) rule engine component
for evaluating them. This rule engine component would mediate interaction between the
predicate prediction NN and the KG. The description of S2 given above presumes this approach.
Approach S2-B is to define proper Datalog rules and to use a KG tool whose support for Datalog
includes support for basic arithmetic functions sufficient for emulating bounding box geometric
feature functions (such as the inclusion ratio).</p>
      <p>A further approach to S2, S2-C, would share S2-A’s Python-based rules and rule engine while
allowing us to explore an additional dimension of NN-KG integration. This approach involves
developing methods for transferring and representing the structure of the class hierarchy of
our VRD-world ontology within supplementary layers (and their associated weight matrices) of
our object detection NN. The objective is to enable the object detection NN to not just detect
objects of base classes (e.g. jacket) but to perform the generalisations needed to convey all
the parent classes for each object (e.g. clothing, wearable thing, etc.) as well. This way, our
Python-based rule engine need not query the KG because any class membership information
required to determine rule goal satisfaction will have been supplied by the NN.</p>
      <p>S1 and S2 combined: NN-KG-I strategy S1 should be good at identifying poor VR predictions
(negative cases), while S2 should be good at identifying plausible VR predictions (positive cases).
So, in theory, they are complementary and may work well together, delivering a combined
boost to VR predictive performance.</p>
      <p>Some other NN-KG-I strategies: Another integration strategy involves using KG
embeddings to train a NN to score the plausibility of VR predictions. This scoring NN would then be
used to help train the predicate prediction NN.</p>
      <p>A further strategy involves leveraging the training set VR annotations (KG data). Each ordered
pair of object classes will have some number of annotated VR instances involving some subset
of the 70 predicates. These can be transformed into discrete probability distributions over the
predicates. The VR predictions generated by our predicate prediction NN during training can
similarly be transformed into corresponding discrete probability distributions. The dissimilarity
of corresponding pairs of predicted and annotated VR probability distributions can then be
measured (using a metric identified for this purpose), and these measures can be aggregated to
produce a penalty term with which to augment the loss function of the predicate prediction NN.
Tactics for leveraging the ontology so as to intelligently redistribute probability in the target
distributions can also be explored to better facilitate few-shot and zero-shot learning.</p>
      <p>Evaluation: The NN pipeline of our system architecture will be capable of delivering some
measure of VR predictive performance on its own. This is the baseline performance measure
against which all NN-KG-I strategies will be judged.</p>
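      <p>The distribution-based penalty strategy described earlier in this section can be sketched as follows. The text leaves the dissimilarity metric open, so the use of Jensen-Shannon divergence here is an assumption, as are the four-predicate vocabulary, the example counts and the function names.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter

PREDICATES = ["on", "ride", "wear", "hold"]  # toy vocabulary; the dataset has 70


def to_distribution(counts):
    """Normalise predicate counts into a discrete probability distribution."""
    total = sum(counts.get(p, 0) for p in PREDICATES)
    if total == 0:
        return [1.0 / len(PREDICATES)] * len(PREDICATES)
    return [counts.get(p, 0) / total for p in PREDICATES]


def kl(p, q):
    """Kullback-Leibler divergence, skipping zero-probability terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log); 0 for identical inputs."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Predicate counts per ordered pair of object classes (made-up numbers).
annotated = {("person", "horse"): Counter({"ride": 8, "on": 2})}
predicted = {("person", "horse"): Counter({"ride": 5, "on": 3, "hold": 2})}

divergences = [
    js_divergence(to_distribution(annotated[pair]), to_distribution(predicted[pair]))
    for pair in annotated
]
penalty = sum(divergences) / len(divergences)
print(round(penalty, 4))
```
]]></preformat>
      <p>Jensen-Shannon divergence is symmetric and bounded above by log 2, which keeps the aggregated penalty on a predictable scale. Other metrics could be substituted without changing the overall structure.</p>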
      <p>
        The authors of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] measure VR predictive performance using a recall@N metric
that measures recall globally, across all images. In addition to this global recall@N metric, we
plan to use a per-image measure of recall@N that we average over the images. Basic recall@N
(whether global or per-image and averaged), however, takes account only of the number of hits
in the top N predictions. We have also designed a more sensitive metric that we call ‘Mean Avg
Recall@K top-N’ that measures both the hit count and the positions of the hits within the top
N ranked predictions.
      </p>
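      <p>The difference between global recall@N and per-image recall@N averaged over the images can be sketched as follows. As a simplification, a hit is an exact match of the VR triple; a full evaluation would also need to match bounding boxes. The data and function names are illustrative, and the ‘Mean Avg Recall@K top-N’ metric is not reproduced because its definition is not given here.</p>
      <preformat><![CDATA[
```python
def top_n_hits(predictions, ground_truth, n):
    """Count ground-truth VRs found among the n highest-scoring predictions."""
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)[:n]
    top = {vr for vr, _score in ranked}
    return len(top & set(ground_truth))


def global_recall_at_n(images, n):
    """Pool hits and ground-truth counts over all images, then divide."""
    hits = sum(top_n_hits(preds, gt, n) for preds, gt in images)
    total = sum(len(set(gt)) for _preds, gt in images)
    return hits / total if total else 0.0


def mean_per_image_recall_at_n(images, n):
    """Compute recall for each image separately, then average over images."""
    recalls = [
        top_n_hits(preds, gt, n) / len(set(gt))
        for preds, gt in images
        if gt
    ]
    return sum(recalls) / len(recalls) if recalls else 0.0


image_a = (
    [(("person", "ride", "horse"), 0.9),
     (("horse", "on", "grass"), 0.4),
     (("person", "wear", "hat"), 0.8)],
    [("person", "ride", "horse"), ("horse", "on", "grass"),
     ("person", "wear", "hat"), ("person", "hold", "rope")],
)
image_b = (
    [(("dog", "on", "sofa"), 0.7)],
    [("dog", "on", "sofa")],
)
images = [image_a, image_b]
print(global_recall_at_n(images, 2))
print(mean_per_image_recall_at_n(images, 2))
```
]]></preformat>
      <p>With N = 2, the pooled measure gives 3 hits out of 5 annotated VRs, while the per-image average is the mean of 2/4 and 1/1. The two measures diverge whenever images differ in their number of annotated VRs.</p>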
      <p>
        As per [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], we plan to evaluate zero-shot VR predictive performance similarly to
overall performance. The only difference is that when evaluating zero-shot performance, the
annotated VRs participating in evaluation are limited to zero-shot VR instances, i.e. VR
instances whose VR types are not represented within the training set VR annotations. We plan
to evaluate few-shot VR predictive performance in the same way, but here the annotated VRs
participating in evaluation will be limited to those for which the training set VR annotations
contain only some small number of instances (1 to 5, say).
      </p>
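      <p>Selecting the zero-shot and few-shot evaluation subsets can be sketched as a simple count over training set VR types. The function name and the example data are hypothetical; the default upper bound of 5 follows the range suggested above.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def split_test_vrs(train_vrs, test_vrs, few_shot_max=5):
    """Partition test VR instances by how often their type occurs in training."""
    train_counts = Counter(train_vrs)
    # Zero-shot: the VR type never appears in the training annotations.
    zero_shot = [vr for vr in test_vrs if train_counts[vr] == 0]
    # Few-shot: the VR type appears between 1 and few_shot_max times.
    few_shot = [vr for vr in test_vrs if 1 <= train_counts[vr] <= few_shot_max]
    return zero_shot, few_shot


train = [("person", "ride", "horse")] * 10 + [("person", "ride", "elephant")] * 2
test = [
    ("person", "ride", "horse"),
    ("person", "ride", "elephant"),
    ("dog", "ride", "skateboard"),
]
zero_shot, few_shot = split_test_vrs(train, test)
print(zero_shot)
print(few_shot)
```
]]></preformat>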
      <p>A key principle of our evaluation strategy is to keep the baseline architecture of our hybrid
system unchanged across investigations of different NN-KG-I strategies. This will enable us to
attribute changes in VR predictive performance to a NN-KG-I strategy alone. It will also best
enable us to compare and rank our strategies in terms of VR predictive performance efficacy.</p>
      <p>We use multiple metrics so that our evaluation is self-checking. If the multiple measures
of performance corroborate one another, we can interpret the effect of a given NN-KG-I strategy
with confidence. If not, this will signal the need for caution and investigation.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Preliminary Results</title>
      <p>We are still assembling the infrastructure to enable experimentation, so we discuss preliminary
results in the sense of things accomplished. The original, crowd-sourced VR annotations of the
VRD dataset are full of inconsistencies and errors. For example, object class ‘bear’ refers to real
bears and teddy bears; class ‘plate’ refers to dishware plates, license plates (on vehicles) and
baseball (home) plates; too many instances of VR pattern (person, wear, Y) have Y on a different
person. Apart from making object detection and relationship prediction noisily problematic,
the semantic variability of the object classes made precise ontology design infeasible. No class
hierarchy felt credible, and few opportunities existed to define useful domain/range restrictions
on object properties (VRD predicates). We therefore undertook a comprehensive VR analysis
and customisation exercise to strengthen the semantic consistency of the VR annotations. In
time, our VRD-world ontology, our customised VR annotations, our protocol for specifying VR
customisations textually, and our code for applying them in an automated, repeatable fashion,
will be made publicly available as a contribution to the AI and SW communities.</p>
      <p>Object detection training and experimentation is underway. Our predicate prediction NN has
been designed. Our evaluation metrics have been implemented and proof-of-concept testing
confirms they behave as expected. Proof-of-concept exercises confirming the feasibility of
NN-KG-I strategies S1 and S2 (described above) have been completed successfully. A customised
binary cross-entropy loss function has been conceived for training our multi-label predicate
prediction NN that provides parameters for influencing the loss attracted by predicted VRs that
have no matching annotated VR. Many of these predicted VRs will be entirely plausible and yet
will be treated as false positives simply due to the unavoidable sparsity and arbitrariness of the
annotated VRs. We aim to explore the effect of influencing the magnitude of the loss attracted
by such plausible false positive cases, based on judgements derived from our KG.</p>
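      <p>One way such a customised binary cross-entropy loss could look is sketched below. This is an assumption-laden illustration, not the loss function conceived in this research: the per-predicate weights applied to unmatched (target 0) terms, and the idea that a KG-derived plausibility judgement sets those weights, are hypothetical choices.</p>
      <preformat><![CDATA[
```python
import math


def custom_bce(probs, targets, unmatched_weights):
    """Binary cross-entropy whose no-match (target 0) terms are re-weighted.

    probs: predicted probability per predicate
    targets: 1 if a matching annotated VR exists, else 0
    unmatched_weights: factor in [0, 1] applied only when the target is 0
    """
    eps = 1e-12
    total = 0.0
    for p, t, w in zip(probs, targets, unmatched_weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if t == 1:
            total += -math.log(p)
        else:
            total += -w * math.log(1.0 - p)
    return total / len(probs)


probs = [0.9, 0.8, 0.1]
targets = [1, 0, 0]
# Weights of 1 everywhere reproduce standard binary cross-entropy.
plain = custom_bce(probs, targets, [1.0, 1.0, 1.0])
# Hypothetical KG judgement: the second predicate is plausible, so soften its loss.
informed = custom_bce(probs, targets, [1.0, 0.25, 1.0])
print(plain, informed)
```
]]></preformat>
      <p>Lowering the weight for a plausible but unannotated VR reduces the loss it attracts without removing it entirely, so the sparsity of the annotations is penalised less harshly.</p>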
    </sec>
    <sec id="sec-7">
      <title>7. Reflection and Future Work</title>
      <p>Reflection: First, our research is about combining the use of KGs with deep learning in hybrid,
neural-symbolic systems. The application task of VR detection within images is simply a context
for exploring combination/integration strategies. We believe our NN-KG-I strategies to be
generic and widely applicable. However, we plan to continue looking for other dataset/ontology
pairs with which to apply our strategies so as to further demonstrate their generality.</p>
      <p>
        Second, as we have described, we chose to heavily customise the original, crowd-sourced
VR annotations of the VRD images in order to enable the design of a precise ontology (and
to correct egregious errors). A consequence of this choice is that we sacrifice the ability to
directly compare the predictive performance results of our various hybrid VR detection systems
with those of the systems of previous researchers (such as [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]). However, this sacrifice
is justified by the fact that the purpose of our research is not to build a better VR detection
system on the VRD dataset than others, it is to explore generic ways of combining KGs with
deep learning that deliver performance superior to what deep learning can deliver alone.
      </p>
      <p>
        Third, NN-KG-I strategies such as S1 and S2 that rely on real-time interaction with a KG are
likely to increase NN training times considerably, particularly if the KG is out-of-process (even
online) and accessed via a SPARQL endpoint. We do not, however, believe this consideration
to be a major concern. In our case, the small VRD dataset (4,000 training images) means no
issues should arise. More generally, we surmise that the growth rate of NN training times
relative to dataset size, n, will be linear (O(n) time) and that, on this basis, the computational
complexity implications of real-time KG access should always be manageable. Further, tactics
such as caching may well be exploitable to help keep KG access to a minimum.
      </p>
      <p>
        Future work: Our research can readily extend in multiple directions. One direction is to
pursue the goal of contributing to the development of a theory to help formalise the foundations
of neural-symbolic AI, as advocated by van Harmelen in [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. One such contribution involves
positioning our NN-KG-I strategies within the schemes for categorising approaches to (and
compositional patterns for) neural-symbolic integration proposed by others (e.g. [
        <xref ref-type="bibr" rid="ref13 ref26 ref27 ref7">13, 26, 27, 7</xref>
        ]).
Where we find they do not fit comfortably, we might propose scheme/pattern refinements.
We expect this task to be both challenging and rewarding given that several alternate
categorisation/pattern schemes have been proposed and given that some of our multiple different
strategies may well sit best at different positions within these different schemes. Another
contribution involves taking our analyses of the interactions between deep statistical learning
and symbolic knowledge representation and reasoning exposed by our NN-KG-I strategies
to a deeper, more theoretical level. Another direction in which our research leads involves
enhancing the interpretability of hybrid, neural-symbolic system behaviour by, for example,
investigating methods for generating explanations of predictions for system users. Yet another
direction involves exploring NN-KG-I strategies for the express purpose of extracting new
knowledge from data to add to KGs (aka KG completion).
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>Thank you to my supervisors Dr. Ernesto Jiménez-Ruiz and Dr. Tillman Weyde for their
guidance and support.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L. G.</given-names>
            <surname>Valiant</surname>
          </string-name>
          , Three Problems in Computer Science,
          <source>J. ACM</source>
          <volume>50</volume>
          (
          <year>2003</year>
          )
          <fpage>96</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>A Review of the Semantic Web Field</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>64</volume>
          (
          <year>2021</year>
          )
          <fpage>76</fpage>
          -
          <lpage>83</lpage>
          . URL: https://doi.org/10.1145/3397512.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Sarker</surname>
          </string-name>
          ,
          <article-title>Neural-Symbolic Integration and the Semantic Web</article-title>
          ,
          <source>Semantic Web</source>
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <fpage>3</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          ,
          <article-title>Visual Relationship Detection with Language Priors</article-title>
          ,
          <source>in: European Conference on Computer Vision</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>852</fpage>
          -
          <lpage>869</lpage>
          . URL: https://cs.stanford.edu/people/ranjaykrishna/vrd/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <article-title>Deep Learning: A Critical Appraisal</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1801.00631.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chollet</surname>
          </string-name>
          ,
          <source>Deep Learning: Current Limits and What Lies Beyond Them, Presentation at RAAIS</source>
          ,
          <year>2018</year>
          . URL: https://raais.co/speakers-2018.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kautz</surname>
          </string-name>
          ,
          <source>The Third AI Summer, AAAI Robert S. Engelmore Memorial Lecture, Thirty-fourth AAAI Conference on Artificial Intelligence</source>
          , New York, NY,
          <year>2020</year>
          . URL: https://www.cs.rochester.edu/u/kautz/talks/, presentation slides and video.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <article-title>The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2002.06177.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Besold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>d'Avila Garcez</surname>
          </string-name>
          , et al.,
          <article-title>Neural-Symbolic Learning and Reasoning: A Survey and Interpretation</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2017</year>
          ). URL: https://arxiv.org/abs/1711.03902.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>d'Avila Garcez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gori</surname>
          </string-name>
          , et al.,
          <article-title>Neural-Symbolic Computing: An Effective Methodology for Principled Integration of Machine Learning and Reasoning</article-title>
          , FLAP
          <volume>6</volume>
          (
          <year>2019</year>
          )
          <fpage>611</fpage>
          -
          <lpage>632</lpage>
          . URL: https://collegepublications.co.uk/ifcolog/?00033.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Eberhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Sarker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Neuro-Symbolic Approaches in Artificial Intelligence</article-title>
          ,
          <source>National Science Review</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Sarker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Eberhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>Neuro-Symbolic Artificial Intelligence: Current Trends</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2105.05330.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Badreddine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>d'Avila Garcez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spranger</surname>
          </string-name>
          ,
          <article-title>Logic Tensor Networks</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>303</volume>
          (
          <year>2022</year>
          )
          <fpage>103649</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>d'Avila Garcez</surname>
          </string-name>
          ,
          <article-title>Logic Tensor Networks: Deep Learning and Logical Reasoning from Data and Knowledge</article-title>
          ,
          <source>in: Proceedings of NeSy'16</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>E. B.</given-names>
            <surname>Myklebust</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Tollefsen</surname>
          </string-name>
          ,
          <article-title>Prediction of Adverse Biological Effects of Chemicals using Knowledge Graph Embeddings</article-title>
          ,
          <source>Semantic Web</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>299</fpage>
          -
          <lpage>338</lpage>
          . URL: https://doi.org/10.3233/SW-222804.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>On the Capabilities of Logic Tensor Networks for Deductive Reasoning</article-title>
          ,
          <source>in: Proceedings of the AAAI-MAKE</source>
          , volume
          <volume>2350</volume>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Eberhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>Towards Bridging the Neuro-Symbolic Gap: Deep Deductive Reasoners</article-title>
          , Appl. Intell.
          <volume>51</volume>
          (
          <year>2021</year>
          )
          <fpage>6326</fpage>
          -
          <lpage>6348</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Sarker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          , et al.,
          <article-title>Neuro-Symbolic Deductive Reasoning for Cross-Knowledge Graph Entailment</article-title>
          ,
          <source>in: Proceedings of the AAAI-MAKE</source>
          , volume
          <volume>2846</volume>
          ,
          <year>2021</year>
          . URL: http://ceur-ws.org/Vol-2846/paper8.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          , et al.,
          <article-title>Zero-Shot Visual Question Answering using Knowledge Graph</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2107.05348.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Explainable Zero-shot Learning via Attentive Graph Convolutional Network and Knowledge Graphs</article-title>
          ,
          <source>Semantic Web</source>
          <volume>12</volume>
          (
          <year>2021</year>
          )
          <fpage>741</fpage>
          -
          <lpage>765</lpage>
          . URL: https://doi.org/10.3233/SW-210435.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Geng</surname>
          </string-name>
          , et al.,
          <article-title>Low-Resource Learning with Knowledge Graphs: A Comprehensive Survey</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2112.10006.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. O.</given-names>
            <surname>Pisco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McGeever</surname>
          </string-name>
          , et al.,
          <article-title>Leveraging the Cell Ontology to Classify Unseen Cell Types</article-title>
          ,
          <source>Nature Communications</source>
          <volume>12</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>I.</given-names>
            <surname>Donadello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <article-title>Compensating Supervision Incompleteness with Prior Knowledge in Semantic Image Interpretation</article-title>
          , in: IJCNN, IEEE,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Daniele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <article-title>Knowledge Enhanced Neural Networks</article-title>
          ,
          <source>in: PRICAI 2019: Trends in Artificial Intelligence</source>
          , volume
          <volume>11670</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>542</fpage>
          -
          <lpage>554</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          ,
          <article-title>Preface. The 3rd AI Wave is Coming, and it Needs a Theory</article-title>
          ,
          <source>in: Neuro-Symbolic Artificial Intelligence: The State of the Art</source>
          , IOS Press,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>van Bekkum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Boer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Meyer-Vitali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>ten Teije</surname>
          </string-name>
          ,
          <article-title>Modular Design Patterns for Hybrid Learning and Reasoning Systems</article-title>
          , Appl. Intell.
          <volume>51</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
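          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>ten Teije</surname>
          </string-name>
          ,
          <article-title>A Boxology of Design Patterns for Hybrid Learning and Reasoning Systems</article-title>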
          , J. Web Eng.
          <volume>18</volume>
          (
          <year>2019</year>
          )
          <fpage>97</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>