Visual Relationship Detection using Knowledge Graphs for Neural-Symbolic AI

Dave Herron
City, University of London, London, United Kingdom
david.herron@city.ac.uk
Doctoral Consortium at ISWC 2022, co-located with the 21st International Semantic Web Conference (ISWC 2022)

Abstract

Momentum is surging behind the consensus that neural-symbolic AI is the right road for AI to take today. We propose to travel this road using Semantic Web technologies to represent the symbolic AI tradition. Our objective is to investigate and compare the efficacy of a variety of strategies for combining the capabilities of deep neural networks for statistical learning from data with those of OWL ontologies and knowledge graphs for symbolic knowledge representation and reasoning. Our application area is visual relationship detection within images. Deep learning is data hungry and struggles to generalise to examples outside the training distribution. We seek to show that combining Semantic Web domain knowledge and reasoning with deep learning can deliver superior performance, can substitute for plentiful training data, and can deliver robust generalisation in few-shot/zero-shot learning scenarios.

Keywords: neural-symbolic, AI, semantic web, knowledge graphs, CEUR-WS

1. Problem Statement

At the macro-level, our problem space is the subfield of artificial intelligence (AI) called neural-symbolic AI, which is concerned with integrating the learning capabilities of deep neural networks (NNs) with the knowledge representation and reasoning capabilities of the symbolic tradition of AI in order to get the best of both worlds. Combining the neural and symbolic traditions of AI is an open research challenge. As Valiant explains in [1], although learning and reasoning have been thoroughly studied (separately) by the two traditions of AI, the semantics of their models are hard to reconcile: neural AI learning models have a statistical character, whereas symbolic AI knowledge representation and reasoning models have a logical character. Further, the distributed, sub-symbolic representations of knowledge (features) encoded by NNs differ dramatically from the symbolic representations of knowledge used in logic.

Our research explores neural-symbolic AI using Semantic Web (SW) ontologies and knowledge graphs (KGs) for symbolic knowledge representation and reasoning. Selecting SW technologies to represent symbolic AI is uncontroversial. In [2], Hitzler explains that the SW field has long had a strong association with the symbolic tradition of AI. Further, in a recent review of specific synergies between SW technologies and deep learning, Hitzler et al. [3] anticipate that hybrid, neural-symbolic systems that leverage OWL ontologies and KGs should be capable of delivering better performance and interpretability than deep learning alone.

At the micro-level, our problem space is the application area of visual relationship detection within images, using a small (5,000 images) dataset prepared for this purpose in 2016: the VRD dataset [4]. The dataset originators used crowd-sourcing to annotate each image with some number of visual relationships (VRs). A VR is a (subject, predicate, object) triple, where the subject and object are individual objects (represented by bounding boxes and class labels) and the predicate expresses some relationship between them. For example, (person, ride, horse) and (horse, on, grass) might be two VRs for a given image. The VR annotations refer to 100 common, everyday classes of object that broadly but sparsely span the material world: types of vehicle, furniture, appliance, device, clothing, sporting good, animal, plant, landscape feature, etc. The 70 predicates (relationships) referred to in the VR annotations are primarily common spatial relations (above, below, behind, beside, on, in, ...) and common verbs (wear, hold, use, carry, drive, ride, eat, touch, kick, has, ...).

The breadth and variety of object classes and predicates permitted us to design an ontology with rich class and property hierarchies for describing (what we call) this VRD-world domain. Our VRD-world ontology currently contains 239 classes and makes extensive use of object property characteristics such as domain/range restrictions, subPropertyOf, property equivalence, symmetry, inverses, etc. The VR annotations also map nicely to RDF triples for populating a KG with facts (ABox data).
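To make this mapping concrete, the following minimal sketch (in Python with rdflib) loads an illustrative TBox fragment and the ABox facts for one VR. The vrd: namespace and the specific axioms shown are hypothetical stand-ins for the actual VRD-world ontology, which is not reproduced here.

    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    VRD = Namespace("http://example.org/vrd-world#")  # hypothetical namespace

    g = Graph()
    g.bind("vrd", VRD)

    # TBox fragment: a slice of the class hierarchy and one property's axioms
    # (illustrative only; the real ontology has 239 classes).
    g.add((VRD.Jacket, RDFS.subClassOf, VRD.Clothing))
    g.add((VRD.Clothing, RDFS.subClassOf, VRD.WearableThing))
    g.add((VRD.wear, RDF.type, OWL.ObjectProperty))
    g.add((VRD.wear, RDFS.domain, VRD.Person))
    g.add((VRD.wear, RDFS.range, VRD.WearableThing))

    # ABox facts: the VR annotation (person, wear, jacket) for one image,
    # with the two detected objects as individuals.
    g.add((VRD.person_01, RDF.type, VRD.Person))
    g.add((VRD.jacket_01, RDF.type, VRD.Jacket))
    g.add((VRD.person_01, VRD.wear, VRD.jacket_01))

Each annotated VR thus contributes one object-property assertion plus class-membership assertions for its subject and object.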
Deep learning is known to depend on big data for good performance, and to struggle to generalise and extrapolate to examples that lie outside the distribution of data seen during training. The small size of the VRD image dataset, and the long tail of the highly skewed frequency distribution of the VR annotations, are likely to provoke these limitations of deep learning. Our research considers how to leverage the symbolic knowledge representation and reasoning capabilities of a KG hosting our OWL VRD-world ontology so as to overcome the limitations of deep learning and deliver superior performance at detecting VRs in images.

2. Importance

Our research has relevance and benefits for several groups: the AI community, industry, society generally and the SW community.

Prominent voices from the AI community such as those of Marcus [5], Chollet [6] and Kautz [7] corroborate one another in arguing that, due to its limitations (like those just mentioned), deep learning alone, despite its spectacular achievements, will not lead to human-level artificial general intelligence (AGI). Marcus [5] warns that the ill-advised hype around deep learning could lead to a 3rd AI winter. Kautz [7] speculates that the (current) 3rd AI summer may avoid succumbing to a 3rd winter, but only because of the momentum that now exists behind neural-symbolic integration. A consensus has emerged that the neural-symbolic road is the right one for AI today [6, 7, 5, 8]. By helping to show how symbolic background knowledge and reasoning can reduce deep learning's dependence on big data whilst boosting its ability to generalise, and by advancing understanding of neural-symbolic AI generally, our research directly contributes to taking AI further along the road toward human-level AGI and to helping it avoid another AI winter.

AI has transformed many aspects of everyday life, in industry and for society generally, and in ways we all now take for granted. We all expect continuing positive innovations and transformative effects from AI. Hence, as neural-symbolic AI research advances AI along the road to AGI, industry and society generally will be impacted directly and benefit proportionately.
Finally, our research has the potential to show that OWL ontologies and KGs can be used to integrate neural, statistical learning with symbolic background knowledge and reasoning in concrete, tangible ways. In doing so, it can demonstrate that these SW technologies are exemplars of the types of symbol-manipulating tools and abstraction and reasoning modules that Marcus [5, 8] and Chollet [6], respectively, call for incorporating into hybrid, neural-symbolic systems in order for AI to advance along the road to AGI. Such a demonstration may shine a spotlight on SW technologies that helps to place them, and the SW community, at the centre of attention of neural-symbolic AI.

3. Related Work

Work relating to neural-symbolic AI in general can be reviewed in surveys such as [9, 10, 11, 12]. One prominent example is Logic Tensor Networks (LTN) [13, 14]. LTN is a fuzzy logic-based framework for training conventional NNs to satisfy logical constraints expressed as background knowledge over training data.

Work relating to neural-symbolic AI that uses SW technologies to represent the symbolic side is rapidly accumulating. Myklebust et al. [15] create a composite KG from disparate sources, use it to generate KG embeddings (via various models), and then use the KG embeddings to train a NN binary classifier to predict whether or not a property representing mortality risk should be present in the composite KG to link certain individual chemicals with certain individual species of organism. The theme of 'deep deductive reasoning' (training NNs to reason over SW knowledge bases and KGs) is progressively developed in [16, 17, 18]. The theme of using KGs to compensate for the lack of plentiful samples with which to train robust deep learning-based systems (in so-called few-shot and zero-shot learning scenarios) is studied in [19] and [20] and surveyed in [21]. Similarly, in [22], Wang et al. demonstrate that the structure of the class hierarchy of a cell ontology can be leveraged (as an undirected graph) to significantly improve the accuracy of deep learning-driven cell classification for cells whose types were unseen during training.

Work relating to using neural-symbolic AI for VR detection on the VRD dataset also exists. The original (2016) VRD paper by Lu et al. [4] does not mention neural-symbolic integration (which reflects how little traction this area of AI had just a few years ago), and neither is it mentioned in the comprehensive survey of the area in [9]. But it should now be recognised as an early and innovative form of neural-symbolic AI, because their system is a hybrid that includes a 'language module' trained on word embeddings of the (symbolic) VRD object class names. Donadello & Serafini [23] enumerate LTN negative domain/range knowledge constraints to train NNs to detect VRs in VRD images. Their approach, however, exposes a scalability limitation of LTN that we hope to show can be elegantly overcome by using KGs. Daniele & Serafini [24] test their KENN system on the VRD dataset.

4. Research Questions

RQ1: How can we combine learning and reasoning to get the best of both worlds? Here, 'learning' refers to the statistical 'learning from data' capabilities of deep NNs, 'reasoning' refers to the symbolic knowledge representation and reasoning capabilities of OWL ontologies and KGs, and 'best of both worlds' refers to improved VR predictive performance.
We hypothesise that each of the several distinct NN-KG integration (NN-KG-I) strategies that we have conceived (and which we describe shortly) will deliver VR predictive performance that is superior to whatever baseline performance our deep NNs are able to deliver by themselves. We aim to experiment with each of our NN-KG-I strategies individually, rank them, and explain their relative efficacy by analysing the nature of the interactions between deep learning and symbolic knowledge representation and reasoning that they exercise.

RQ2: Some of our NN-KG-I strategies will be compatible with one another. How will they perform when used in different combinations? We hypothesise that the contributions to improved VR predictive performance (beyond the baseline) that they make when used individually will prove additive in certain combinations, but that in other combinations there will be interesting interaction effects between the integration strategies which either amplify or diminish their collective effect on VR predictive performance. Analysis of the results of these experiments is expected to yield further insights into the nature of the interactions between deep learning and symbolic knowledge representation and reasoning.

RQ3: How best, and to what extent, can the symbolic knowledge representation and reasoning capabilities of OWL ontologies and KGs be leveraged by deep NNs to substitute for plentiful training data and enable robust generalisation on out-of-distribution examples? Within the VRD image annotations, many VR types have just a few training instances, and many have test instances but zero training instances. Hence, the VRD dataset lends itself to the examination of the sort of small dataset and few-shot/zero-shot learning scenarios in which deep learning alone struggles to perform well. We hypothesise that our NN-KG-I strategies will show that SW technologies such as our OWL VRD-world ontology, together with the reasoning capabilities of a KG, can be used to construct hybrid, neural-symbolic systems that out-perform deep learning alone in such settings.

5. Methods

The architecture of our baseline hybrid, neural-symbolic system consists of a NN pipeline and a KG populated with our VRD-world ontology. The NN pipeline consists of an object detection NN followed by a multi-label predicate prediction NN that takes ordered pairs of detected objects, plus geometric features derived from their bounding boxes (an idea borrowed from [23]), as input. Experimentation with our several NN-KG-I strategies for combining these neural and symbolic components will drive the exploration of our research questions. We describe two of our NN-KG-I strategies, denoted S1 and S2, in some detail and mention others briefly.

[Figure 1 shows the pipeline: an object detection NN (1) feeds a predicate prediction NN (2); candidate VR predictions such as <subjectX, rdf:type, Person>, <objectY, rdf:type, Phone>, <subjectX, wear, objectY> are inserted into a KG hosting the VRD-world ontology (3); KG feedback, e.g. "error: KG state inconsistent", is processed in return (4).]

Figure 1: The baseline architecture of our hybrid, neural-symbolic systems illustrating NN-KG integration strategy S1. The KG deduces that <subjectX, wear, objectY> is a poor VR prediction (because it violates a range restriction of the property wear: a phone is not a wearable thing).

S1: Strategy S1 is about leveraging a SW KG's ability to automatically enforce the semantic rules of an OWL ontology. The only predicted VRs that stand a chance of matching annotated (ground truth) VRs are those that are semantically valid according to our VRD-world ontology. So it makes sense to help our predicate prediction NN learn the relevant semantic rules of our VRD-world ontology. This is analogous to a checkers-playing system learning the legal checkers moves: it does not fully solve the problem of finding the best move, but by narrowing the search space to the legal moves, learning to find the best move becomes significantly easier.

One way to utilise strategy S1 is to take the VR predictions emitted by the predicate prediction NN during training, insert them into a KG populated with the VRD-world ontology, and then communicate any feedback from the KG regarding invalid VRs back to the NN (e.g. by penalising its loss function). This scenario is depicted in Figure 1.

There is also potential to utilise strategy S1 during inference. Despite having trained our predicate prediction NN as best we can to only predict semantically valid VRs, it may still sometimes predict semantically invalid VRs during inference on test set images. But the KG is an active agent capable of participating in determining the final predictions of the hybrid, neural-symbolic system. Specifically, we can use it as a final filter to suppress bad VR predictions. Rather than submit the VR predictions of the predicate prediction NN directly (on behalf of the hybrid, neural-symbolic system), we may first insert them into the KG and then submit only those that the KG does not flag as being semantically invalid.
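As an illustration of this inference-time filter, the minimal sketch below checks a candidate VR's object class against the rdfs:range of its predicate by walking the class hierarchy with a SPARQL property path. The vrd: namespace, the file name vrd_world.ttl and the example classes are assumptions; a KG with a full OWL reasoner attached would subsume this hand-rolled check.

    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import RDFS

    VRD = Namespace("http://example.org/vrd-world#")  # hypothetical namespace

    g = Graph()
    g.parse("vrd_world.ttl", format="turtle")  # assumed ontology file

    def range_valid(pred: URIRef, obj_cls: URIRef) -> bool:
        # True if obj_cls is (a subclass of) the rdfs:range of pred.
        # The domain check for the subject class is analogous.
        q = """ASK {
                   ?pred rdfs:range ?range .
                   ?objCls rdfs:subClassOf* ?range .
               }"""
        res = g.query(q, initNs={"rdfs": RDFS},
                      initBindings={"pred": pred, "objCls": obj_cls})
        return res.askAnswer

    # Final filtering step of strategy S1: keep only candidate VRs that the
    # KG does not flag as semantically invalid.
    candidates = [(VRD.Person, VRD.wear, VRD.Jacket),  # a WearableThing
                  (VRD.Person, VRD.wear, VRD.Phone)]   # violates range of wear
    final = [vr for vr in candidates if range_valid(vr[1], vr[2])]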
Our strategy S1 is related conceptually to the approach taken in [23] (mentioned earlier) of enumerating (in code) LTN negative domain/range knowledge constraints to train NNs to detect VRs in VRD images. But by using a KG and ontology rather than hand-coded knowledge axioms, we aim to show that strategy S1 is more general, more scalable and more flexible than the LTN neural-symbolic framework. It is more general because a KG automatically enforces all semantic rules of an ontology, not just one particular category of domain rule. It is more scalable because a KG scales effortlessly to handle domains with any number of classes and properties, whereas the approach taken with LTN in [23] proved intractable even with the limited diversity of VRD object classes and predicates. Finally, it is more flexible because, as we have explained, a KG used for strategy S1 can participate not just in NN training but during NN inference as well.

S2: Strategy S2 involves using common-sense, Datalog-like rules to leverage and augment the reasoning capabilities of our KG. For example, a rule expressing the plausibility of VR pattern (X, wear, Y) can be described as follows:

    wear(X,Y) :- Person(X), WearableThing(Y), ir(Y,X) ~ 1

Triples asserting the predicted classes of the detected objects X and Y will have been inserted into our KG before the rule is evaluated: (X, rdf:type, Cx), (Y, rdf:type, Cy). Determining if the first goal of the body of this rule is satisfied is accomplished by a KG query to check whether X is a member of class Person. Determining if the second goal is satisfied is accomplished similarly, but depends on the KG having deduced whether Y is a WearableThing (which is not a VRD object class). The third goal represents a novel reuse of a bounding box geometric feature function (per [23]): the inclusion ratio, ir(Y,X), the fraction of Y's bounding box that lies within X's bounding box. This goal is satisfied if the bounding box for Y is mostly enclosed within the bounding box for X (as would generally be the case when a person can plausibly be said to be wearing something).
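A minimal sketch of how such a rule might be evaluated in Python (per the rule-engine approach S2-A described below) follows. The 0.8 threshold used to interpret 'ir(Y,X) ~ 1' is an illustrative choice, and the class-membership sets are assumed to have been gathered beforehand, e.g. by querying the KG.

    from dataclasses import dataclass

    @dataclass
    class Box:
        x1: float
        y1: float
        x2: float
        y2: float

    def inclusion_ratio(inner: Box, outer: Box) -> float:
        # Fraction of inner's area lying inside outer: the ir(Y,X) feature.
        ix1, iy1 = max(inner.x1, outer.x1), max(inner.y1, outer.y1)
        ix2, iy2 = min(inner.x2, outer.x2), min(inner.y2, outer.y2)
        overlap = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area = (inner.x2 - inner.x1) * (inner.y2 - inner.y1)
        return overlap / area if area > 0 else 0.0

    def wear_rule(x_classes: set, y_classes: set,
                  x_box: Box, y_box: Box, threshold: float = 0.8) -> bool:
        # Python analogue of: wear(X,Y) :- Person(X), WearableThing(Y), ir(Y,X) ~ 1
        # x_classes / y_classes hold all (inferred) class memberships, obtained
        # by querying the KG (S2-A) or from a hierarchy-aware detector (S2-C).
        return ("Person" in x_classes
                and "WearableThing" in y_classes
                and inclusion_ratio(y_box, x_box) >= threshold)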
We will define a base of such rules using the knowledge of the VRD-world domain gained from analysing the VR annotations in order to design the ontology. The impact of strategy S2 will likely be proportional to the comprehensiveness of this rule base, but with diminishing returns. Analysis of the frequency distribution of the VR annotation types will help us identify the rules likely to have the greatest impact; by focusing on such high-impact rules, we can minimise the number of rules required to deliver a measurable performance effect.

We have identified two approaches for implementing our common-sense rules. Approach S2-A is to use Python to build custom rules and a simple (non-recursive) rule engine component for evaluating them. This rule engine component would mediate interaction between the predicate prediction NN and the KG. The description of S2 given above presumes this approach. Approach S2-B is to define proper Datalog rules and to use a KG tool whose support for Datalog includes the basic arithmetic functions needed to emulate bounding box geometric feature functions (such as the inclusion ratio).

A further approach, S2-C, would share S2-A's Python-based rules and rule engine while allowing us to explore an additional dimension of NN-KG integration. This approach involves developing methods for transferring and representing the structure of the class hierarchy of our VRD-world ontology within supplementary layers (and their associated weight matrices) of our object detection NN. The objective is to enable the object detection NN not just to detect objects of base classes (e.g. jacket) but also to perform the generalisations needed to convey all the parent classes of each object (e.g. clothing, wearable thing, etc.). This way, our Python-based rule engine need not query the KG, because any class membership information required to determine rule goal satisfaction will have been supplied by the NN.

S1 and S2 combined: NN-KG-I strategy S1 should be good at identifying poor VR predictions (negative cases), while S2 should be good at identifying plausible VR predictions (positive cases). So, in theory, they are complementary and may work well together, delivering a combined boost to VR predictive performance.

Some other NN-KG-I strategies: Another integration strategy involves using KG embeddings to train a NN to score the plausibility of VR predictions. This scoring NN would then be used to help train the predicate prediction NN. A further strategy involves leveraging the training set VR annotations (KG data). Each ordered pair of object classes has some number of annotated VR instances involving some subset of the 70 predicates. These can be transformed into discrete probability distributions over the predicates. The VR predictions generated by our predicate prediction NN during training can similarly be transformed into corresponding discrete probability distributions. The dissimilarity of corresponding pairs of predicted and annotated VR probability distributions can then be measured (using a metric identified for this purpose), and these measures can be aggregated to produce a penalty term with which to augment the loss function of the predicate prediction NN.
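As an illustration only (the dissimilarity metric is still to be selected), KL divergence is one natural candidate. The PyTorch sketch below assumes predicate scores have already been aggregated per ordered object-class pair; the weighting hyperparameter lambda_kg in the usage comment is hypothetical.

    import torch
    import torch.nn.functional as F

    def distribution_penalty(pred_logits: torch.Tensor,
                             target_dists: torch.Tensor) -> torch.Tensor:
        # KL(target || predicted), averaged over ordered object-class pairs.
        # pred_logits:  (num_pairs, 70) predicate scores aggregated per pair
        # target_dists: (num_pairs, 70) distributions derived from the
        #               training-set VR annotation counts for those pairs
        log_pred = F.log_softmax(pred_logits, dim=-1)
        return F.kl_div(log_pred, target_dists, reduction="batchmean")

    # Usage: loss = base_loss + lambda_kg * distribution_penalty(logits, targets)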
Tactics for leveraging the ontology so as to intelligently redistribute probability in the target distributions can also be explored, to better facilitate few-shot and zero-shot learning.

Evaluation: The NN pipeline of our system architecture will be capable of delivering some measure of VR predictive performance on its own. This is the baseline performance measure against which all NN-KG-I strategies will be judged. The authors of [4] and [23] measure VR predictive performance using a recall@N metric that measures recall globally, across all images. In addition to this global recall@N metric, we plan to use a per-image measure of recall@N that we average over the images. Basic recall@N (whether global, or per-image and averaged), however, takes account only of the number of hits in the top N predictions. We have therefore also designed a more sensitive metric, which we call 'Mean Avg Recall@K top-N', that measures both the hit count and the positions of the hits within the top N ranked predictions.

As per [4] and [23], we plan to evaluate zero-shot VR predictive performance similarly to overall performance. The only difference is that when evaluating zero-shot performance, the annotated VRs participating in evaluation are limited to zero-shot VR instances, i.e. VR instances whose VR types are not represented within the training set VR annotations. We plan to evaluate few-shot VR predictive performance in the same way, but here the annotated VRs participating in evaluation will be limited to those for which the training set VR annotations contain only some small number of instances (1 to 5, say).

A key principle of our evaluation strategy is to keep the baseline architecture of our hybrid system unchanged across investigations of different NN-KG-I strategies. This will enable us to attribute changes in VR predictive performance to a NN-KG-I strategy alone. It will also best enable us to compare and rank our strategies in terms of VR predictive performance efficacy. Using multiple metrics allows our evaluation strategy to cross-validate itself: if the multiple measures of performance corroborate one another, we can interpret the effect of a given NN-KG-I strategy with confidence; if not, this will signal the need for caution and further investigation.
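A minimal sketch of the per-image recall@N computation follows, assuming VRs are compared as class-level triples; the bounding-box localisation matching that full VRD evaluation also requires is omitted for brevity.

    def recall_at_n(predicted, annotated, n: int) -> float:
        # Per-image recall@N: the fraction of annotated VRs found among the
        # top-N confidence-ranked predicted VRs for one image.
        #   predicted: list of (score, (subj_cls, predicate, obj_cls)) tuples
        #   annotated: set of (subj_cls, predicate, obj_cls) ground-truth VRs
        top_n = {vr for _, vr in sorted(predicted, key=lambda p: -p[0])[:n]}
        return len(annotated & top_n) / len(annotated) if annotated else 0.0

    # The averaged per-image metric is the mean of this value over all images.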
6. Preliminary Results

We are still assembling the infrastructure to enable experimentation, so we discuss preliminary results in the sense of things accomplished.

The original, crowd-sourced VR annotations of the VRD dataset are full of inconsistencies and errors. For example, object class 'bear' refers to both real bears and teddy bears; class 'plate' refers to dishware plates, license plates (on vehicles) and baseball (home) plates; and too many instances of VR pattern (person, wear, Y) have Y on a different person. Apart from making object detection and relationship prediction noisily problematic, the semantic variability of the object classes made precise ontology design infeasible: no class hierarchy felt credible, and few opportunities existed to define useful domain/range restrictions on object properties (VRD predicates). We therefore undertook a comprehensive VR analysis and customisation exercise to strengthen the semantic consistency of the VR annotations. In time, our VRD-world ontology, our customised VR annotations, our protocol for specifying VR customisations textually, and our code for applying them in an automated, repeatable fashion will be made publicly available as a contribution to the AI and SW communities.

Object detection training and experimentation is underway. Our predicate prediction NN has been designed. Our evaluation metrics have been implemented, and proof-of-concept testing confirms they behave as expected. Proof-of-concept exercises confirming the feasibility of NN-KG-I strategies S1 and S2 (described above) have been completed successfully.

A customised binary cross-entropy loss function has been conceived for training our multi-label predicate prediction NN. It provides parameters for influencing the loss attracted by predicted VRs that have no matching annotated VR. Many of these predicted VRs will be entirely plausible and yet will be treated as false positives simply due to the unavoidable sparsity and arbitrariness of the annotated VRs. We aim to explore the effect of influencing the magnitude of the loss attracted by such plausible false positives, based on judgements derived from our KG.
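The following PyTorch sketch shows one possible shape for such a loss. The function name, the plausible_mask input (assumed to be derived from KG judgements) and the default down-weighting factor are illustrative assumptions, not the final design.

    import torch
    import torch.nn.functional as F

    def kg_aware_bce(logits: torch.Tensor, targets: torch.Tensor,
                     plausible_mask: torch.Tensor,
                     alpha: float = 0.5) -> torch.Tensor:
        # Multi-label BCE over the 70 predicate outputs in which false
        # positives that the KG judges plausible attract a reduced loss.
        #   logits:         (batch, 70) raw predicate scores
        #   targets:        (batch, 70) multi-hot annotated predicates
        #   plausible_mask: (batch, 70) 1.0 where a predicate is unannotated
        #                   but judged plausible by the KG, else 0.0
        per_elem = F.binary_cross_entropy_with_logits(logits, targets,
                                                      reduction="none")
        weights = torch.where(plausible_mask.bool(),
                              torch.full_like(per_elem, alpha),
                              torch.ones_like(per_elem))
        return (weights * per_elem).mean()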
7. Reflection and Future Work

Reflection: First, our research is about combining the use of KGs with deep learning in hybrid, neural-symbolic systems. The application task of VR detection within images is simply a context for exploring combination/integration strategies. We believe our NN-KG-I strategies to be generic and widely applicable. However, we plan to continue looking for other dataset/ontology pairs with which to apply our strategies, so as to further demonstrate their generality.

Second, as we have described, we chose to heavily customise the original, crowd-sourced VR annotations of the VRD images in order to enable the design of a precise ontology (and to correct egregious errors). A consequence of this choice is that we sacrifice the ability to directly compare the predictive performance results of our various hybrid VR detection systems with those of the systems of previous researchers (such as [4] and [23]). However, this sacrifice is justified by the fact that the purpose of our research is not to build a better VR detection system on the VRD dataset than others; it is to explore generic ways of combining KGs with deep learning that deliver performance superior to what deep learning can deliver alone.

Third, NN-KG-I strategies such as S1 and S2 that rely on real-time interaction with a KG are likely to increase NN training times considerably, particularly if the KG is out-of-process (even online) and accessed via a SPARQL endpoint. We do not, however, believe this consideration to be a major concern. In our case, the small VRD dataset (4,000 training images) means no issues should arise. More generally, we surmise that NN training times will grow linearly, in O(n) time, with dataset size n, and that, on this basis, the computational complexity implications of real-time KG access should always be manageable. Further, tactics such as caching may well be exploitable to help keep KG access to a minimum.

Future work: Our research can readily extend in multiple directions. One direction is to pursue the goal of contributing to the development of a theory that helps formalise the foundations of neural-symbolic AI, as advocated by van Harmelen in [25]. One such contribution involves positioning our NN-KG-I strategies within the schemes for categorising approaches to (and compositional patterns for) neural-symbolic integration proposed by others (e.g. [13, 26, 27, 7]). Where we find they do not fit comfortably, we might propose scheme/pattern refinements. We expect this task to be both challenging and rewarding, given that several alternative categorisation/pattern schemes have been proposed and given that some of our multiple different strategies may well sit best at different positions within these different schemes. Another contribution involves taking our analyses of the interactions between deep statistical learning and symbolic knowledge representation and reasoning exposed by our NN-KG-I strategies to a deeper, more theoretical level.

Another direction in which our research leads involves enhancing the interpretability of hybrid, neural-symbolic system behaviour by, for example, investigating methods for generating explanations of predictions for system users. Yet another direction involves exploring NN-KG-I strategies for the express purpose of extracting new knowledge from data to add to KGs (aka KG completion).

Acknowledgments

Thank you to my supervisors Dr. Ernesto Jiménez-Ruiz and Dr. Tillman Weyde for their guidance and support.

References

[1] L. G. Valiant, Three Problems in Computer Science, J. ACM 50 (2003) 96–99.
[2] P. Hitzler, A Review of the Semantic Web Field, Commun. ACM 64 (2021) 76–83. URL: https://doi.org/10.1145/3397512.
[3] P. Hitzler, F. Bianchi, M. Ebrahimi, M. K. Sarker, Neural-Symbolic Integration and the Semantic Web, Semantic Web 11 (2020) 3–11.
[4] C. Lu, R. Krishna, M. Bernstein, L. Fei-Fei, Visual Relationship Detection with Language Priors, in: European Conference on Computer Vision, 2016, pp. 852–869. URL: https://cs.stanford.edu/people/ranjaykrishna/vrd/.
[5] G. Marcus, Deep Learning: A Critical Appraisal, CoRR (2018). URL: http://arxiv.org/abs/1801.00631.
[6] F. Chollet, Deep Learning: Current Limits and What Lies Beyond Them, Presentation at RAAIS, 2018. URL: https://raais.co/speakers-2018.
[7] H. Kautz, The Third AI Summer, AAAI Robert S. Engelmore Memorial Lecture, Thirty-fourth AAAI Conference on Artificial Intelligence, New York, NY, 2020. URL: https://www.cs.rochester.edu/u/kautz/talks/, presentation slides and video.
[8] G. Marcus, The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence, CoRR (2020). URL: https://arxiv.org/abs/2002.06177.
[9] T. R. Besold, A. S. d’Avila Garcez, et al., Neural-Symbolic Learning and Reasoning: A Survey and Interpretation, CoRR (2017). URL: https://arxiv.org/abs/1711.03902.
[10] A. d’Avila Garcez, M. Gori, et al., Neural-Symbolic Computing: An Effective Methodology for Principled Integration of Machine Learning and Reasoning, FLAP 6 (2019) 611–632. URL: https://collegepublications.co.uk/ifcolog/?00033.
[11] P. Hitzler, A. Eberhart, M. Ebrahimi, M. K. Sarker, L. Zhou, Neuro-Symbolic Approaches in Artificial Intelligence, National Science Review (2022).
[12] M. K. Sarker, L. Zhou, A. Eberhart, P. Hitzler, Neuro-Symbolic Artificial Intelligence: Current Trends, CoRR (2021). URL: https://arxiv.org/abs/2105.05330.
[13] S. Badreddine, A. d’Avila Garcez, L. Serafini, M. Spranger, Logic Tensor Networks, Artificial Intelligence 303 (2022) 103649.
[14] L. Serafini, A. S. d’Avila Garcez, Logic Tensor Networks: Deep Learning and Logical Reasoning from Data and Knowledge, in: Proceedings of NeSy’16, 2016.
[15] E. B. Myklebust, E. Jiménez-Ruiz, J. Chen, R. Wolf, K. E. Tollefsen, Prediction of Adverse Biological Effects of Chemicals using Knowledge Graph Embeddings, Semantic Web 13 (2022) 299–338. URL: https://doi.org/10.3233/SW-222804.
[16] F. Bianchi, P. Hitzler, On the Capabilities of Logic Tensor Networks for Deductive Reasoning, in: Proceedings of the AAAI-MAKE, volume 2350, 2019.
[17] M. Ebrahimi, A. Eberhart, F. Bianchi, P. Hitzler, Towards Bridging the Neuro-Symbolic Gap: Deep Deductive Reasoners, Appl. Intell. 51 (2021) 6326–6348.
[18] M. Ebrahimi, M. K. Sarker, F. Bianchi, et al., Neuro-Symbolic Deductive Reasoning for Cross-Knowledge Graph Entailment, in: Proceedings of the AAAI-MAKE, volume 2846, 2021. URL: http://ceur-ws.org/Vol-2846/paper8.pdf.
[19] Z. Chen, J. Chen, et al., Zero-Shot Visual Question Answering using Knowledge Graph, CoRR (2021). URL: https://arxiv.org/abs/2107.05348.
[20] Y. Geng, J. Chen, Z. Ye, Z. Yuan, W. Zhang, H. Chen, Explainable Zero-shot Learning via Attentive Graph Convolutional Network and Knowledge Graphs, Semantic Web 12 (2021) 741–765. URL: https://doi.org/10.3233/SW-210435.
[21] J. Chen, Y. Geng, et al., Low-Resource Learning with Knowledge Graphs: A Comprehensive Survey, CoRR (2021). URL: https://arxiv.org/abs/2112.10006.
[22] S. Wang, A. O. Pisco, A. McGeever, et al., Leveraging the Cell Ontology to Classify Unseen Cell Types, Nature Communications 12 (2021).
[23] I. Donadello, L. Serafini, Compensating Supervision Incompleteness with Prior Knowledge in Semantic Image Interpretation, in: IJCNN, IEEE, 2019, pp. 1–8.
[24] A. Daniele, L. Serafini, Knowledge Enhanced Neural Networks, in: PRICAI 2019: Trends in Artificial Intelligence, volume 11670, 2019, pp. 542–554.
[25] F. van Harmelen, Preface. The 3rd AI Wave is Coming, and it Needs a Theory, in: Neuro-Symbolic Artificial Intelligence: The State of the Art, IOS Press, 2021.
[26] M. van Bekkum, M. de Boer, F. van Harmelen, A. Meyer-Vitali, A. ten Teije, Modular Design Patterns for Hybrid Learning and Reasoning Systems, Appl. Intell. 51 (2021).
[27] F. van Harmelen, A. ten Teije, A Boxology of Design Patterns for Hybrid Learning and Reasoning Systems, J. Web Eng. 18 (2019) 97–124.