<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>How to think about benchmarking neurosymbolic AI?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Johanna Ott</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arthur Ledaguenel</string-name>
          <email>arthur.ledaguenel@irt-systemx.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Céline Hudelot</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mattis Hartwig</string-name>
          <email>mattis.hartwig@dfki.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>German Research Centre for Artificial Intelligence (DFKI)</institution>
          ,
          <addr-line>Lübeck</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IRT SystemX</institution>
          ,
          <addr-line>Paris-Saclay</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>MICS, CentraleSupélec, Université Paris-Saclay</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>singularIT GmbH</institution>
          ,
          <addr-line>Leipzig</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Neurosymbolic artificial intelligence is a growing field of research aiming to combine neural networks with symbolic systems, including their respective learning and reasoning capabilities. This hybridization can take many shapes, which adds to the fragmentation of the field and makes it difficult to compare the existing approaches. While some efforts have been made in the community to define archetypical means of hybridization, many elements are still missing to establish principled comparisons. Amongst those missing elements are formal and broadly accepted definitions of neurosymbolic tasks and their corresponding benchmarks. In this paper, we start from the specific task of multi-label classification with the integration of propositional background knowledge to illustrate what such a benchmarking framework could look like. Based on the benchmarking of one granular task, we zoom out and discuss important elements and characteristics of building a full benchmarking suite for more than just one task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Neurosymbolic artificial intelligence (AI) is a trending research topic [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In general,
neurosymbolic AI focuses on bringing together concepts from the logic-focused symbolic world and the
neural or connectionist world [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2, 3, 4, 5</xref>
        ].
      </p>
      <p>
        The potential of the field is based on the “best of both worlds” perspective, i.e., that by
combining neural and symbolic approaches, the respective strengths are maintained while the weaknesses
are minimized. Thus, the objectives are extensive and include, amongst others, improved
performance [
        <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6, 7, 8, 9</xref>
        ], explainability [
        <xref ref-type="bibr" rid="ref10 ref11 ref6">10, 6, 11, 12, 13, 14</xref>
        ] and generalization [15, 10, 11, 12].
      </p>
      <p>
        Contrasting its promise of generalization, the field of neurosymbolic AI exhibits a
progress-hampering level of fragmentation, e.g. in the evaluation and the architectural landscape. There
have been several attempts to structure the architectural approaches in the neurosymbolic AI
field [
        <xref ref-type="bibr" rid="ref2">17, 2, 18, 19, 20, 21</xref>
        ]. In this paper, we focus on the fragmented evaluation landscape, i.e.
the tasks, datasets and metrics used to evaluate neurosymbolic systems. Although not the focus
of this paper, we believe that further work on a clear, unified architectural taxonomy is needed
and that the current ambiguity about the separation of the neural, symbolic, and neurosymbolic
worlds adds to the fragmented evaluation landscape.
      </p>
      <p>
        Previous researchers have highlighted the fragmentation problem and emphasized the need
for a more systematic approach to evaluating neurosymbolic AI [
        <xref ref-type="bibr" rid="ref1">22, 19, 1</xref>
        ]. Although efforts
have been made to tackle this issue [23, 24, 25, 26, 27, 28], they have primarily remained at a
narrow and specific level, i.e. they propose specific tasks and benchmarks, including datasets
and evaluation metrics. Only a few exceptions, such as the panel discussion “The future of
(neuro-symbolic) AI” at the IBM Neuro-Symbolic AI Workshop 2022 [29] and the presentation
by Madhyastha and the subsequent open discussion at the NeSy 2022 conference [22], have addressed
the neurosymbolic benchmark fragmentation issue on a level beyond a specific benchmark.
      </p>
      <p>In this position paper, we seek to complement the prior work tackling the fragmented
neurosymbolic benchmark landscape by facing the challenge on a higher level, focusing on the
question of how to think about benchmarking neurosymbolic AI. We give an example of the
setup of a specific benchmark on the task of multi-label classification with symbolic background
knowledge. We include the thought process of coming up with a formal definition of the task,
a suitable dataset, and a selection of metrics. Additionally, we discuss the implications for
adding further benchmarks using our proposed thought process and thus contribute to a more
principled benchmarking landscape for neurosymbolic AI.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Benchmarking neurosymbolic systems on a specific task</title>
      <p>A neurosymbolic benchmark can be designed to answer two main questions: “What performance
level can neurosymbolic systems reach on a given task?” and “How does hybridization of neural
and symbolic components help on a given task?”. The first question takes an outside view
focusing on observable behavior while the second question takes an inside view focusing on
the design of agents. The inside and the outside view are two well-known and deeply grounded
perspectives in AI research [30]. We agree with Russell that, in general, artificial intelligence
should be measured taking an outside view. However, answering the second question with the
inside view can give further insights on how to design AI agents by understanding how and
when to use neurosymbolic architectures. Additionally, it might help direct the research
efforts of the neurosymbolic community because advancements in task performance can be
better linked to the architectural setup of the agent.</p>
      <p>Hence, in this section, we describe the challenges of benchmarking the task of multi-label
classification with symbolic background knowledge so that the two questions (inside and
outside view) can be answered. We cover the formal definition of the task, the underlying dataset
and the metrics. Although a task is not per se neurosymbolic, our chosen task covers elements
that are linked to a neural (image classification) and to a symbolic (background knowledge)
domain. This setup makes it relatively straightforward to use agents with a neurosymbolic
architecture, and is suitable for a neurosymbolic benchmark that answers both questions.</p>
      <sec id="sec-2-1">
        <title>2.1. Task formalism</title>
        <p>Setting a formal definition of the task is a necessary preliminary step to compare neurosymbolic
systems in a principled way. To be practical, the formalism also has to be comprehensive enough
to incorporate diverse datasets (in terms of modality and background knowledge structure) and
avoid a fragmentation of the field into multiple narrower task definitions.</p>
        <p>Multi-label classification with background knowledge is the task of mapping inputs x ∈ ℝ^d to
binary labels y ∈ {0, 1}^k such that these labels satisfy some background knowledge. This
background knowledge is expressed as a propositional formula φ using symbols from the
signature Σ := {s_i}_{1≤i≤k} and logical connectors {¬, ∧, ∨} with their standard semantics. For
lighter notations, we identify a label y ∈ {0, 1}^k with the propositional valuation mapping each
s_i to y_i. Therefore, we note y ⊧ φ if the corresponding valuation models φ. A dataset for that
task is D := (x_j, y_j)_{1≤j≤n} with x_j ∈ ℝ^d, y_j ∈ {0, 1}^k such that all labels in the dataset satisfy the
background knowledge, i.e. ∀ 1 ≤ j ≤ n, y_j ⊧ φ.</p>
        <p>This formalism encompasses standard classification tasks like independent binary
classification (where φ = ⊤ since every combination is valid) and multi-category classification (where
φ = (⋁_{1≤i≤k} s_i) ∧ (⋀_{1≤i&lt;j≤k} (¬s_i ∨ ¬s_j)) enforces that one and only one atom is true at a time).</p>
        <p>Having formally introduced our task, we now discuss the dataset and the metrics to
complete our benchmark.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Datasets</title>
        <p>Building an appropriate dataset for multi-label classification with background knowledge poses
a substantial challenge. It must contain large amounts of data amenable to neural processing and
whose labels present some significant structure expressible in the language of propositional logic.
Efforts to build such datasets were often led by researchers trying to measure the performance
of their neurosymbolic system, meaning that different systems are rarely evaluated on the same
datasets and that datasets are often custom-built to fit the capacity of a given system.</p>
        <p>We observed three patterns in how datasets were created: symbolic datasets where a
symbolic reasoning task is turned into a learning task (e.g. finding the shortest path in a weighted
graph [31]), compositional datasets where instances are tuples of a base sub-symbolic
classification dataset constrained to respect a given structure (e.g. the MNIST SUDOKU dataset [26])
and hierarchical datasets where classes of a sub-symbolic classification dataset are chosen in a
hierarchy of concepts (e.g. classes in ImageNet [32] are chosen amongst synsets of the WordNet
hierarchy [33]).</p>
        <p>To turn this collection of datasets into an efficient benchmark for multi-label classification
with background knowledge, further aspects need to be considered. On a fundamental level,
we observe an inverse relation between the complexity of the sub-symbolic features and the
complexity of the symbolic structure of the dataset, which means that the zone of complex
sub-symbolic features and complex symbolic structure is not well covered by existing datasets.
ROAD-R [25] is a dataset of traffic videos (complex sub-symbolic features) where labels satisfy a rich set
of constraints (complex symbolic structure). It constitutes a first step towards covering that void, and
more efforts should be invested in that direction. On the practical side, we need to set up a
standard on how to represent, store and operate neurosymbolic datasets and their corresponding
background knowledge, to allow rapid testing of any system on any dataset.</p>
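        <p>One hypothetical shape such a standard could take (our own illustration, not an existing format) is a self-describing record that ships the signature and the background knowledge together with the instances:</p>

```python
import json

# Hypothetical interchange format: the signature and the background
# knowledge travel with the instances, so any system can load the
# constraints alongside the data.
dataset = {
    "signature": ["car", "pedestrian", "moving", "stopped"],  # atoms s_1..s_k
    "background_knowledge": "~(moving & stopped)",            # propositional formula
    "instances": [
        {"input": "frame_0001.png", "label": [1, 0, 1, 0]},
        {"input": "frame_0002.png", "label": [0, 1, 0, 1]},
    ],
}

# Round-trip through JSON to check the record is fully serializable,
# which is what rapid testing of any system on any dataset requires.
restored = json.loads(json.dumps(dataset))
assert restored == dataset
```

        <p>Any concrete standard would also have to fix the formula syntax and the input encoding; the point of the sketch is only that constraints and data should be stored and loaded as one unit.</p>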
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Metrics</title>
        <p>To evaluate neurosymbolic systems inside our benchmark we use a combination of
performance metrics (the outside view) and control metrics (the inside view). Examples of standard
performance metrics are cross-entropy loss, individual accuracy, F1-score, collective accuracy,
and top-k accuracy. Likewise, standard control metrics include the number of trainable parameters,
the number of hyper-parameters, and the number of FLOPs.</p>
        <p>In addition, new control or performance metrics specific to neurosymbolic tasks might be
beneficial. One example of such a performance metric is semantic consistency, which tracks
how many predictions of a given system match the constraints expressed by the background
knowledge (see [34] or [25] for instance).</p>
        <p>To settle on a limited set of metrics for the multi-label classification with background
knowledge task (which can also be used for other tasks), we suggest using collective accuracy and
semantic consistency as performance metrics and network size (number of trainable parameters)
as a control metric. The semantic consistency metric helps us understand how much the system
integrates background knowledge. Collective accuracy is a very demanding metric that is robust
to imbalanced datasets: we generally observe a strong correlation between collective accuracy
and F1-score, for instance. Finally, network size is a good first-order approximation of model
capacity.</p>
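        <p>As a sketch (our own illustration; satisfies stands for any procedure deciding y ⊧ φ), the two suggested performance metrics reduce to a few lines:</p>

```python
from typing import Callable, List, Sequence

Label = List[int]

def collective_accuracy(preds: Sequence[Label], targets: Sequence[Label]) -> float:
    """Fraction of instances whose full label vector is predicted exactly."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)

def semantic_consistency(preds: Sequence[Label],
                         satisfies: Callable[[Label], bool]) -> float:
    """Fraction of predictions that satisfy the background knowledge."""
    return sum(satisfies(p) for p in preds) / len(preds)

# Example with the multi-category constraint "exactly one atom is true":
exactly_one = lambda y: sum(y) == 1
preds = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
targets = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]

acc = collective_accuracy(preds, targets)       # 1/3: only the first vector matches
con = semantic_consistency(preds, exactly_one)  # 2/3: [1, 1, 0] violates the constraint
```

        <p>Note how the two metrics disagree on the second prediction: it is semantically consistent yet wrong, which is exactly the distinction the benchmark should expose.</p>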
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Broaden the focus on a collection of tasks</title>
      <p>To extend the thoughts on the specific task from the previous section to cover more of the
neurosymbolic AI field, a natural next step is to transfer the approach to more
tasks. We draw confidence in the transferability of the proposed thought process from the
observation that benchmarks cited in the preceding sections have already incorporated some
of our suggestions (e.g. implementing control metrics). Furthermore, existing benchmarks
may benefit from our thought process to improve their comparability. For instance, in visual
reasoning, the CLEVR [35] and CLEVRER [36] benchmarks do not provide a formal definition of
the task, which makes comparisons between systems and with other datasets hard to establish.
Moreover, both underlying datasets lack sub-symbolic complexity compared to classic computer
vision datasets: the community could greatly benefit from filling that void.</p>
      <p>Expanding the focus from a specific task to a collection of tasks, i.e., creating a benchmarking
suite, raises another critical question: Which tasks should be included? The diversity of tasks has
been identified as a key consideration for a benchmarking suite by the discussion panel in [29].
Potential tasks should span a range of difficulties for both the neural and the symbolic architecture.
Also, similar to the GLUE [37] or GlueCons [38] benchmarking suites, several different capabilities
and skills should be needed to solve the tasks.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This position paper contributes an example thought process for designing neurosymbolic AI
benchmarks. Of course, a single position paper cannot fully solve all questions around building
a unified benchmarking system, but, in contrast to other papers in the field so far, we refrained
from marketing an individual dataset and focused more on the questions around the design
phase of a benchmark. We also discussed the implications for broadening the approach to
multiple tasks, which will be a valuable starting point for future benchmarking discussions
and designs. Next steps could include validating our approach on more tasks and adding further
thoughts to the discussion around important characteristics of a more holistic benchmarking
suite.
revision for natural language inference, Transactions of the Association for Computational
Linguistics 10 (2022) 240–256.
[12] K. Zheng, K.-Q. Zhou, J. Gu, Y. Fan, J. Wang, Z. xiao Li, X. He, X. E. Wang, Jarvis: A
neuro-symbolic commonsense reasoning framework for conversational embodied agents,
ArXiv abs/2208.13266 (2022).
[13] Y. Liang, J. Tenenbaum, T. A. Le, S. N, Drawing out of distribution with neuro-symbolic
generative models, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
(Eds.), Advances in Neural Information Processing Systems, volume 35, Curran Associates,
Inc., 2022, pp. 15244–15254. URL: https://proceedings.neurips.cc/paper_files/paper/2022/
file/6248a3b8279a39b3668a8a7c0e29164d-Paper-Conference.pdf.
[14] B. Finzel, A. Saranti, A. Angerschmid, D. Tafler, B. Pfeifer, A. Holzinger, Generating
explanations for conceptual validation of graph neural networks: An investigation of
symbolic predicates learned on relevance-ranked sub-graphs, KI - Künstliche Intelligenz
36 (2022) 271–285. doi:10.1007/s13218-022-00781-7.
[15] M. B. Ganapini, M. Campbell, F. Fabiano, L. Horesh, J. Lenchner, A. Loreggia, N. Mattei,
F. Rossi, B. Srivastava, K. B. Venable, Combining fast and slow thinking for human-like
and efficient decisions in constrained environments, in: International Workshop on
Neural-Symbolic Learning and Reasoning, 2022.
[16] X. Chen, C. Liang, A. W. Yu, D. Song, D. Zhou, Compositional generalization via
neuralsymbolic stack machines, in: Advances in Neural Information Processing Systems, volume
2020-December, 2020.
[17] S. Bader, P. Hitzler, Dimensions of neural-symbolic integration - a structured survey, 2005.</p>
      <p>URL: https://arxiv.org/abs/cs/0511042. doi:10.48550/ARXIV.CS/0511042.
[18] H. A. Kautz, The third ai summer: Aaai robert s. engelmore memorial lecture, AI Mag. 43
(2022) 93–104.
[19] A. d’Avila Garcez, L. C. Lamb, Neurosymbolic ai: the 3rd wave, Artificial Intelligence</p>
      <p>Review (2023). doi:10.1007/s10462-023-10448-w.
[20] F. V. Harmelen, A. ten Teije, A boxology of design patterns for hybrid learning and
reasoning systems, Journal of Web Engineering 18 (2019) 97–124.
[21] L. d. Raedt, S. Dumančić, R. Manhaeve, G. Marra, From statistical relational to
neurosymbolic artificial intelligence, in: C. Bessiere (Ed.), Proceedings of the Twenty-Ninth
International Joint Conference on Artificial Intelligence, IJCAI-20, International Joint
Conferences on Artificial Intelligence Organization, 2020, pp. 4943–4950. URL: https:
//doi.org/10.24963/ijcai.2020/688. doi:10.24963/ijcai.2020/688, survey track.
[22] P. Madhyastha, Towards a benchmark suite for neural-symbolic approaches for learning
and reasoning, 2022. URL: https://ijclr22.doc.ic.ac.uk/program/index.html, 16th
International Workshop on Neural-Symbolic Learning and Reasoning.
[23] Ö. Yılmaz, A. S. d’Avila Garcez, D. L. Silver, A proposal for common dataset in
neuralsymbolic reasoning studies, in: NeSy@HLAI, 2016.
[24] J. Johnson, B. Hariharan, L. van der Maaten, L. Fei-Fei, C. L. Zitnick, R. B. Girshick, Clevr:
A diagnostic dataset for compositional language and elementary visual reasoning, 2017
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016) 1988–1997.
[25] E. Giunchiglia, M. C. Stoian, S. Khan, F. Cuzzolin, T. Lukasiewicz, Road-r: the
autonomous driving dataset with logical requirements, Machine Learning (2023). doi:10.1007/s10994-023-06322-z.
[26] E. Augustine, C. Pryor, C. Dickens, J. Pujara, W. Y. Wang, L. Getoor, Visual sudoku puzzle
classification: A suite of collective neuro-symbolic tasks, in: International Workshop on
Neural-Symbolic Learning and Reasoning, 2022.
[27] A. D. Lindström, S. S. Abraham, Clevr-math: A dataset for compositional language, visual
and mathematical reasoning, volume 3212, 2022.
[28] C. Cornelio, V. Thost, Synthetic Datasets and Evaluation Tools for Inductive Neural
Reasoning, in: N. Katzouris, A. Artikis (Eds.), Inductive Logic Programming, Springer
International Publishing, Cham, 2022, pp. 57–77.
[29] F. Rossi, H. Kautz, G. Marcus, L. Lamb, L. Kaelbling, Closing, 2022. URL: https://video.ibm.</p>
      <p>com/recorded/131288165, iBM Neuro-Symbolic AI Workshops.
[30] S. J. Russell, Artificial intelligence a modern approach, Pearson Education, Inc., 2010.
[31] J. Xu, Z. Zhang, T. Friedman, Y. Liang, G. V. D. Broeck, A semantic loss function for deep
learning with symbolic knowledge, volume 12, International Machine Learning Society
(IMLS), 2018, pp. 8752–8760.
[32] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy,
A. Khosla, M. Bernstein, A. C. Berg, L. Fei-Fei, Imagenet large scale visual recognition
challenge, International Journal of Computer Vision 115 (2015) 211–252. doi:10.1007/s11263-015-0816-y.
[33] G. A. Miller, Wordnet, Communications of the ACM 38 (1995) 39–41. URL: https://dl.acm.</p>
      <p>org/doi/10.1145/219717.219748. doi:1 0 . 1 1 4 5 / 2 1 9 7 1 7 . 2 1 9 7 4 8 .
[34] K. Ahmed, S. Teso, K.-W. Chang, G. Van den Broeck, A. Vergari, Semantic probabilistic
layers for neuro-symbolic learning, in: S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave,
K. Cho, A. Oh (Eds.), Advances in Neural Information Processing Systems, volume 35,
Curran Associates, Inc., 2022, pp. 29944–29959. URL: https://proceedings.neurips.cc/paper_
files/paper/2022/file/c182ec594f38926b7fcb827635b9a8f4-Paper-Conference.pdf.
[35] J. Johnson, B. Hariharan, L. Van Der Maaten, L. Fei-Fei, C. Lawrence Zitnick, R. Girshick,
Clevr: A diagnostic dataset for compositional language and elementary visual reasoning,
in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017,
pp. 2901–2910.
[36] K. Yi*, C. Gan*, Y. Li, P. Kohli, J. Wu, A. Torralba, J. B. Tenenbaum, Clevrer: Collision
events for video representation and reasoning, in: International Conference on Learning
Representations, 2020. URL: https://openreview.net/forum?id=HkxYzANYDB.
[37] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, S. Bowman, GLUE: A multi-task benchmark
and analysis platform for natural language understanding, in: Proceedings of the 2018
EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP,
Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 353–355. URL:
https://aclanthology.org/W18-5446. doi:10.18653/v1/W18-5446.
[38] H. R. Faghihi, A. Nafar, C. Zheng, R. Mirzaee, Y. Zhang, A. Uszok, A. Wan, T. Premsri,
D. Roth, P. Kordjamshidi, Gluecons: A generic benchmark for learning under constraints,
ArXiv abs/2302.10914 (2023).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nayak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bozic</surname>
          </string-name>
          , L. Longo,
          <article-title>Is neuro-symbolic ai meeting its promise in natural language processing? a structured review</article-title>
          ,
          <source>ArXiv abs/2202.12205</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Sarker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Eberhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <source>Neuro-symbolic artificial intelligence: Current trends</source>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2105.05330.
          doi:10.48550/ARXIV.2105.05330.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Eberhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Sarker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Neuro-symbolic approaches in artificial intelligence</article-title>
          ,
          <source>National Science Review</source>
          <volume>9</volume>
          (
          <year>2022</year>
          ). URL: https://doi.org/10.1093/nsr/nwac035. doi:10.1093/nsr/nwac035.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Susskind</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Arden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. K.</given-names>
            <surname>John</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stockton</surname>
          </string-name>
          ,
          <string-name>
            <surname>E. B. John,</surname>
          </string-name>
          <article-title>Neuro-symbolic ai: An emerging class of ai workloads and their characterization</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2109.06133.
          doi:10.48550/ARXIV.2109.06133.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Sarker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Besold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Garcez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. U.</given-names>
            <surname>Kühnberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Lamb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M. H. V.</given-names>
            <surname>Lima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Penning</surname>
          </string-name>
          , G. Pinkas,
          <string-name>
            <given-names>H.</given-names>
            <surname>Poon</surname>
          </string-name>
          , G. Zaverucha,
          <article-title>Neural-Symbolic Learning and Reasoning: A Survey and Interpretation</article-title>
          , volume
          <volume>342</volume>
          ,
          <year>2022</year>
          .
          doi:10.3233/FAIA210348.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gustafson</surname>
          </string-name>
          ,
          <article-title>Sdrl: Interpretable and data-eficient deep reinforcement learning leveraging symbolic planning</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>33</volume>
          (
          <year>2019</year>
          )
          <fpage>2970</fpage>
          -
          <lpage>2977</lpage>
          . URL: https://ojs.aaai.org/index.php/AAAI/article/view/4153. doi:10.1609/aaai.v33i01.33012970.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Demeter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Downey</surname>
          </string-name>
          ,
          <article-title>Just add functions: A neural-symbolic language model</article-title>
          ,
          <source>in: AAAI 2020 - 34th AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2020</year>
          .
          doi:10.1609/aaai.v34i05.6264
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gustafson</surname>
          </string-name>
          ,
          <article-title>PEORL: Integrating symbolic planning and hierarchical reinforcement learning for robust decision-making</article-title>
          ,
          <source>in: IJCAI International Joint Conference on Artificial Intelligence</source>
          , volume
          <volume>2018-July</volume>
          ,
          <year>2018</year>
          .
          doi:10.24963/ijcai.2018/675.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gurajada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Neelam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Popa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <article-title>LNN-EL: A neurosymbolic approach to short-text entity linking</article-title>
          ,
          <source>in: Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kohli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Tenenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2019</year>
          . URL: https://openreview.net/forum?id=rJgMlhRctm.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Greenspan</surname>
          </string-name>
          ,
          <article-title>Neuro-symbolic natural logic with introspective</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>