<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>What makes Models Compositional? A Neuro-Symbolic Theoretical View (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Parikshit Ram</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Klinger</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander G. Gray</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centaur AI Institute</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IBM Research</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Purdue University</institution>
        </aff>
      </contrib-group>
      <abstract>
<p>Compositionality is thought to be a key component of language, and various compositional benchmarks have been developed to empirically probe the compositional generalization of existing sequence processing models. These benchmarks often highlight failures of existing models, but it is not clear why these models fail in this way. In this paper, we seek to theoretically understand the role the compositional structure of the models plays in these failures and how this structure relates to their expressivity and sample complexity. We propose a general neuro-symbolic definition of compositional functions and their compositional complexity. We then show how various existing general and special purpose sequence processing models (such as recurrent, convolutional and attention-based ones) fit this definition and use it to analyze their compositional complexity.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>[Figure 1: The cDAGs 𝒟(x) (left) and 𝒟(x̄) (center left) in Example 1, and 𝒟(x) (center right) and 𝒟(x̄) (right) in Example 2. Nodes are labeled v_{l:i} (level l, index i). Sources are Fuchsia, sinks Sepia, and internal nodes Blue.]</p>
    </sec>
    <sec id="sec-2">
      <title>2. Defining and Quantifying Compositionality</title>
<p>We define compositional functions f : 𝒳 → 𝒴 with the domain 𝒳 of input sequences x = {x₁, . . . , xₙ} of atoms or tokens xᵢ ∈ ℐ from an input dictionary ℐ. The range 𝒴 of f can be ℝ for regression, {0, 1} for binary classification, or ℐ for next token prediction.</p>
      <p>Definition 1. To define f, we need the following components:
• Token encoder e : ℐ × ℕ → ℋ (latent space), with hᵢ = e(xᵢ, i) ∈ ℋ encoding the i-th token in x ∈ 𝒳.
• A computation directed acyclic graph (DAG) or cDAG 𝒟 : 𝒳 → 𝔻, where 𝔻 is the space of DAGs, and 𝒟(x) defines the hierarchical processing of a sequence x. 𝒟(x) can also be viewed as the trace of the program used by the function f to process x. We will describe this in further detail soon.
• Span processor g : ℋ^a → ℋ maps a terms in the latent space into a new term in the latent space.
• Read-out function h : ℋ^s → 𝒴 maps the final set of s terms in the latent space to the output space 𝒴.</p>
<p>With g^{⊗𝒟(x)} denoting the recursive operation of g over 𝒟(x), we define a compositional function as:
f(x) = h( g^{⊗𝒟(x)}( e(x₁, 1), . . . , e(xₙ, n) ) ). (1)</p>
<p>A computation DAG or cDAG 𝒟(x) ≜ {𝒱(x), ℰ(x)} for a specific input sequence x ∈ 𝒳 can
depend on x or be pre-specified. This cDAG is a leveled DAG with set of nodes 𝒱(x) and edges ℰ(x).
Each node v ≜ (l : i) ∈ 𝒱(x) has a level l and index i. The recursive application of g over 𝒟(x)
induces a value h_{l:i} ∈ ℋ for each internal node v ∈ 𝒱(x). The sources in 𝒱(x) have level 0, and
there is one source for each x_i ∈ x, i ∈ ⟦n⟧ ≜ {1, . . . , n}, with index i and value h_{0:i} = e(x_i, i) ∈ ℋ.
There are s sinks in 𝒱(x), and at most a incoming edges and b outgoing edges at any node. For an
internal node v ∈ 𝒱(x) with p parents 𝒫(v), the value h_{l:i} = g(h_{l₁:i₁}, . . . , h_{l_p:i_p}) ∈ ℋ, where h_{l_j:i_j} is
the value of the j-th parent in 𝒫(v). One way to interpret this cDAG is as the trace of the “forward-pass”
for inference.</p>
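      <p>Equation (1) is mechanical enough to spell out as code. The following is a minimal Python sketch of this recursion, assuming the cDAG is encoded as a map from each internal node (level, index) to its ordered list of parents; the name evaluate_cdag and this encoding are our illustrative choices, not notation from the paper.</p>
      <preformat>
# Minimal sketch of Eq. (1): recursively apply the span processor g over
# a leveled cDAG, then apply the read-out function h at the sink nodes.
def evaluate_cdag(x, cdag, sinks, e, g, h):
    """x: token sequence; cdag: {(level, index): ordered parent list};
    sinks: ordered list of sink nodes; e, g, h as in Definition 1."""
    # Sources at level 0: one per token, with value h_{0:i} = e(x_i, i).
    values = {(0, i): e(tok, i) for i, tok in enumerate(x, start=1)}
    # Process internal nodes level by level, so parent values are ready
    # (edges may skip levels, but parents always sit at lower levels).
    for node in sorted(cdag, key=lambda v: v[0]):
        values[node] = g(*(values[p] for p in cdag[node]))
    # Read out the final s terms at the sinks.
    return h(*(values[v] for v in sinks))
      </preformat>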
      <p>
We consider the explicit cDAG because it allows us to see how the different elements x_i, i ∈ ⟦n⟧, of
the input sequence x are hierarchically composed to obtain the output. This will allow us to study
the complexity of any compositional function. A “simple” cDAG, where all source nodes just connect
to a single sink node, would be “applicable” to all functions, but it does not allow us to study functions
in an interesting manner. When we study the compositional functions induced by general purpose models
(such as recurrent, convolutional or transformer models), we will see that some models have explicit
cDAGs with more structure, while others have less structured explicit cDAGs but implicit
structures induced in the cDAG; whenever possible, we will explicitly state this implicit structure and
study its properties. From a neuro-symbolic perspective [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
], this explicit cDAG can be seen as the
symbolic part, while e, g, h are the neural parts; note that, in some models, this symbolic cDAG might be
created with neural elements, while in others, the cDAG might be obtained with a symbolic grammar.
This neuro-symbolic view offers a novel theoretical understanding of compositionality.
      </p>
<p>The span processor g : ℋ^a → ℋ takes as input a elements from the latent space ℋ and outputs an
element in ℋ. While the definition implies that the same g needs to be operated recursively over the
cDAG 𝒟(x), there is no restriction on the inputs and output of g regarding the information encoded
in the latent space. For example, if the level l of any node v_{l:i} is encoded into its value h_{l:i}, then g
will behave differently across levels (level-dependent); if the index i of the node v_{l:i} is encoded into its
value, then g will be sensitive to the positional information (order-dependent); if the value of a node
includes the type of the node (for example, a non-terminal in a grammar), then g can be type-dependent.
Our definition states that the arity of the span processor g : ℋ^a → ℋ is a. We do so for the ease of
exposition, though our definition can incorporate more flexible span processors (see Ram et al. [10,
Appendix A.2]).</p>
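      <p>As a small illustration of this flexibility, a level-dependent g can be obtained by simply carrying the level along in each latent value. The (content, level) pair representation below is a hypothetical choice made only for exposition.</p>
      <preformat>
# Illustrative level-dependent span processor: each latent value is a
# (content, level) pair, so g can branch on the level of its inputs.
def e(tok, i):
    return (float(tok), 0)  # source nodes sit at level 0

def g(*args):
    contents = [c for c, lvl in args]
    level = 1 + max(lvl for c, lvl in args)  # simple level bookkeeping
    # A different operation at level 1 than at higher levels:
    out = sum(contents) if level == 1 else max(contents)
    return (out, level)
      </preformat>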
<p>The read-out function h : ℋ^s → 𝒴 finally maps s elements in the latent space to the output space
𝒴. This separation between g and h was necessary in our proposed definition because we require g to
be operable recursively, and thus g can operate in a latent space ℋ distinct from 𝒴. In some applications,
ℋ ⊇ 𝒴, in which case h can be an identity function. There are a couple of aspects of this read-out
function we wish to discuss explicitly – (i) We assume that h is specifically non-compositional and
processes its input without breaking it up into any sub-problems; we explicitly define the compositional
function f separating out g, 𝒟, h, where g (neural) and 𝒟 (symbolic) represent the compositional part.
(ii) We require h to have a fixed arity of s since g and 𝒟 are aggregating the information over the
input.</p>
<p>In the following, we will illustrate Definition 1 with a couple of examples:</p>
      <p>Example 1. Figure 1 (left) shows the cDAG 𝒟(x) for a compositional f on x = [x₁, . . . , x₅],
with f(x) = h(g(g(x₁, x₂), g(g(x₃, x₄), x₅))), a = 2 in-degree, b = 1 out-degree, s = 1
sink, hᵢ = e(xᵢ, i) ∈ ℋ, span-processor g : ℋ² → ℋ, and read-out function h : ℋ → 𝒴. The
values h_{0:i} = hᵢ for sources v_{0:i}, i ∈ {1, . . . , 5}, and the internal node values are: h_{1:1} ← g(h₁, h₂),
h_{1:2} ← g(h₃, h₄), h_{2:1} ← g(h_{1:2}, h₅), h_{3:1} ← g(h_{1:1}, h_{2:1}). h operates on h_{3:1} at sink v_{3:1}. Figure 1
(center left) shows the cDAG 𝒟(x̄) of the same f on x̄ ≠ x with the same a = 2, b = 1, s = 1
and f(x̄) = h(g(g(x̄₁, g(x̄₂, x̄₃)), g(x̄₄, x̄₅))). Note that 𝒟(x) is not the same as 𝒟(x̄).</p>
      <p>Example 2. Figure 1 (center right) shows the cDAG 𝒟(x) for a compositional f on x = [x₁, . . . , x₇],
with f(x) = h(h_{4:1}, h_{3:1}), a = 3 maximum in-degree, b = 3 maximum out-degree, s = 2
sinks, hᵢ = e(xᵢ, i) ∈ ℋ, span processor g : ℋ³ → ℋ, and read-out function h : ℋ² →
𝒴. The source values h_{0:i} = hᵢ for each i ∈ {1, . . . , 7}, and the internal node values are:
h_{1:1} ← g(h₁, h₂, h₃), h_{1:2} ← g(h₂, h₃, h₄), h_{1:3} ← g(h₃, h₅, h₇), h_{1:4} ← g(h₄, h₅, h₆), h_{1:5} ←
g(h₅, h₆, h₇), h_{2:1} ← g(h_{1:1}, h_{1:2}, h_{1:3}), h_{2:2} ← g(h_{1:1}, h_{1:3}, h_{1:4}), h_{2:3} ← g(h_{1:2}, h_{1:4}, h_{1:5}),
h_{2:4} ← g(h_{1:3}, h_{1:4}, h_{1:5}), h_{3:1} ← g(h_{2:1}, h_{2:2}, h_{2:3}), h_{3:2} ← g(h_{2:2}, h_{2:3}, h_{2:4}), h_{4:1} ←
g(h_{3:2}, h_{2:3}, h_{2:4}). h operates on h_{3:1} and h_{4:1} at sinks v_{3:1} and v_{4:1}. Figure 1 (right) shows the
cDAG 𝒟(x̄) of the same f on x̄ ≠ x with the same a = 3, b = 3, s = 2.</p>
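      <p>As a concrete check of Example 1, its cDAG can be written down directly and fed to the evaluation sketch above. The toy e, g, h below are arbitrary stand-ins of the right arities, chosen only for illustration.</p>
      <preformat>
# Example 1's cDAG, with a = 2, b = 1, s = 1 (toy e, g, h for illustration).
enc = lambda tok, i: float(tok)   # token encoder, ignoring position here
g2 = lambda u, v: u + v           # a toy binary span processor
h1 = lambda u: u                  # identity read-out (here H and Y coincide)

example1_cdag = {
    (1, 1): [(0, 1), (0, 2)],     # h_{1:1} = g(h_1, h_2)
    (1, 2): [(0, 3), (0, 4)],     # h_{1:2} = g(h_3, h_4)
    (2, 1): [(1, 2), (0, 5)],     # h_{2:1} = g(h_{1:2}, h_5); skips level 1
    (3, 1): [(1, 1), (2, 1)],     # h_{3:1} = g(h_{1:1}, h_{2:1})
}
out = evaluate_cdag([1, 2, 3, 4, 5], example1_cdag, [(3, 1)], enc, g2, h1)
print(out)  # 15.0 with these toy components
      </preformat>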
      <p>While Example 1 is a simple compositional function on a sequence, Example 2 is a more sophisticated
one. This is to highlight that our proposed Definition 1 can handle functions which require more
complex interactions between the tokens in a sequence. Example 1 has a cDAG with a maximum
out-degree  = 1, implying a single path from any source to a sink. Example 2 has a cDAG with a
maximum out-degree  = 3 across all levels in the DAG, implying that there can be a large number of
paths to any sink from a source. This allows the definition to include functions where certain tokens in
the sequence are of much higher importance to the output than others. These examples also highlight
that edges in the cDAG are allowed to skip levels, and the sinks can be from different levels, further
highlighting the compositional flexibility.</p>
<p>We would like to remark on a couple of points here: (i) Through these examples, we show that our definition
explicitly considers how the problem of sequence processing is broken up into sub-problems – the
cDAG embodies how disjoint or intertwined these “sub-problems” are by explicitly considering the
computation hierarchy. (ii) For input sequences x, x̄ from the same problem domain, and the same
compositional function f, we allow the cDAG to be different – the cDAG 𝒟(x) can be input-dependent –
thereby allowing different input sequences to have different sub-problem hierarchies. At a non-technical
level, we also believe that our proposed Definition 1 connects intuitively to existing definitions:</p>
      <p>The meaning of the whole [f : 𝒳 → 𝒴] is a function of [h : ℋ^s → 𝒴] the meanings of the parts [g : ℋ^a → ℋ], and of the way they are syntactically combined [𝒟 : 𝒳 → 𝔻].</p>
      <p>
Both Examples 1 and 2 can be seen as compositional functions, but Example 2 is clearly a more complex
composition. In addition to its intuitive nature, our proposed definition allows us to understand how
complex the compositionality is, beyond just stating whether a function is compositional. The compositional
complexity of f depends on the functions g, h, e as well as the cDAG function 𝒟 that drives the
computation. For a sequence x of length n, 𝒟(x) has n source nodes, a maximum in-degree of a
(controlling the span size for g), s sink nodes (controlling the capacity of h), and a maximum out-degree of b
(quantifying the “localism” of the effect of a node). However, these do not explicitly incorporate the fact
that changes to nodes at lower levels of the cDAG can have a larger effect on the output than changes
to nodes at higher levels of the cDAG. We propose a new quantification – the locus of influence (LoI):
Definition 2 (LoI of a source node). Consider a function f with components e, g, 𝒟, h (Definition 1). Let
(u₁, . . . , u_j, . . . , u_a) ∈ ℋ^a be any input to the span processor g, with u = g(u₁, . . . , u_j, . . . , u_a)
its output. Let δ ∈ ℋ be a “perturbation” to the j-th argument to g, j ∈ ⟦a⟧, resulting in the perturbed
output u(δ) = g(u₁, . . . , u_j + δ, . . . , u_a). Let L &gt; 0 be a constant such that ∀j ∈ ⟦a⟧, ∀δ ∈ ℋ,
‖u − u(δ)‖ ≤ L‖δ‖. For a sequence x ∈ 𝒳 of length n, and a source node v_{0:i} in 𝒟(x), let Π_i(x) be
the set of all unique paths from v_{0:i} to any of the sink nodes in 𝒟(x). The absolute LoI of index i is Λ_i =
Σ_{P ∈ Π_i(x)} L^{|P|}, with |P| as the length of a path P ∈ Π_i(x), and the relative LoI is λ_i = Λ_i / Σ_{j ∈ ⟦n⟧} Λ_j.</p>
<p>This definition of the complexity of composition incorporates both the complexity of the cDAG 𝒟(x)
and the complexity of the span processor g : ℋ^a → ℋ in terms of its smoothness, with higher values
of L indicating a more complex (less smooth) g. The absolute LoI Λ_i incorporates the effect of longer
paths, with the effect growing with path length, and corresponds to the sensitivity of the compositional
function output to any one input token in the sequence.</p>
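      <p>Since Π_i(x) is just the set of source-to-sink paths in 𝒟(x), the LoI quantities of Definition 2 can be computed by direct path enumeration. The sketch below uses the same cDAG encoding as earlier and takes |P| to be the number of edges on a path; the function name and signature are, again, illustrative assumptions.</p>
      <preformat>
# Sketch of Definition 2: Lambda_i sums L**|P| over all paths P from the
# source v_{0:i} to any sink; lambda_i normalizes over all n sources.
def loci_of_influence(cdag, sinks, n, L):
    children = {}
    for node, parents in cdag.items():
        for p in parents:
            children.setdefault(p, []).append(node)
    sink_set = set(sinks)

    def weight(v):
        # Sum of L**(edge count) over all paths from node v to a sink.
        if v in sink_set:
            return 1.0
        return sum(L * weight(c) for c in children.get(v, []))

    abs_loi = [weight((0, i)) for i in range(1, n + 1)]
    total = sum(abs_loi)
    return abs_loi, [a / total for a in abs_loi]

# For Example 1's cDAG with L = 2: Lambda = [4, 4, 8, 8, 4], so the
# relative LoI of x_1 is 1/7 (below 1/5) while that of x_3 is 2/7
# (above 1/5), matching the discussion of Example 1 below.
      </preformat>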
<p>The smaller the absolute LoI Λ_i of an input index i, the more local its effect, and thus the more structure that
can be transferred between examples if x_i is replaced with something else. A relative LoI λ_i greater than
1/n denotes that the input index i (and thus input token x_i) has an out-sized effect on 𝒟(x) (and thus
the computation) compared to the other indices (tokens). In Example 1 (left), Λ₁ = L², λ₁ = L²/(2L³ + 3L²) &lt; 1/5
while Λ₃ = L³, λ₃ = L³/(2L³ + 3L²) &gt; 1/5, implying that x₃ has more influence (absolute and relative) on the function
than x₁ (assuming L &gt; 1). In Example 2 (left), Λ₁ = L⁴ + 2L³, λ₁ = (L + 2)/(27L + 39) ≈ 1/22 &lt; 1/7, while
Λ₅ = 7L⁴ + 9L³, λ₅ = (7L + 9)/(27L + 39) ≈ 1/4 &gt; 1/7, hence x₅ has a significantly larger influence than x₁. We
utilize the LoI to define the complexity of a compositional function, and a class of such compositional
functions:
Definition 3. A function f : 𝒳 → 𝒴 with components e, h, g, 𝒟 is (a, b, s, Λ, λ)-compositional if, for
any x ∈ 𝒳 of length n, the cDAG 𝒟(x) has an in-degree of a, maximum outgoing degree of b, and
s sink nodes, and ∀i ∈ ⟦n⟧, Λ_i ≤ Λ and λ_i ≤ λ ∈ [1/n, 1). We denote with ℱ a class of such
(a, b, s, Λ, λ)-compositional functions.</p>
<p>A small Λ and a λ close to 1/n signifies a function that possesses a high level of localism across all
input sequences and tokens in its domain. While this function has the most structure, it might not
be suitable for practical purposes. A high Λ and a λ close to 1/n signifies a very complex function
where there is a lot of interaction between all the input tokens in all input sequences, making it hard to
exploit any compositional structure in the function. A high Λ and a λ significantly higher than 1/n
indicates an interesting class of functions where some input tokens can have a high influence over
the function computation, but, for most tokens, there is a compositional structure in the function that
can be exploited. This intuitively seems to be an interesting and more practical class of compositional
functions, since assuming all tokens have an equal level of relative influence seems quite restrictive.</p>
      <p>[Figure 2 notes: complexities are simplified for the ease of exposition. †: Convolve+Pool induces input-dependent cDAGs for max/min-pool, not for avg/sum-pool. ‡: The number of sinks s needs to be specified for Convolve+Pool, and the model can handle arbitrary length input sequences.]</p>
<p>In Fig. 2, we re-express existing sequence processing models as per our definition, teasing out the
symbolic cDAG (and the neural g, h), and we present their (simplified) compositional complexity
for the ease of exposition. This highlights the flexibility and utility of our proposed quantification of
compositionality (see Ram et al. [10, Section 4] for more details and models). Beyond the properties
of the cDAG (the in-degree a, out-degree b and number of sink nodes s) and the upper bounds on
the absolute LoI Λ and relative LoI λ, we also highlight two properties: (i) Whether the model utilizes
(implicitly or explicitly) an input-dependent cDAG (that is, 𝒟(x) is not the same DAG for all x of length
n), and (ii) Whether the same model is able to operate on arbitrary length input sequences. The use of an
input-dependent cDAG has implications in terms of the expressivity of the model – it can be shown
that functions from a model class (with compositional complexities Λ, λ) with a fixed input-agnostic
cDAG cannot approximate functions from a model class of matching compositional complexity (that
is, same compositional complexity Λ, λ) that utilize input-dependent cDAGs. Ram et al. [10, Theorem
1] show that the approximation is lower and upper bounded by Ω(Λ) and O(Λ/λ) respectively. This
implies that a higher value of the absolute compositional complexity Λ and a smaller relative compositional
complexity λ adversely affect the approximation guarantees. The absolute compositional complexity Λ
has been shown to be directly tied to the generalization gap for a learned compositional function, with
higher Λ implying a worse systematic generalization guarantee [10, Theorem 2]. The ability to operate
on arbitrary length sequences is a prerequisite for a model to possess length generalization
or productivity – the ability to generalize to sequences of larger lengths than those seen during training.
We will pursue length generalization in our future work.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusion</title>
<p>In this paper, we briefly present our novel definition of compositional functions, which explicitly separates
out the neural and symbolic aspects of a model for ease of analysis. We also present a notion of
compositional complexity that quantifies how intricately the tokens in an input sequence
are composed to produce the output. We briefly highlight the generality and utility of this definition by
demonstrating how existing sequence processing models fit into it.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pagin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Westerståhl</surname>
          </string-name>
          ,
<article-title>Compositionality I: Definitions and variants</article-title>
          ,
          <source>Philosophy Compass</source>
          <volume>5</volume>
          (
          <year>2010</year>
          )
          <fpage>250</fpage>
          -
          <lpage>264</lpage>
. URL: https://compass.onlinelibrary.wiley.com/doi/abs/10.1111/j.1747-9991.2009.00228.x.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Lake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Baroni</surname>
          </string-name>
          ,
<article-title>Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>2873</fpage>
          -
          <lpage>2882</lpage>
          . URL: https://proceedings.mlr.press/v80/lake18a.html.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kim</surname>
          </string-name>
,
          <string-name>
            <given-names>T.</given-names>
            <surname>Linzen</surname>
          </string-name>
          ,
          <article-title>COGS: A compositional generalization challenge based on semantic interpretation</article-title>
          ,
          <source>in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>9087</fpage>
          -
          <lpage>9105</lpage>
. URL: https://aclanthology.org/2020.emnlp-main.731.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hupkes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dankers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mul</surname>
          </string-name>
,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bruni</surname>
          </string-name>
          ,
          <article-title>Compositionality decomposed: how do neural networks generalise?</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>67</volume>
          (
          <year>2020</year>
          )
          <fpage>757</fpage>
          -
          <lpage>795</lpage>
. URL: https://jair.org/index.php/jair/article/view/11674.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Klinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Adjodah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Marois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Joseph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riemer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Pentland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Campbell</surname>
          </string-name>
          ,
          <article-title>A study of compositional generalization in neural models</article-title>
,
          <source>arXiv preprint arXiv:2006.09437</source>
          (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2006.09437.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hestness</surname>
          </string-name>
          ,
          <article-title>Compositional generalization for primitive substitutions</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4284</fpage>
          -
          <lpage>4293</lpage>
          . URL: https://aclanthology.org/D19-1438/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>An</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-G.</given-names>
            <surname>Lou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Compositional generalization by learning analytical expressions</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
). URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/83adc9225e4deb67d7ce42d58fe5157c-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Solar-Lezama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tenenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Lake</surname>
          </string-name>
          ,
          <article-title>Learning compositional rules via neural program synthesis</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>10832</fpage>
          -
          <lpage>10842</lpage>
. URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/7a685d9edd95508471a9d3d6fcace432-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Klinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <article-title>How compositional is a model?</article-title>
          ,
          <source>in: International Joint Conference on Artificial Intelligence 2023 Workshop on Knowledge-Based Compositional Generalization</source>
          ,
          <year>2023</year>
. URL: https://openreview.net/forum?id=OImyRhNLv3.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Klinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <article-title>What makes Models Compositional? A Theoretical View: With Supplement</article-title>
          ,
          <source>arXiv preprint arXiv:2405.02350</source>
          (
          <year>2024</year>
          ). URL: https://arxiv.org/abs/2405.02350.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
<article-title>LSTM can solve hard long time lag problems</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>9</volume>
          (
          <year>1996</year>
). URL: https://proceedings.neurips.cc/paper/1996/file/a4d2f0d23dcc84ce983f9157f8b7f88-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
). URL: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
<string-name>
            <given-names>M. K.</given-names>
            <surname>Sarker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Eberhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>Neuro-symbolic artificial intelligence</article-title>
          ,
<source>AI Communications</source>
          <volume>34</volume>
          (
          <year>2021</year>
          )
          <fpage>197</fpage>
          -
          <lpage>209</lpage>
          . URL: https://arxiv.org/pdf/2105.05330.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A. d.</given-names>
            <surname>Garcez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Lamb</surname>
          </string-name>
,
          <string-name>
            <given-names>L.</given-names>
            <surname>de Penning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Poon</surname>
          </string-name>
          ,
<string-name>
            <given-names>G.</given-names>
            <surname>Zaverucha</surname>
          </string-name>
          ,
          <article-title>Neural-symbolic learning and reasoning: A survey and interpretation</article-title>
          ,
          <source>Neuro-Symbolic Artificial Intelligence: The State of the Art</source>
          <volume>342</volume>
          (
          <year>2022</year>
          )
<fpage>327</fpage>
          . URL: https://arxiv.org/pdf/1711.03902.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>