<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">What makes Models Compositional? A Neuro-Symbolic Theoretical View (Extended Abstract)</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Parikshit</forename><surname>Ram</surname></persName>
							<email>parikshit.ram@ibm.com</email>
							<affiliation key="aff0">
								<orgName type="institution">IBM Research</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tim</forename><surname>Klinger</surname></persName>
							<email>tklinger@us.ibm.com</email>
							<affiliation key="aff0">
								<orgName type="institution">IBM Research</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alexander</forename><forename type="middle">G</forename><surname>Gray</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Centaur AI Institute</orgName>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">Purdue University</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">What makes Models Compositional? A Neuro-Symbolic Theoretical View (Extended Abstract)</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">B75E5E825C71F5E53FE252784E278B7A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:47+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Compositionality is thought to be a key component of language, and various compositional benchmarks have been developed to empirically probe the compositional generalization of existing sequence processing models. These benchmarks often highlight failures of existing models, but it is not clear why these models fail in this way. In this paper, we seek to theoretically understand the role the compositional structure of the models plays in these failures and how this structure relates to their expressivity and sample complexity. We propose a general neuro-symbolic definition of compositional functions and their compositional complexity. We then show how various existing general and special purpose sequence processing models (such as recurrent, convolution and attention-based ones) fit this definition and use it to analyze their compositional complexity.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Compositionality is assumed to be integral to language processing <ref type="bibr" target="#b0">[1]</ref>. Generalizing in a compositional manner or compositional generalization is of high interest when learning with sequences since it can enable a (learned) model to generalize well to a possibly infinite domain of sequences while learning from only a small number of examples. With this motivation, there has been interest in quantifying the compositional generalization (CG) of sequence or language models. This has led to various language modeling benchmarks <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref>. These benchmarks empirically probe the compositionality of language models, often demonstrating the lack of CG. These highlight examples on which the models fail, but there is no precise understanding of why the failures occur. Nonetheless, various novel methods with improved compositional generalization (as measured by these benchmarks) have been developed <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8]</ref>, utilizing specialized models with compositional inductive biases. However, the area of CG is still lacks a mathematical definition and measure of compositionality.</p><p>Our contributions. Inspired by existing discussions, and recent solutions for CG benchmarks, We make the following contributions: 1  • We propose a general modular definition of "compositional functions" to facilitate concrete understanding of the expressiveness and generalization of such functions, and propose the notion of "compositional complexity" to quantify the complexity of such functions. Our definition teases out what we believe to be the "neural" (continuous functions) and the "symbolic" (discrete functions) parts of a model. • We demonstrate the flexibility of this definition by highlighting how various existing sequence processing models such as recurrent models <ref type="bibr" target="#b10">[11]</ref>, convolutional models and transformer based models <ref type="bibr" target="#b11">[12]</ref> fit this definition, and how complex their compositions are. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Defining and Quantifying Compositionality</head><p>We define compositional functions 𝑓 : 𝒳 → 𝒴 with the domain 𝒳 of input sequences 𝑋 = {𝑥 1 , . . . , 𝑥 𝐿 } of atoms or tokens 𝑥 𝑖 ∈ ℐ from an input dictionary ℐ. The range 𝒴 of 𝑓 can be R for regression, {0, 1} for binary classification, or ℐ for next token prediction.</p><p>Definition 1. To define 𝑓 , we need the following components:</p><p>• Token encoder 𝑒 : ℐ × N → ℋ (latent space), with 𝑒 𝑖 = 𝑒(𝑥 𝑖 , 𝑖) ∈ ℋ encoding the 𝑖 th token in 𝑋 ∈ 𝒳 .</p><p>• A computation directed acyclic graph (DAG) or cDAG 𝐷 : 𝒳 → 𝒟, where 𝒟 is the space of DAGs, and 𝐷(𝑋) defines the hierarchical processing of a sequence 𝑋. 𝐷(𝑋) can also be viewed as the trace of program used by function 𝑓 to process 𝑋. We will describe this in further detail soon. • Span processor 𝑔 : ℋ 𝑘 → ℋ maps 𝑘 terms in the latent space into a new term in the latent space.</p><p>• Read-out function ℎ : ℋ 𝑚 → 𝒴 maps the final set of terms in the latent space to the output space 𝒴. With 𝑔 ⊗𝐷(𝑋) denoting the recursive operation of 𝑔 over 𝐷(𝑋), we define a compositional function as:</p><formula xml:id="formula_0">𝑓 (𝑋) = ℎ (︁ 𝑔 ⊗𝐷(𝑋) (𝑒(𝑥 1 , 1), . . . , 𝑒(𝑥 𝐿 , 𝐿)) )︁ .<label>(1)</label></formula><p>A computation DAG or cDAG 𝐷(𝑋) ≜ {𝑁 (𝑋), 𝐸(𝑋)} for a specific input sequence 𝑋 ∈ 𝒳 can depend on 𝑋 or be pre-specified. This cDAG is a leveled DAG with set of nodes 𝑁 (𝑋) and edges 𝐸(𝑋). Each node 𝑛 ≜ (𝑙 : 𝑖) ∈ 𝑁 (𝑋) has a level 𝑙 and index 𝑖. The recursive application of 𝑔 over 𝐷(𝑋) induces a value 𝑣 𝑙:𝑖 ∈ ℋ for each internal node 𝑛 ∈ 𝑁 (𝑋). The sources is 𝑁 (𝑋) have level 0, and there is one source for each 𝑥 𝑖 ∈ 𝑋, 𝑖 ∈ 𝐿 ≜ {1, . . . , 𝐿} with index 𝑖 and value 𝑣 0:𝑖 = 𝑒(𝑥 𝑖 , 𝑖) ∈ ℋ. There are 𝑚 sinks in 𝑁 (𝑋), and at most 𝑘 incoming edges and 𝑞 outgoing edges at any node. For an internal node 𝑛 ∈ 𝑁 (𝑋) with 𝑘 parents 𝑃 (𝑛), the value 𝑣 𝑙:𝑖 = 𝑔(𝑣 𝑙 1 :𝑖 1 , . . . , 𝑣 𝑙 𝑘 :𝑖 𝑘 ) ∈ ℋ where 𝑣 𝑙 𝑗 :𝑖 𝑗 is the value of the 𝑗 th parent in 𝑃 (𝑛). One way to interpret this cDAG is as the trace of "forward-pass" for inference.</p><p>We consider the explicit cDAG because it allows us to see how the different elements 𝑥 𝑖 , 𝑖 ∈ 𝐿 of the input sequence 𝑋 are hierarchically composed to obtain the output. This will allow us to study the complexity of any compositional function. A "simple" cDAG, where all source nodes just connect to a single sink node, would be "applicable" to all functions, but it does not allow us to study it in an interesting manner. When we study the compositional functions induced by general purpose models (such as recurrent, convolutional or transformer models), we will see that some models have explicit cDAGs with more structure, while others have less structured explicit cDAGs, but there are implicit structures induced in the cDAG; whenever possible, we will explicitly state this implicit structure and study its properties. From a neuro-symbolic perspective <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b13">14]</ref>, this explicit cDAG can be seen as the symbolic part, while the 𝑒, 𝑔, ℎ are the neural; note that, in some models, this symbolic cDAG might be created with neural elements, while in others, the cDAG might be obtained with a symbolic grammar. This neuro-symbolic view offers a novel theoretical understanding of compositionality.</p><p>The span processor 𝑔 : ℋ 𝑘 → ℋ takes as input 𝑘 elements from the latent space ℋ and outputs an element in ℋ. While the definition implies that the same 𝑔 needs to be operated recursively over the cDAG 𝐷(𝑋), there is no restriction on the inputs and output of 𝑔 regarding the information encoded in the latent space. For example, if the level 𝑙 of any node 𝑙:𝑖 is encoded into its value 𝑣 𝑙:𝑖 , then the 𝑔 will behave differently across levels (level-dependent); if the index 𝑖 of the node 𝑙:𝑖 is encoded into its value, then 𝑔 will be sensitive to the positional information (order-dependent); if the value of a node includes the type of the node (for example, a non-terminal in a grammar), then 𝑔 can be type-dependent. Our definition states that the arity of the span processor 𝑔 : ℋ 𝑘 → ℋ is 𝑘. We do so for the ease of exposition, though our definition can incorporate more flexible span processors (see Ram et al. <ref type="bibr" target="#b9">[10,</ref><ref type="bibr">Appendix A.2]</ref>).</p><p>The read-out function ℎ : ℋ 𝑚 → 𝒴 finally maps 𝑚 elements in the latent space to the output space 𝒴. This separation between 𝑔 and ℎ was necessary in our proposed definition because we require 𝑔 to be operable recursively, and thus 𝑔 can operate in a latent space ℋ distinct from 𝒴. In some applications, ℋ ⊇ 𝒴, in which case, ℎ can be an identity function. There are couple of aspects of this read-out function we wish to discuss explicitly -(i) We assume that ℎ is specifically non-compositional and processes its input without breaking it up into any sub-problems; we explicitly define the compositional function 𝑓 separating out 𝑔, 𝐷, ℎ, where 𝑔 (neural) and 𝐷 (symbolic) represent the compositional part. (ii) We require ℎ to have a fixed-arity of 𝑚 since 𝑔 and 𝐷 are aggregating the information over the input.</p><p>In the following, we will illustrate Definition    While Example 1 is a simple compositional function on a sequence, Example 2 is a more sophisticated one. This is to highlight that our proposed Definition 1 can handle functions which require more complex interactions between the tokens in a sequence. Example 1 has a cDAG with a maximum out-degree 𝑞 = 1, implying a single path from any source to a sink. Example 2 has a cDAG with a maximum out-degree 𝑞 = 3 across all levels in the DAG, implying that there can be a large number of paths to any sink from a source. This allows the definition to include functions where certain tokens in the sequence are of much higher importance to the output than others. These examples also highlight that edges in the cDAG are allowed to skip levels, and the sinks can be from different levels, further highlighting the compositional flexibility. We like to remark on a couple of points here: (i) Through these examples, we show that our definition explicitly considers how the problem of sequence processing is broken up into sub-problems -the cDAG embodies how disjoint or intertwined these "sub-problems" are by explicitly considering the computation hierarchy. (ii) For input sequences 𝑋, 𝑋 ¯from the same problem domain, and the same compositional function 𝑓 , we allow the cDAG to be different -cDAG 𝐷(𝑋) can be input-dependentthereby allowing different input sequences to have different sub-problem hierarchies. At a non-technical level, we also believe that our proposed Definition 1 connects intuitively to existing definitions: Both Examples 1 and 2 can be seen as compositional functions, but Example 2 is clearly a more complex composition. In addition to its intuitive nature, our proposed definition allows us to understand how complex the compositionality is beyond just stating if a function is compositional. The compositional complexity of 𝑓 depends on the functions 𝑔, ℎ, 𝑒 as well as the cDAG function 𝐷 that drives the computation. For a sequence 𝑋 of length 𝐿, 𝐷(𝑋) has 𝐿 source nodes, maximum in-degree of 𝑘 (controlling the span size for 𝑔), 𝑚 sink nodes (controlling the capacity of ℎ), maximum out-degree of 𝑞 (quantifying the "localism" of the effect of a node). However, these do not explicitly incorporate the fact that changes to nodes at lower levels of the cDAG can have a larger effect on the output than changes to nodes at higher levels of the cDAG. We propose a new quantification -the locus of influence (LoI): This definition of the complexity of composition incorporates both the complexity of the cDAG 𝐷(𝑋) and the complexity of the span processor 𝑔 : ℋ 𝑘 → ℋ in terms of its smoothness, with higher values of 𝑐 indicating more complex (less smooth) 𝑔. The absolute LoI 𝛿 𝑖 incorporates the effect of longer paths, with the effect growing with path length, and corresponds to the sensitivity of the compositional function output to any one input token in the sequence.</p><p>The smaller the absolute LoI 𝛿 𝑖 of any input index 𝑖, more local its effect, and thus more structure that can be transferred between examples if 𝑥 𝑖 is replaced with something else. A relative LoI 𝛽 𝑖 greater than 1/𝐿 denotes that the input index 𝑖 (and thus input token 𝑥 𝑖 ) has an out-sized effect on 𝐷(𝑋) (and thus the computation) compared to the other indices (tokens). In Example 1 (left), 𝛿 1 = 𝑐 2 , 𝛽 1 = 1 /2𝑐+3 &lt; 1 /5 while 𝛿 3 = 𝑐 3 , 𝛽 3 = 𝑐 /2𝑐+3 &gt; 1 /5, implying that 𝑥 3 has more influence (absolute and relative) function than 𝑥 1 (assuming 𝑐 &gt; 1). In Example 2 (left), 𝛿 1 = 𝑐 4 + 2𝑐 3 , 𝛽 1 = 𝑐+2 /27𝑐+39 ≈ 1 /22 &lt; 1 /7, while 𝛿 5 = 7𝑐 4 + 9𝑐 3 , 𝛽 5 = 7𝑐+9 /27𝑐+39 ≈ 1 /4 &gt; 1 /7, hence 𝑥 5 has a significantly larger influence than 𝑥 1 . We utilize the LoI to define the complexity of a compositional function, and a class of such compositional functions: Definition 3. A function 𝑓 : 𝒳 → 𝒴 with components 𝑔, ℎ, 𝑒, 𝐷 is (𝑘, 𝑞, 𝑚, 𝛿, 𝛽)-compositional if, for any 𝑋 ∈ 𝒳 of length 𝐿, the cDAG 𝐷(𝑋) has a in-degree of 𝑘, maximum outgoing degree of 𝑞, and 𝑚 sink nodes, and for ∀𝑖 ∈ 𝐿 , 𝛿 𝑖 ≤ 𝛿, and 𝛽 𝑖 ≤ 𝛽 ∈ [1/𝐿, 1). We denote with ℱ a class of such (𝑘, 𝑞, 𝑚, 𝛿, 𝛽)-compositional functions.</p><p>A small 𝛿 and a 𝛽 close to 1/𝐿 signifies a function that possesses a high level of localism across all input sequences and tokens in its domain. While this function has the most structure, it might not be suitable for practical purposes. A high 𝛿 and a 𝛽 close to 1/𝐿 signifies a very complex function where there is a lot of interaction between all the input tokens in all input sequences, making it hard to exploit any compositional structure in the function. A high 𝛿 and a 𝛽 significantly higher than 1/𝐿 indicates an interesting class of functions where, some input tokens can have a high influence over the function computation, but, for most tokens, there is a compositional structure in the function that can be exploited. This intuitively seems to be an interesting and more practical class of compositional functions since assuming all tokens have an equal level of relative influence seems quite restrictive.  The cDAG for various existing sequence processing models such as the unidirectional and bidirectional recurrence models, convolutional models and attention based transformer models.  In Fig. <ref type="figure" target="#fig_4">2</ref>, we re-express existing sequence processing models as per our definition, teasing out the symbolic cDAG (and the neural 𝑔, ℎ), and we present their (simplified) compositional complexity in Table <ref type="table" target="#tab_5">1</ref> assuming that all models classes utilize span processors 𝑔 with the same smoothness parameter 𝑐 for the ease of exposition. This highlights the flexibility and utility of our proposed quantification of compositionality (see <ref type="bibr">Ram et al. [10,</ref><ref type="bibr">Section 4]</ref> for more details and models). Beyond the properties of the cDAG (the in-degree 𝑘, out-degree 𝑞 and number of sink nodes 𝑚) and the upper bounds on the absolute LoI 𝛿 and relative LoI 𝛽, we also highlight two properties: (i) Whether the model utilizes (implicitly or explicitly) an input-dependent cDAG (that is, 𝐷(𝑋) is not the same DAG for all 𝑋 of length 𝐿), and (ii) Whether the same model is able to operate on arbitrary length input sequences. The use of input-dependent cDAG has implications in terms of the expressivity of the model -it can be shown that functions from a model class (with compositional complexites 𝛿, 𝛽) with a fixed input-agnostic cDAG cannot approximate functions from a model class of matching compositional complexity (that is, same compositional complexity 𝛿, 𝛽) that utilize input-dependent cDAGs. <ref type="bibr">Ram et al. [10,</ref><ref type="bibr">Theorem 1]</ref> show that the approximation is lower and upper bounded by 𝒪(𝛿) and 𝒪(𝛿/𝛽) respectively. This implies that a higher value of absolute compositional complexity 𝛿, and a smaller relative compositional complexity 𝛽 adversely affect the approximation guarantees. The absolute compositional complexity 𝛿 has been shown to be directly tied to the generalization gap for a learned compositional function, with higher 𝛿 implying worse systematic generalization guarantee <ref type="bibr" target="#b9">[10,</ref><ref type="bibr">Theorem 2]</ref>. The ability to operate on arbitrary length sequences is a prerequisite to the ability of a model to possess length generalization or productivity -the ability to generalize to sequences of larger lengths than those seen during training. We will be pursuing length generalization in our future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Conclusion</head><p>In this paper, we briefly present our novel definition of compositional functions that explicitly separates out the neural and symbolic aspects of a model for the ease of analysis. We also present a notion of compositional complexity that quantifies the complexity with which the tokens in an input sequence are put together to get to the output. We briefly highlight the generality and utility of this definition by demonstrating how existing sequence processing models fit into this definition.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: cDAGs for 𝑓 (𝑋) (left) and 𝑓 (𝑋 ¯) (center left) in Example 1 and 𝑓 (𝑋) (center right) and 𝑓 (𝑋 ¯) (right) in Example 2. Nodes are labeled 𝑙:𝑖 (level 𝑙, index 𝑖). Sources are Fuchsia, sinks Sepia, and internal nodes Blue.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 (Example 2 .</head><label>12</label><figDesc>center left) shows the cDAG 𝐷(𝑋 ¯) of the same 𝑓 on 𝑋 ¯̸ = 𝑋 with the same 𝑘 = 2, 𝑞 = 1, 𝑚 = 1 and 𝑓 (𝑋 ′ ) = ℎ (𝑔(𝑔(𝑒 1 , 𝑔(𝑒 2 , 𝑒 3 )), 𝑔(𝑒 4 , 𝑒 5 ))). Note that 𝐷(𝑋) is not the same as 𝐷(𝑋 ¯). Figure 1 (center right) shows the cDAG D(𝑋) for a compositional f on 𝑋 = [𝑥 1 , . . . , 𝑥 7 ], with f(𝑋) = h (𝑣 4:1 , 𝑣 3:1 ), 𝑘 = 3 maximum in-degree, 𝑞 = 3 maximum out-degree, 𝑚 = 2 sinks, 𝑒 𝑖 = 𝑒(𝑥 𝑖 , 𝑖) ∈ ℋ, span processor g : ℋ 3 → ℋ, and read-out function h : ℋ 2 → 𝒴. The source values 𝑣 0:𝑖 = 𝑒 𝑖 for each 𝑖 ∈ {1, . . . , 7}, and the internal node values are: 𝑣 1:1 ← g(𝑒 1 , 𝑒</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 1 (</head><label>1</label><figDesc>right) shows the cDAG D(𝑋 ¯) of the same f on 𝑋 ¯̸ = 𝑋 with the same 𝑘 = 3, 𝑞 = 3, 𝑚 = 2.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2:The cDAG for various existing sequence processing models such as the unidirectional and bidirectional recurrence models, convolutional models and attention based transformer models.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head></head><label></label><figDesc>Definition 2 (LoI of a source node). Consider a function 𝑓 with components 𝑒, 𝐷, 𝑔, ℎ (Definition 1). Let (𝑣 𝑛 1 , . . . , 𝑣 𝑛 𝑗 , . . . , 𝑣 𝑛 𝑘 ) ∈ ℋ 𝑘 be any input to the span processor 𝑔, with 𝑣 𝑛 = 𝑔(𝑣 𝑛 1 , . . . , 𝑣 𝑛 𝑗 , . . . , 𝑣 𝑛 𝑘 ) its output. Let 𝜀 ∈ ℋ be a "perturbation" to the 𝑗 th argument to 𝑔, 𝑗 ∈ 𝑘 , resulting in the perturbed output 𝑣 𝑗 𝑛 (𝜀) = 𝑔(𝑣 𝑛 1 , . . . , 𝑣 𝑛 𝑗 + 𝜀, . . . , 𝑣 𝑛 𝑘 ). Let 𝑐 &gt; 0 be a constant such that ∀𝑗 ∈ 𝑘 , ∀𝜀 ∈ ℋ, For a sequence 𝑋 ∈ 𝒳 of length 𝐿, and a source node 0:𝑖 in 𝐷(𝑋), let 𝑃 (𝑥 𝑖 ) be the set of all unique paths from 0:𝑖 to any of the sink nodes in 𝐷(𝑋). The absolute LoI of index 𝑖 is 𝛿 𝑖 = ∑︀</figDesc><table><row><cell>⃦ ⃦ ⃦𝑣 𝑛 − 𝑣 𝑗 𝑛 (𝜀) ⃦ ⃦ ⃦ ≤ 𝑐‖𝜀‖.</cell></row></table><note>𝑃 ∈𝑃 (𝑥 𝑖 ) 𝑐 |𝑃 | , with |𝑃 | as the length of a path 𝑃 ∈ 𝑃 (𝑥 𝑖 ), and the relative LoI is 𝛽 𝑖 = 𝛿 𝑖 / ∑︀ 𝑗∈ 𝐿 𝛿 𝑗 .</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 1</head><label>1</label><figDesc>Compositional complexities of existing models with input sequences of length 𝐿. LoI is specified approximately for the ease of exposition. †: Convolve+Pool induces input-dependent cDAGs for max/min-pool, not for avg/sumpool. ‡: The number of sinks 𝑚 needs to be specified for Convolve+pool, and the model can handle arbitrary length if it can recursively apply Convolve+pool until the number of nodes at a level is reduced to 𝑚.</figDesc><table /></figure>
		</body>
		<back>

			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"> <ref type="bibr" target="#b0">1</ref> <p>A preliminary version of this work <ref type="bibr" target="#b8">[9]</ref> was previously presented at the KBCG@IJCAI23 workshop, while the extended version for this submission <ref type="bibr" target="#b9">[10]</ref> can be found at https://www.arxiv.org/abs/2405.02350.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Compositionality I: Definitions and variants</title>
		<author>
			<persName><forename type="first">P</forename><surname>Pagin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Westerståhl</surname></persName>
		</author>
		<idno type="DOI">10.1111/j.1747-9991.2009.00228.x</idno>
		<ptr target="https://compass.onlinelibrary.wiley.com/doi/abs/10.1111/j.1747-9991.2009.00228.x" />
	</analytic>
	<monogr>
		<title level="j">Philosophy Compass</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="250" to="264" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Generalization without systematicity: On the compositional skills of sequenceto-sequence recurrent networks</title>
		<author>
			<persName><forename type="first">B</forename><surname>Lake</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Baroni</surname></persName>
		</author>
		<ptr target="https://proceedings.mlr.press/v80/lake18a.html" />
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="2873" to="2882" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">COGS: A compositional generalization challenge based on semantic interpretation</title>
		<author>
			<persName><forename type="first">N</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Linzen</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2020.emnlp-main.731.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="9087" to="9105" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Compositionality decomposed: how do neural networks generalise?</title>
		<author>
			<persName><forename type="first">D</forename><surname>Hupkes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dankers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Bruni</surname></persName>
		</author>
		<ptr target="https://jair.org/index.php/jair/article/view/11674" />
	</analytic>
	<monogr>
		<title level="j">Journal of Artificial Intelligence Research</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="page" from="757" to="795" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Klinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Adjodah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Marois</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Joseph</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Riemer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Pentland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Campbell</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2006.09437</idno>
		<ptr target="https://arxiv.org/abs/2006.09437" />
		<title level="m">A study of compositional generalization in neural models</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Compositional generalization for primitive substitutions</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hestness</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/D19-1438/" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</title>
				<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="4284" to="4293" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Compositional generalization by learning analytical expressions</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>An</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-G</forename><surname>Lou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhang</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2020/file/83adc9225e4deb67d7ce42d58fe5157c-Paper.pdf" />
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Learning compositional rules via neural program synthesis</title>
		<author>
			<persName><forename type="first">M</forename><surname>Nye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Solar-Lezama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tenenbaum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">M</forename><surname>Lake</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2020/file/7a685d9edd95508471a9d3d6fcace432-Paper.pdf" />
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="10832" to="10842" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">How compositional is a model?</title>
		<author>
			<persName><forename type="first">P</forename><surname>Ram</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Klinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Gray</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=OImyRhNLv3" />
	</analytic>
	<monogr>
		<title level="m">International Joint Conference on Artificial Intelligence 2023 Workshop on Knowledge-Based Compositional Generalization</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Ram</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Klinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Gray</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2405.02350</idno>
		<ptr target="https://arxiv.org/abs/2405.02350" />
		<title level="m">What makes Models Compositional? A Theoretical View: With Supplement</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Lstm can solve hard long time lag problems</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hochreiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schmidhuber</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper/1996/file/a4d2f0d23dcc84ce983ff9157f8b7f88-Paper.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="1996">1996</date>
			<biblScope unit="volume">9</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Attention is all you need, Advances in neural information processing systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ł</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf" />
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">30</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Neuro-symbolic artificial intelligence</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Sarker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Eberhart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hitzler</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/2105.05330.pdf" />
	</analytic>
	<monogr>
		<title level="j">AI Communications</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="197" to="209" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Neural-symbolic learning and reasoning: A survey and interpretation</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>Garcez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Bowman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">C</forename><surname>Lamb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>De Penning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Illuminoo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Poon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">G</forename><surname>Zaverucha</surname></persName>
		</author>
		<ptr target="https://arxiv.org/pdf/1711.03902.pdf" />
	</analytic>
	<monogr>
		<title level="j">Neuro-Symbolic Artificial Intelligence: The State of the Art</title>
		<imprint>
			<biblScope unit="volume">342</biblScope>
			<biblScope unit="page">327</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
