<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Modular Design Patterns for Generative Neuro-Symbolic Systems</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Maaike</forename><forename type="middle">H T</forename><surname>De Boer</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">dep. Data Science</orgName>
								<orgName type="institution">TNO</orgName>
								<address>
									<settlement>The Hague</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Quirine</forename><forename type="middle">S</forename><surname>Smit</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">dep. Data Science</orgName>
								<orgName type="institution">TNO</orgName>
								<address>
									<settlement>The Hague</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Michael</forename><surname>Van Bekkum</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">dep. Data Science</orgName>
								<orgName type="institution">TNO</orgName>
								<address>
									<settlement>The Hague</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">André</forename><surname>Meyer-Vitali</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI)</orgName>
								<address>
									<settlement>Saarbrücken</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Thomas</forename><surname>Schmid</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Martin Luther University Halle-Wittenberg</orgName>
								<address>
									<settlement>Halle (Saale)</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff3">
								<orgName type="institution">Lancaster University in Leipzig</orgName>
								<address>
									<settlement>Leipzig</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff4">
								<orgName type="institution">Leipzig University</orgName>
								<address>
									<settlement>Leipzig</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Modular Design Patterns for Generative Neuro-Symbolic Systems</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">16331795E99360A6924599511477CC37</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>design patterns</term>
					<term>neuro-symbolic AI</term>
					<term>generative models</term>
					<term>Large Language Models</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Developing systems that are able to generate novel outputs is one of the dominating trends in current Artificial Intelligence (AI) research. Both the capabilities and the availability of such generative systems, in particular of so-called Large Language Models (LLMs), have been exploding in recent years. While Neuro-Symbolic generative models offer advantages over purely statistical generative models, it is currently difficult to compare the different ways in which the training, fine-tuning and usage of the growing variety of such approaches are carried out. In this work, we use the modular design patterns and Boxology language of van Bekkum et al. for this purpose and extend them to enable the representation of generative models, specifically LLMs. These patterns provide a general language to describe, compare and understand the different architectures and methods used. Our main aim is to support better understanding of generative models as well as to support the engineering of LLM-based systems. To demonstrate the usefulness of this approach, we explore generative Neuro-Symbolic architectures and approaches as use cases for these generative design patterns.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Recently, Artificial Intelligence (AI) has taken a leap in the form of generative models. Prominently, multimodal statistical models such as DALL-E <ref type="bibr" target="#b0">[1]</ref> and Stable Diffusion <ref type="bibr" target="#b1">[2]</ref> have changed the world of image generation, and with the release of OpenAI's ChatGPT system, the world of text generation has changed forever. Targeting text generation tasks in particular, both the development and the number of Large Language Models (LLMs) have increased enormously. Currently, many different generative models are popping up, both open-source and proprietary <ref type="bibr" target="#b2">[3]</ref>. Moreover, due to open challenges of LLMs, such as hallucination <ref type="bibr" target="#b3">[4]</ref>, explainability <ref type="bibr" target="#b4">[5]</ref> and trustworthiness, novel Neuro-Symbolic generative approaches have emerged <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref>.</p><p>GeNeSy <ref type="bibr">'24:</ref> First International Workshop on Generative Neuro-Symbolic AI, May 2024, Hersonissos, Crete, Greece. * Corresponding author; both authors contributed equally. maaike.deboer@tno.nl (M. H. T. d. Boer); quirine.smit@tno.nl (Q. S. Smit); andre.meyer-vitali@dfki.de (A. Meyer-Vitali); thomas.schmid@medizin.uni-halle.de (T. Schmid). ORCID: 0000-0002-2775-8351 (M. H. T. d. Boer); 0000-0002-5242-1443 (A. Meyer-Vitali).</p><p>Not only several LLMs, but also a large number of so-called foundation models dealing with various input and output modalities have entered the scene in recent years. Due to the quantity and diversity of emerging generative techniques, it becomes more and more challenging to keep track of the ever-growing variety of models with different architectures and capabilities. 
One solution to tackle this issue is to create a high-level conceptual framework to discuss, compare, configure and combine different models: a Boxology. The Boxology started in the field of Neuro-Symbolic systems with Van Harmelen and Ten Teije <ref type="bibr" target="#b7">[8]</ref> in 2019. This work was extended in 2021 by van Bekkum et al. <ref type="bibr" target="#b8">[9]</ref>, who provide a taxonomically organised vocabulary to describe both processes and data structures used in hybrid systems.</p><p>Here, we propose to use and extend the Boxology to gain insights into a variety of generative models, specifically LLMs. To this end, we test the validity and usefulness of the Boxology in this field on example architectures and applications, such as ChatGPT, KnowGL, GENOME and Logic-LM. Our modular approach supports new architectures and engineering approaches to systems based on generative AI models. Our pattern extensions promote transparency and trustworthiness in system design by providing interpretable, high-level component descriptions of generative AI models.</p><p>The rest of the paper is organized as follows. In the next section, we give a more detailed overview of the Boxology. In the third section, we propose to extend the Boxology with three novel patterns in order to be able to handle generative models. In section 4, we dive into specific applications and tasks in which generative models, specifically in Neuro-Symbolic systems, are used. We conclude by summarizing our key findings and outlining future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work on the Boxology</head><p>We base our paper on the work of van Bekkum et al. <ref type="bibr" target="#b8">[9]</ref>, in which the authors provide a taxonomically organised vocabulary to describe both processes and data structures used in hybrid systems. The highest level of this taxonomy contains instances, models, processes and actors, which may be described as follows.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Instances:</head><p>The two main classes of instances are data and symbols. Symbols are defined as having a designation for an object, class or relation in the world; they can be either atomic or complex, and when a new symbol is created from another symbol and a system of operations, it should also have a designation. Examples of symbols are labels (short descriptions), relations (connections between data items, such as triples) and traces (records of data and events). Data is defined as anything that is not symbolic. Examples are numbers, texts, tensors or streams.</p><p>Models: Models are descriptions of entities and their relationships, which can be statistical or semantic. Statistical models represent dependencies between statistical variables, such as LLMs or Bayesian Networks. Semantic models specify concepts, attributes and relationships to represent the implicit meaning of symbols, such as ontologies, taxonomies, knowledge graphs or rule bases.</p><p>Processes: Processes are operations on instances and models. Three types of processes are defined: generation, transformation and inference. Generation can be done, for example, through the training of a model or by knowledge engineering. Transformation is the transformation of data, for example from a knowledge graph to a vector space. Inference can be inductive or deductive: induction generalises over instances, while deduction reaches conclusions on specific instances, as in classification.</p><p>Actors: Actors can be humans, (software) agents or robots (physically embedded agents). Meyer-Vitali et al. <ref type="bibr" target="#b9">[10]</ref> extended the original paper with a definition of teams of actors in the Boxology.</p><p>Besides the vocabulary, a visual language is defined in van Bekkum et al. <ref type="bibr" target="#b8">[9]</ref> as an extension of Van Harmelen and Ten Teije <ref type="bibr" target="#b7">[8]</ref>. 
The visual language consists of rectangular boxes (instances), hexagonal boxes (models), ovals (processes) and triangles (actors), with unspecified arrows between them. Within each box, the concept is noted at each level of the vocabulary using colon separation from most generic to most specific; for example, a neural network is written as model:stat:NN.</p><p>van Bekkum et al. <ref type="bibr" target="#b8">[9]</ref> present elementary patterns, which can be combined into more complex patterns. Patterns 1a and 2a from Figure <ref type="figure" target="#fig_0">1</ref>, for example, can be combined into a pattern named 3a in the paper (depicted in Figure <ref type="figure" target="#fig_1">2</ref>). Whereas 1a describes the pattern of training a model based on data (data generates a model), 2a describes the usage of the model in deducing a symbol (data and model deduce a symbol), such as a prediction. The combination in 3a describes a basic structure for a (statistical) Machine Learning (ML) model, depicting the training phase (creating the model) and the testing or application phase (applying the model to new data).</p><p>In recent years, the Boxology has been used and extended in different ways. Three of the most influential papers are the formalisation of the notions from the Boxology and their implementation in the heterogeneous tool set (Hets) <ref type="bibr" target="#b10">[11]</ref>, the extension of the Boxology for (teams of) actors <ref type="bibr" target="#b9">[10]</ref> and the systematic study of nearly 500 papers published in the past decade in the area of Semantic Web Machine Learning <ref type="bibr" target="#b11">[12]</ref>.</p></div>
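To make the notation concrete, the colon-separated vocabulary and the composition of patterns 1a and 2a into 3a can be sketched in a few lines of Python (the box and pattern encodings below are our own illustrative choices, not part of the Boxology specification):

```python
# Hypothetical encoding of Boxology boxes: each box is typed by a
# colon-separated path from most generic to most specific.
def box(kind, *levels):
    # kind is one of: "instance", "model", "process", "actor"
    return {"kind": kind, "type": ":".join(levels)}

# Pattern 1a: data generates a (statistical) model, e.g. training a neural net.
train_data = box("instance", "data", "text")
nn_model = box("model", "model", "stat", "NN")
pattern_1a = {"inputs": [train_data], "process": "generate:train", "output": nn_model}

# Pattern 2a: new data plus the trained model deduce a symbol (e.g. a label).
new_data = box("instance", "data", "text")
label = box("instance", "symbol", "label")
pattern_2a = {"inputs": [new_data, nn_model], "process": "infer:deduce", "output": label}

# Pattern 3a is the composition of 1a and 2a: the model produced by 1a
# is reused as an input of 2a (training phase plus application phase).
pattern_3a = [pattern_1a, pattern_2a]

print(nn_model["type"])   # model:stat:NN
```

The composition is expressed simply by sharing the model box between the two elementary patterns, mirroring how the visual language reuses a hexagon across patterns.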
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Design Patterns for Generative Models</head><p>While Generative AI originates in the realm of data-driven AI, it has demonstrated capabilities that far exceed classical machine learning tasks like classification and regression. In particular, such generative systems specialise in the generation of content, such as images <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>, videos <ref type="bibr" target="#b12">[13]</ref>, or text <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16]</ref>. In the original, purely statistical setting, these capabilities are acquired during a so-called (pre-)training phase <ref type="bibr" target="#b16">[17]</ref>, in which a representation of a large data body is learned; in a second phase (the application phase), this representation is used to map input to output that has not been explicitly specified but follows the characteristics of the data body.</p><p>However, the specific arrangements for both (pre-)training and representation usage in downstream tasks vary between approaches and systems <ref type="bibr" target="#b17">[18]</ref>. In order to allow for a coherent description of the generative paradigm, we propose to extend the elementary patterns of van Bekkum et al. <ref type="bibr" target="#b8">[9]</ref> that describe the generic patterns for instances, models, processes and actors (Figure <ref type="figure" target="#fig_0">1</ref>, 1a-1d and 2a-2d). Please note that while patterns 1e and 1f are required for certain aspects of the generative paradigm, their usage is not limited to it. Data generation and labelling by humans may also be employed with any statistical approach. In particular, when describing classical machine learning systems, mostly pattern 2a is used, where the output is a symbol, such as a classification or a label. 
However, the key concept in generative models is that the output is not a symbol but data; this can be an image, video or text, depending on the model. Additionally, actors play an important role in Generative AI, by creating prompts or labelling data. To this end, we here propose three new elementary patterns: pattern 1e, in which an actor can generate data; pattern 1f, in which an actor labels data; and pattern 2e, in which a model can deduce data from data. In the remainder of this section, we mainly focus on Large Language Models (LLMs). Please note, however, that the patterns proposed in this section are transferable to other data types, for example to vision transformers, which follow a similar architecture paradigm as transformers but operate on image data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Transformer Models</head><p>The key technology behind essentially all current LLMs is the so-called transformer architecture. The original transformer paper by Vaswani et al. <ref type="bibr" target="#b18">[19]</ref> proposed to use two interacting models, an encoder and a decoder. In the transformer family, however, some models only use the encoder or the decoder part <ref type="bibr" target="#b17">[18]</ref>. Figure <ref type="figure" target="#fig_2">3A</ref> shows the architecture of a transformer model as a design pattern. The two parts are usually trained end-to-end (as in Flan-T5 <ref type="bibr" target="#b19">[20]</ref>), but can also be used separately in encoder-only (Figure <ref type="figure" target="#fig_2">3B</ref>) or decoder-only (Figure <ref type="figure" target="#fig_2">3C</ref>) models. In the following sections, we focus on an encoder-only and a decoder-only family. Other sections focus on instructions and prompting of different models and the interaction with actors.</p></div>
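As background for the patterns that follow, the core computation shared by encoder and decoder blocks is attention. A minimal single-head sketch of the scaled dot-product attention of Vaswani et al., in plain NumPy with illustrative shapes (a sketch only, not a full transformer layer):

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row maximum for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, dimension 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)
print(out.shape)              # (4, 8)
```

Encoders and decoders stack this computation with feed-forward layers; the decoder additionally masks the scores, as sketched in the next subsection.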
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Encoder only: BERT (base)</head><p>Some systems are encoder-only. These systems, often named base models, are specialised in contextual encoding. They can 'understand' and encode input sentences. An encoder model is trained using data (pattern 1a). It is often connected to other systems, such as a classification system (pattern 3a; see Figure <ref type="figure" target="#fig_2">3B</ref>), to be useful for tasks other than encoding input sentences. An example of this is BERT <ref type="bibr" target="#b20">[21]</ref>. Encoders are transformer models, but not generative models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2.">Decoder only: GPT</head><p>Other transformer-based systems have decoder-only architectures. This approach is complementary to the encoder-only paradigm, but structurally different <ref type="bibr" target="#b17">[18]</ref>: an encoder processes the input data (in these cases text) and transforms it into a different, machine-interpretable representation, often a vector representation. A decoder-only system, on the other hand, decodes the input data directly to the desired representation (text or images), without first transforming it into a higher, more abstract representation. Examples of this are generative models from the GPT family <ref type="bibr" target="#b13">[14]</ref>.</p><p>In the Boxology, encoders and decoders have a similar representation. For generative models from the GPT family, we suggest pattern 3c (see Figure <ref type="figure" target="#fig_1">2</ref>), which is a combination of 1a and 2e as presented in Figure <ref type="figure" target="#fig_0">1</ref>: data is used to train a decoder model which, unlike other transformers, does not additionally take an encoder's output as input. This decoder model can be used to deduce output data from input data directly.</p><p>Decoder-only architectures may be further divided into causal decoder architectures and prefix decoder architectures. Causal decoder architectures, such as GPT <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b13">14]</ref> and BLOOMZ <ref type="bibr" target="#b22">[23]</ref>, use only unidirectional attention over the input sequence by applying a specific mask. Prefix decoder architectures, such as PaLM <ref type="bibr" target="#b23">[24]</ref>, use bidirectional attention for tokens in the prefix while maintaining unidirectional attention for generating subsequent tokens. Both architectures follow the elementary pattern 2e.</p></div>
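The unidirectional attention of causal decoders can be illustrated by masking the attention scores so that each position only attends to itself and earlier positions (a toy sketch of the masking idea, not an actual GPT implementation):

```python
import numpy as np

def causal_mask(n):
    # mask[i, j] is 0 where attention is allowed (j at most i)
    # and -inf where position i would look into the future (j greater than i).
    mask = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if j > i:
                mask[i, j] = -np.inf
    return mask

def masked_softmax(scores):
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n = 4
# uniform raw scores plus the mask: exp(-inf) = 0 removes future positions
scores = np.ones((n, n)) + causal_mask(n)
weights = masked_softmax(scores)
# row 0 attends only to itself; row i spreads attention uniformly
# over positions 0..i.
print(np.round(weights, 2))
```

A prefix decoder would use the same mechanism but leave the prefix columns unmasked, giving bidirectional attention over the prefix tokens.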
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.3.">Prompts and Instructions</head><p>One of the main differences between current LLMs and earlier BERT-style or other transformer models is that the model is fine-tuned on instructions <ref type="bibr" target="#b17">[18]</ref>. Multi-task fine-tuning, or instruction tuning, is currently often done using a collection of datasets phrased as instructions, to improve model performance and generalisation to unseen tasks <ref type="bibr" target="#b19">[20]</ref>. The original model is often referred to as a foundation model <ref type="bibr" target="#b24">[25]</ref>, whereas the fine-tuned model is an adjusted model. In the Boxology, we define this adjusted model as another model, as we did with the encoder and decoder models in Figure <ref type="figure" target="#fig_2">3</ref>, but now stacking two decoder models. Instruction tuning also follows pattern 1a, but the data is different, as it also contains instructions.</p><p>Besides instruction tuning, LLMs can also be adapted through in-context learning. Here, examples are used as part of the prompt to give context for the answers to the instructions. In this case, the model weights are not changed. This optimizes the performance of models on different tasks <ref type="bibr" target="#b25">[26]</ref>, but does not need as much training data as training a model from scratch. These prompts can include a few (training) examples of the input and output (few-shot) or no examples (zero-shot). These few-shot examples do not train the foundation or instruction model, and we therefore model them as input data that is used to deduce data (text), which is pattern 2e. Assistants or GPTs could, however, be seen as a new model, especially if they perform other tasks, such as Retrieval-Augmented Generation (RAG).</p></div>
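The distinction between zero-shot and few-shot prompting can be sketched as plain prompt construction: only the input data changes and no weights are updated, which is why we model it as pattern 2e (the prompt format below is an illustrative assumption, not a fixed standard):

```python
def build_prompt(instruction, examples, query):
    # In-context learning: the (few-shot) examples are part of the input
    # data; no model weights are changed. An empty example list gives
    # the zero-shot variant.
    parts = [instruction]
    for inp, out in examples:
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

few_shot = build_prompt(
    "Classify the sentiment as positive or negative.",
    [("Great movie!", "positive"), ("Terrible plot.", "negative")],
    "I loved it.",
)
zero_shot = build_prompt(
    "Classify the sentiment as positive or negative.", [], "I loved it."
)
print(few_shot)
```

Both prompts are fed to the same frozen model; in Boxology terms, the examples live in the data box, not in the model hexagon.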
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Actor Interaction</head><p>Actors play a large role in current generative models. In the original paper by van Bekkum et al. <ref type="bibr" target="#b8">[9]</ref>, patterns using actors are underspecified. Actors often create data, not only in the interaction with an agent that uses generative models, but also in common Machine Learning approaches; many textual datasets are written, pre-processed and labelled by actors. A first proposed pattern is pattern 1e, in which an actor creates data. The second proposed pattern is pattern 1f, in which an actor generates a label, or annotates data. Both patterns are depicted in Figure <ref type="figure" target="#fig_0">1</ref>.</p><p>Generative models are often not used only once. With current chat functions, actors interact with the model multiple times. The main difference from other Machine Learning models, which also take multiple data points as input and output symbols, is that there each input is typically independent of the output for the previous data point, whereas with conversational generative models, prompts can relate to the previous response. Currently, such recurrent or iterative behaviour is not yet part of the pattern concepts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Design Patterns for Generative Neuro-Symbolic AI</head><p>In this section, we describe and explore several papers that use generative models in a Neuro-Symbolic system. The papers were selected because they represent a diverse set of possibilities for using a generative model: at the start of the system, in the middle and at the end, but also acting as a fluent language interface or a formal language interface. We also include ChatGPT, the best-known generative AI system, which, although mainly data-driven, includes a symbolic component in the reward-modelling part of the training phase.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">(Training of) ChatGPT</head><p>ChatGPT is an application of the foundation model GPT-3 <ref type="bibr" target="#b13">[14]</ref>, and later GPT-4 <ref type="bibr" target="#b26">[27]</ref>. It is trained further to be of aid in the setting of an assistant. The architecture of the training phases is represented in Figure <ref type="figure" target="#fig_3">4</ref>. The foundation model GPT-3 is used as a basis for further training. To further train ChatGPT to give the desired responses, the reward model is added. The reward model is a separate model, which can judge whether a response is a good one, given the instructions. The reward model is trained by people annotating multiple answers to instructions. To train the reward model, the model trained on instructions is asked to output multiple answers. These answers are then ranked by annotators to generate a training set for the reward model (1f). The reward model is trained to compare answers of ChatGPT and return their score (3a). This is then used in a loop with ChatGPT to improve the instruction-answering process. As one can see, we have adapted the Boxology patterns to be able to accept multiple inputs.</p><p>When applying ChatGPT in a pipeline, it suffices to show only pattern 3c, the block containing ChatGPT, and 1e to show the user writing the prompt.</p></div>
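The reward-modelling step above can be sketched as pairwise ranking: annotators order candidate answers, and the reward model is trained so that preferred answers score higher. A minimal sketch of the standard ranking objective (real reward models are neural networks; the scores here are hand-picked numbers for illustration):

```python
import math

def pairwise_ranking_loss(score_chosen, score_rejected):
    # Bradley-Terry style objective commonly used for reward models:
    # -log(sigmoid(score_chosen - score_rejected)).
    diff = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# Annotators ranked answer A above answer B for the same instruction (1f).
loss_good = pairwise_ranking_loss(2.0, 0.5)   # reward model agrees with ranking
loss_bad = pairwise_ranking_loss(0.5, 2.0)    # reward model disagrees
print(loss_bad > loss_good)   # True: disagreement is penalised more
```

Minimising this loss over many annotator-ranked pairs yields the scoring model (3a) that is then used in the feedback loop with ChatGPT.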
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">KnowGL</head><p>Figure <ref type="figure" target="#fig_4">5</ref> shows the KnowGL Parser <ref type="bibr" target="#b27">[28]</ref>, a NeSy system combining a generative module and symbolic methods. The KnowGL Parser can be used to automatically extract knowledge graphs from collections of documents. It is based on BART-large, which has an encoder-decoder architecture. The encoder receives a sentence (1a) and the decoder generates a list of 'subject, relation, object' triples (3c). These are then parsed (transformed) in preparation for the next step, fact ranking (1d). Here, a ranked list is created of distinct facts and their scores (2b). In the final step, the generated facts are linked to Wikidata. This is done using a mapping of labels to Wikidata IDs (2b). In case the generative model has created a new entity, type or relation label that is not in Wikidata, it returns 'null'.</p></div>
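The symbolic post-processing stages of this pipeline, parsing generated triples, ranking distinct facts and linking labels to Wikidata IDs with a 'null' fallback, can be sketched as follows (the parsing format and the tiny label mapping are illustrative assumptions, not the actual KnowGL implementation):

```python
def parse_facts(generated):
    # The decoder outputs "subject, relation, object" strings; parse them
    # into triples (transformation step 1d in the pattern).
    triples = []
    for line in generated:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3:
            triples.append(tuple(parts))
    return triples

def rank_facts(triples_with_scores):
    # Keep distinct facts, ranked by their best score (2b).
    best = {}
    for triple, score in triples_with_scores:
        if score > best.get(triple, -1.0):
            best[triple] = score
    return sorted(best.items(), key=lambda kv: -kv[1])

# Toy label-to-Wikidata mapping; labels invented by the generative model
# that are not in Wikidata map to 'null'.
WIKIDATA = {"Amsterdam": "Q727", "Netherlands": "Q55", "capital of": "P1376"}

def link(label):
    return WIKIDATA.get(label, "null")

facts = parse_facts(["Amsterdam, capital of, Netherlands"])
ranked = rank_facts([(facts[0], 0.93), (facts[0], 0.90)])
print([link(x) for x in facts[0]])   # ['Q727', 'P1376', 'Q55']
```

Note that deduplication happens at ranking time, so repeated extractions of the same fact from different sentences collapse into one scored entry.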
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">KnowBERT</head><p>While knowledge is mostly injected into statistical generative models either during the input or during the output stage, approaches to inject knowledge inside the model have also been proposed. A prominent example is KnowBERT, a modified variant of the transformer architecture BERT <ref type="bibr" target="#b28">[29]</ref>. Although not a generative model, it stands out for its fusion of contextual and graph representations, attention-enhanced entity-span knowledge infusion, and flexibility in injecting multiple Knowledge Graphs at various model levels. By integrating so-called Knowledge Attention and Recontextualization (KAR) layers <ref type="bibr" target="#b29">[30]</ref>, graph entity embeddings are processed through an attention mechanism to enhance entity-span embeddings. This happens in later layers of the model to stabilize training, but may potentially also be used to inject knowledge at earlier stages <ref type="bibr" target="#b5">[6]</ref>. The Boxology pattern for KnowBERT is depicted in Figure <ref type="figure" target="#fig_5">6</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Mathematical Conjecturing and LLMs</head><p>The system proposed by Johansson and Smallbone <ref type="bibr" target="#b30">[31]</ref> assigns the generative task of discovering mathematical conjectures to an LLM (3c), while the results can be checked afterwards using a symbolic theorem prover or counter-example finder (2b). The system is prompted with a formal theory (e.g. a sort function) and has the LLM generate lemmas from the theory. These generated lemmas are transformed from data to symbols and can then be used by the semantic model(s). The pattern is depicted in Figure <ref type="figure" target="#fig_6">7</ref>. The approach taken by Yang et al. <ref type="bibr" target="#b31">[32]</ref> is also captured by this pattern. Their system uses an LLM component to produce Prolog code (3c) and a symbolic inference engine to produce answers and reasoning traces by executing the aforementioned code (1d, 2b).</p></div>
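The division of labour in this pattern, generative proposal followed by symbolic checking, can be sketched as follows (the lemma proposer stands in for the LLM, and the checker is a naive counter-example search over test cases, not an actual theorem prover):

```python
def propose_lemmas():
    # Stand-in for the LLM (3c): proposes candidate lemmas about a
    # sorting function, encoded as Python predicates over lists.
    return {
        "sorted(xs) has the same length as xs":
            lambda xs: len(sorted(xs)) == len(xs),
        "sorted(xs) equals xs":          # a false conjecture
            lambda xs: sorted(xs) == xs,
    }

def counter_example(prop, tests):
    # Stand-in for the symbolic checker (2b): search for a refutation.
    for xs in tests:
        if not prop(xs):
            return xs
    return None

tests = [[], [1], [3, 1, 2], [2, 2]]
for name, prop in propose_lemmas().items():
    ce = counter_example(prop, tests)
    status = "refuted by " + repr(ce) if ce is not None else "survives testing"
    print(name, "--", status)
```

The key property of the pattern is visible even in this toy version: the generator may freely propose false conjectures, because the symbolic component filters them out.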
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.">GENOME</head><p>Generative Neuro-Symbolic Visual Reasoning by Growing and Reusing Modules (GENOME) <ref type="bibr" target="#b32">[33]</ref> focuses on the task of generative software-module learning: an LLM generates signatures (input/output) and reasoning steps, then an LLM creates the software module based on those and evaluates the module on test cases.</p><p>The system consists of three stages: module initialization, module generation, and module execution. The design pattern is depicted in Figure <ref type="figure" target="#fig_7">8</ref>. First, an LLM assesses a visual-language question and, if current modules cannot provide an adequate response, outputs new module signatures and operation steps as a response to the query (3c). In the next step, the LLM creates a module (software code) based on the signature and test cases (3c). Finally, the module is executed by passing it a visual query (2a).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6.">Logic-LM</head><p>Logic-LM <ref type="bibr" target="#b33">[34]</ref> integrates LLMs with symbolic solvers to improve logical problem-solving. The pattern is depicted in Figure <ref type="figure" target="#fig_8">9</ref>: the system utilizes LLMs to translate a problem stated in natural language into a symbolic formulation (3c). In the next step, a symbolic reasoner performs logical inference on the formulated problem (1d, 2b, 1d). Finally, an LLM interprets the results and outputs natural language (3c). The LLM thus functions as a fluent language interface (both on input and output) to a symbolic reasoner component.</p></div>
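The three-stage shape of this pattern can be sketched with stubs for the two LLM stages and a naive forward-chaining reasoner for the symbolic stage (all names and the toy rule base are illustrative assumptions, not Logic-LM's implementation):

```python
def translate(nl_problem):
    # Stage 1 (3c): an LLM would translate natural language into a
    # symbolic formulation; here a hand-written stub returns Horn rules.
    return {
        "facts": {"rainy"},
        "rules": [({"rainy"}, "wet_streets"), ({"wet_streets"}, "slippery")],
        "query": "slippery",
    }

def solve(program):
    # Stage 2 (1d, 2b, 1d): naive forward chaining over the rules
    # until no new facts can be derived.
    known = set(program["facts"])
    changed = True
    while changed:
        changed = False
        for body, head in program["rules"]:
            if body.issubset(known) and head not in known:
                known.add(head)
                changed = True
    return program["query"] in known

def interpret(result, nl_problem):
    # Stage 3 (3c): an LLM would verbalise the symbolic answer.
    return f"Yes, it follows: {nl_problem}" if result else f"No: {nl_problem}"

question = "If it rains, are the streets slippery?"
print(interpret(solve(translate(question)), question))
```

The reasoning itself is entirely symbolic and therefore auditable; the LLM stubs only mediate between natural language and the formal program.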
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion and Future Work</head><p>Generative AI is currently a major technology with many applications, and combining data-driven approaches with knowledge-based techniques is a promising development to this end. In this paper, we propose new design patterns for modular generative Neuro-Symbolic systems, to be included in the design pattern approach for Neuro-Symbolic systems proposed by van Bekkum et al. <ref type="bibr" target="#b8">[9]</ref>. We show how the composition of elementary patterns can be used to describe generative models, and we explore several specific generative models, such as ChatGPT, as well as several generative NeSy papers, such as KnowGL, GENOME and Logic-LM.</p><p>We acknowledge that this is only the first step in a more elaborate exploration of generative design patterns and the description of generative Neuro-Symbolic architectures. In future work, we would like to validate our proposals for extending the Boxology by applying them to more examples from additional papers. In addition, we expect to further extend and deepen the Boxology itself. In this paper, it became clear that the temporal or iterative aspect is not yet well visualised; the naming and formalisation of the Boxology also deserve attention, including the do's and don'ts: which pattern combinations are allowed and which are not? The importance of modelling datasets for generative AI may be taken into account in future specifications of particular subtypes of Instances and Models in the taxonomy. Additionally, the use of graphical tools for software development is well known from the Unified Modelling Language (UML) and visual programming tools, such as LabVIEW or Scratch. 
We are mostly concerned with graphical representations of design patterns for system design and documentation, but the promise of templates, low-code or no-code development is appealing for the future.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: All elementary design patterns, including proposed additions 1e, 1f and 2e</figDesc><graphic coords="4,89.29,84.19,416.70,348.58" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Compositional design patterns, including proposed addition 3c made by combining elementary pattern 1a and 2e.</figDesc><graphic coords="5,89.29,84.19,416.71,167.27" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Three uses of transformers. A shows the traditional encoder-decoder architecture. B shows an encoder-only model applied to a classification task. C shows a decoder-only architecture.</figDesc><graphic coords="6,151.80,84.19,291.70,328.11" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Training phase of ChatGPT</figDesc><graphic coords="8,89.29,84.19,416.69,162.08" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Boxology representation of KnowGL<ref type="bibr" target="#b27">[28]</ref> </figDesc><graphic coords="8,89.29,285.55,416.70,103.18" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Boxology representation of KnowBERT<ref type="bibr" target="#b28">[29]</ref> </figDesc><graphic coords="9,120.54,84.19,354.21,156.92" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Boxology representation of using LLMs for discovery of mathematical conjectures<ref type="bibr" target="#b30">[31]</ref> </figDesc><graphic coords="10,89.29,84.19,416.70,151.73" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 8 :</head><label>8</label><figDesc>Figure 8: Boxology representation of GENOME<ref type="bibr" target="#b32">[33]</ref> </figDesc><graphic coords="10,89.29,275.23,416.69,147.97" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 9 :</head><label>9</label><figDesc>Figure 9: Boxology representation of Logic-LM<ref type="bibr" target="#b33">[34]</ref> </figDesc><graphic coords="11,151.80,84.19,291.71,359.13" type="bitmap" /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>We would like to thank the TNO project GRAIL for their financial support, as well as Frank van Harmelen and Annette ten Teije for their feedback. We would also like to thank Daan Di Scala for his contribution to the KnowGL pattern.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Improving image generation with better captions</title>
		<author>
			<persName><forename type="first">J</forename><surname>Betker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Goh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Brooks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ouyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Guo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Science</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page">8</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Rombach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Blattmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lorenz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Esser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ommer</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2112.10752</idno>
		<title level="m">High-resolution image synthesis with latent diffusion models</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Jiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ravaut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Joty</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2311.16989</idno>
		<title level="m">Chatgpt&apos;s one-year anniversary: Are open-source large language models catching up?</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Survey of hallucination in natural language generation</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Frieske</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ishii</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">J</forename><surname>Bang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Madotto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fung</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Comput. Surv</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Explainability for large language models: A survey</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Du</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Intelligent Systems and Technology</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page" from="1" to="38" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Colon-Hernandez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Havasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Alonso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Huggins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Breazeal</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2101.12294</idno>
		<title level="m">Combining pre-trained language models and structured knowledge</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bhatia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Arnold</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2110.08455</idno>
		<title level="m">Knowledge enhanced pretrained language models: A comprehensive survey</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A boxology of design patterns for hybrid learning and reasoning systems</title>
		<author>
			<persName><forename type="first">F</forename><surname>Van Harmelen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ten Teije</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Web Engineering</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="97" to="123" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Modular design patterns for hybrid learning and reasoning systems: a taxonomy, patterns and use cases</title>
		<author>
			<persName><forename type="first">M</forename><surname>Van Bekkum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Boer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Van Harmelen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Meyer-Vitali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">T</forename><surname>Teije</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Intelligence</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="page" from="6528" to="6546" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Modular design patterns for hybrid actors</title>
		<author>
			<persName><forename type="first">A</forename><surname>Meyer-Vitali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Mulder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">H T</forename><surname>De Boer</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2109.09331</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Mossakowski</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2206.04724</idno>
		<title level="m">Modular design patterns for neural-symbolic integration: refinement and combination</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Combining machine learning and semantic web: A systematic mapping study</title>
		<author>
			<persName><forename type="first">A</forename><surname>Breit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Waltersdorfer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>Ekaputra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sabou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ekelhart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Iana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Portisch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Revenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">T</forename><surname>Teije</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2402.17177</idno>
		<title level="m">Sora: A review on background, technology, limitations, and opportunities of large vision models</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Language models are few-shot learners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ryder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Subbiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neelakantan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shyam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="1877" to="1901" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lavril</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Izacard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Martinet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-A</forename><surname>Lachaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lacroix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Rozière</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hambro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Azhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rodriguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.13971</idno>
		<title level="m">Llama: Open and efficient foundation language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Introducing gemini: our largest and most capable ai model</title>
		<author>
			<persName><forename type="first">S</forename><surname>Pichai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hassabis</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Why does unsupervised pre-training help deep learning?</title>
		<author>
			<persName><forename type="first">D</forename><surname>Erhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vincent</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the thirteenth international conference on artificial intelligence and statistics, JMLR Workshop and Conference Proceedings</title>
				<meeting>the thirteenth international conference on artificial intelligence and statistics, JMLR Workshop and Conference Proceedings</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="201" to="208" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Recent advances in natural language processing via large pre-trained language models: A survey</title>
		<author>
			<persName><forename type="first">B</forename><surname>Min</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sulem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P B</forename><surname>Veyseh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">H</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Sainz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Agirre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Heintz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Roth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">56</biblScope>
			<biblScope unit="page" from="1" to="40" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ł</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">W</forename><surname>Chung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Longpre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zoph</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Fedus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dehghani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Brahma</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2210.11416</idno>
		<title level="m">Scaling instruction-finetuned language models</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">Bert: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Language models are unsupervised multitask learners</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Child</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Luan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">OpenAI blog</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">9</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><surname>Muennighoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sutawika</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Biderman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">L</forename><surname>Scao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Bari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z.-X</forename><surname>Yong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schoelkopf</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2211.01786</idno>
		<title level="m">Crosslingual generalization through multitask finetuning</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Palm: Scaling language modeling with pathways</title>
		<author>
			<persName><forename type="first">A</forename><surname>Chowdhery</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bosma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Barham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">W</forename><surname>Chung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sutton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gehrmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="1" to="113" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Bommasani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Hudson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Adeli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Altman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Arora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Arx</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Bernstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bohg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bosselut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Brunskill</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2108.07258</idno>
		<title level="m">On the opportunities and risks of foundation models</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhong</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2401.02038</idno>
		<title level="m">Understanding llms: A comprehensive overview from training to inference</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q.-L</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tang</surname></persName>
		</author>
		<title level="m">A brief overview of ChatGPT: The history, status quo and potential future development</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">KnowGL: Knowledge generation and linking from text</title>
		<author>
			<persName><forename type="first">G</forename><surname>Rossiello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F M</forename><surname>Chowdhury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Mihindukulasooriya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Cornec</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Gliozzo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AAAI</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="16476" to="16478" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Peters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Neumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Logan</surname><genName>IV</genName></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Schwartz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1909.04164</idno>
		<title level="m">Knowledge enhanced contextual word representations</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><surname>Balažević</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Allen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Hospedales</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1901.09590</idno>
		<title level="m">TuckER: Tensor factorization for knowledge graph completion</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Exploring mathematical conjecturing with large language models</title>
		<author>
			<persName><forename type="first">M</forename><surname>Johansson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Smallbone</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of NeSy</title>
				<meeting>NeSy</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lam</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2311.09802</idno>
		<title level="m">Neuro-symbolic integration brings causal and reliable reasoning proofs</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">GENOME: Generative neuro-symbolic visual reasoning by growing and reusing modules</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Hong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gan</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2311.04901</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Albalak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Y</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.12295</idno>
		<title level="m">Logic-LM: Empowering large language models with symbolic solvers for faithful logical reasoning</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
