<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Modular Design Patterns for Generative Neuro-Symbolic Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maaike H. T. de Boer</string-name>
          <email>maaike.deboer@tno.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Quirine S. Smit</string-name>
          <email>quirine.smit@tno.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael van Bekkum</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>André Meyer-Vitali</string-name>
          <email>andre.meyer-vitali@dfki.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Schmid</string-name>
          <email>thomas.schmid@medizin.uni-halle.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI)</institution>
          ,
          <addr-line>Saarbrücken</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>GeNeSy'24: First International Workshop on Generative Neuro-Symbolic AI</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Lancaster University in Leipzig</institution>
          ,
          <addr-line>Leipzig</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Leipzig University</institution>
          ,
          <addr-line>Leipzig</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Martin Luther University Halle-Wittenberg</institution>
          ,
          <addr-line>Halle (Saale)</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>TNO, dep. Data Science</institution>
          ,
          <addr-line>The Hague</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Developing systems that are able to generate novel outputs is one of the dominating trends in current Artificial Intelligence (AI) research. Both the capabilities and the availability of such generative systems, in particular of so-called Large Language Models (LLMs), have been exploding in recent years. While Neuro-Symbolic generative models offer advantages over purely statistical generative models, it is currently difficult to compare the different ways in which the training, fine-tuning and usage of the growing variety of such approaches is carried out. In this work, we use the modular design patterns and Boxology language of van Bekkum et al. for this purpose and extend them to enable the representation of generative models, specifically LLMs. These patterns provide a general language to describe, compare and understand the different architectures and methods used. Our main aim is to support a better understanding of generative models as well as the engineering of LLM-based systems. In order to demonstrate the usefulness of this approach, we explore generative Neuro-Symbolic architectures and approaches as use cases for these generative design patterns.</p>
      </abstract>
      <kwd-group>
        <kwd>design patterns</kwd>
        <kwd>neuro-symbolic AI</kwd>
        <kwd>generative models</kwd>
        <kwd>Large Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Recently, Artificial Intelligence (AI) has taken a leap in the form of generative models.
Prominently, multimodal statistical models such as DALL-E [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Stable Diffusion [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] have changed
the world of image generation, and with the release of OpenAI’s ChatGPT system, the world
of text generation has changed forever. Targeting text generation tasks in particular, both the
development and the number of Large Language Models (LLMs) have increased enormously.
Currently, many different generative models are appearing, both open-source and proprietary
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Moreover, due to open challenges of LLMs, such as hallucination [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], explainability [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and
trustworthiness, novel Neuro-Symbolic generative approaches have emerged [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
∗Corresponding author; both authors contributed equally.
      </p>
      <p>
        Not only several LLMs, but also a large number of so-called foundation models dealing with
various input and output modalities have entered the scene in recent years. Due to the quantity
and diversity of emerging generative techniques, it becomes more and more challenging to keep
track of the ever-growing variety of models with different architectures and capabilities. One
way to tackle this issue is to create a high-level conceptual framework to discuss,
compare, configure and combine different models: a Boxology. The Boxology originated in
the field of Neuro-Symbolic systems, introduced by Van Harmelen and Ten Teije [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] in 2019. This work was
extended in 2021 by van Bekkum et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], who provide a taxonomically organised vocabulary
to describe both processes and data structures used in hybrid systems.
      </p>
      <p>Here, we propose to use and extend the Boxology to gain insight into a variety of generative
models, specifically LLMs. To this end, we test the validity and usefulness of the Boxology in
this field on example architectures and applications, such as ChatGPT, KnowGL, GENOME and
Logic-LM. Our modular approach supports new architectures and engineering approaches for
systems based on generative AI models. Our pattern extensions promote transparency and
trustworthiness in system design by providing interpretable, high-level component descriptions
of generative AI models.</p>
      <p>The rest of the paper is organized as follows. In the next section, we give a more detailed
overview of the Boxology. In the third section, we propose to extend the Boxology by three
novel patterns in order to be able to handle generative models. In section 4, we dive into specific
applications and tasks in which generative models, specifically in Neuro-Symbolic systems, are
used. We conclude with summarizing our key findings and outlining future work.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work on the Boxology</title>
      <p>
        We base our work on the paper by van Bekkum et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], in which the authors provide
a taxonomically organised vocabulary to describe both processes and data structures used in
hybrid systems. The highest level of this taxonomy contains instances, models, processes and
actors, which may be described as follows.
      </p>
      <p>Instances: The two main classes of instances are data and symbols. Symbols are defined
as having a designation to an object, class or relation in the world; they can be
either atomic or complex, and when a new symbol is created from another symbol and
a system of operations, it should also have a designation. Examples of symbols are labels
(short descriptions), relations (connections between data items, such as triples) and traces
(records of data and events). Data is defined as anything not symbolic; examples are numbers,
texts, tensors or streams.</p>
      <p>Models: Models are descriptions of entities and their relationships, which can be statistical
or semantic. Statistical models represent dependencies between statistical variables,
such as LLMs or Bayesian Networks. Semantic models specify concepts, attributes and
relationships to represent the implicit meaning of symbols, such as ontologies, taxonomies,
knowledge graphs or rule bases.</p>
      <p>Processes: Processes are operations on instances and models. Three types of processes are defined:
generation, transformation and inference. Generation can be done using, for example, the
training of a model or by knowledge engineering. Transformation is the transformation
of data, for example from knowledge graph to vector space. Inference can be inductive or
deductive, in which induction generalises instances and deduction reaches conclusions
on specific instances, such as with classification.</p>
      <p>Actors: Actors can be humans, (software) agents or robots (physically embedded agents).</p>
      <p>
        Meyer-Vitali et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] extended the original paper with a definition of teams of actors in
the Boxology.
      </p>
      <p>
        Besides the vocabulary, the visual language is defined in van Bekkum et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], as an extension
of Van Harmelen and Ten Teije [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The visual language consists of rectangular boxes (instances),
hexagonal boxes (models), ovals (processes) and triangles (actors), with unspecified arrows
between them. Within each box, the concept is denoted by its levels in the vocabulary,
colon-separated from most generic to most specific; for example, a neural network
is written as model:stat:NN.
      </p>
      <p>
        van Bekkum et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] present elementary patterns, which can then be combined into more
complex patterns. Patterns 1a and 2a from Figure 1, for example, can be combined into a pattern
which is named 3a in the paper (depicted in Figure 2). Whereas 1a describes the pattern of
training a model based on data (data generates a model), 2a describes the usage of the model in
deducing a symbol (data and model deduce a symbol), such as a prediction. The combination
in 3a describes a basic structure for a (statistical) Machine Learning (ML) model depicting the
training (creating the model) and testing or application phase (applying the model on new data).
      </p>
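As an illustrative sketch of this compositional idea (the Python classes and names below are our own, not part of the Boxology specification), elementary patterns can be encoded as labelled boxes connected by arrows, with pattern 3a arising as the union of 1a and 2a:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Box:
    # Colon-separated Boxology label, most generic to most specific,
    # e.g. "model:stat:NN" for a neural network.
    label: str

    @property
    def kind(self):  # top level of the taxonomy
        return self.label.split(":")[0]

@dataclass
class Pattern:
    name: str
    boxes: set = field(default_factory=set)
    arrows: set = field(default_factory=set)  # (source_label, target_label)

    def compose(self, other, name):
        # Combining two elementary patterns (e.g. 1a and 2a) merges their
        # boxes and arrows; shared boxes (same label) are unified.
        return Pattern(name, self.boxes | other.boxes, self.arrows | other.arrows)

# Pattern 1a: data generates a model (training)
p1a = Pattern("1a",
              {Box("instance:data"), Box("process:generate:train"), Box("model:stat:NN")},
              {("instance:data", "process:generate:train"),
               ("process:generate:train", "model:stat:NN")})

# Pattern 2a: data and model deduce a symbol (e.g. a prediction)
p2a = Pattern("2a",
              {Box("instance:data"), Box("model:stat:NN"),
               Box("process:infer:deduce"), Box("instance:symbol")},
              {("instance:data", "process:infer:deduce"),
               ("model:stat:NN", "process:infer:deduce"),
               ("process:infer:deduce", "instance:symbol")})

# Pattern 3a: train and apply phases share the data and model boxes.
p3a = p1a.compose(p2a, "3a")
```

In this encoding the shared boxes (the data instance and the statistical model) are unified when the patterns are composed, mirroring how 3a reuses the model produced in the training phase during the application phase.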
      <p>
        In recent years, the Boxology has been used and extended in different ways. Three of the
most influential papers are the formalisation of the notions from the Boxology and their
implementation in the heterogeneous tool set (Hets) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], the extension of the Boxology for (teams of)
actors [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and the systematic study of nearly 500 papers published in the past decade in the
area of Semantic Web Machine Learning [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Design Patterns for Generative Models</title>
      <p>
        While Generative AI originates in the realm of data-driven AI, it has demonstrated capabilities
that exceed classical machine learning tasks like classification and regression by far. In particular,
such generative systems specialise in the generation of content, such as images [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], videos
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], or text [
        <xref ref-type="bibr" rid="ref14 ref15 ref16">14, 15, 16</xref>
        ]. In the original, purely statistical setting, these capabilities are acquired
during a so-called (pre-)training phase [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] in which a representation of a large body of data is
learned; in a second phase (the application phase), this representation is used to map input to
output that has not been explicitly specified but follows the characteristics of that body of data.
      </p>
      <p>
        However, specific arrangements for both (pre-)training and representation usage in
downstream tasks vary for different approaches and systems [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. In order to allow for a coherent
description of the generative paradigm, we propose to extend the elementary patterns of van
Bekkum et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] that describe the generic pattern for instances, models, processes and actors
(Figure 1, 1a-1d and 2a-2d). Please note that while patterns 1e and 1f are required for certain
aspects of the generative paradigm, their usage is not limited to it. Data generation and
labelling by humans may also be employed with any statistical approach.
      </p>
      <p>In particular, when describing classical machine learning systems, mostly pattern 2a is used,
where the output is a symbol, such as a classification or a label. However, the key concept in
generative models is that the output is not a symbol, but data; this can be an image, video or
text, depending on the model. Additionally, actors play an important role in Generative AI, by
creating prompts or label data. To this end, we here propose three new elementary patterns:
pattern 1e, in which an actor can generate data, pattern 1f, in which an actor labels data, and 2e,
in which a model can deduce data from data. In the remainder of this section we mainly focus
on Large Language Models (LLMs). Please note, however, that the patterns proposed in this
section are transferable to other data types, for example to vision transformers, which follow a
similar architectural paradigm but operate on image data.</p>
      <sec id="sec-4-1">
        <title>3.1. Transformer Models</title>
        <p>
          The key technology behind basically all current LLMs is the so-called transformer architecture.
The original transformer paper by Vaswani et al. [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] proposed to use two interacting models,
an encoder and a decoder. In the transformer family, some models, however, only use the
encoder or the decoder part [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Figure 3A shows the architecture of a transformer model as a
design pattern. The two parts of a transformer, the encoder and the decoder, are
usually trained end-to-end (such as flan-T5 [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]), but can also be used separately as
encoder-only (Figure 3B) or decoder-only (Figure 3C) models. In the following sections, we focus on an
encoder-only and a decoder-only family. Other sections focus on instructions and prompting of
different models and the interaction with actors.
        </p>
        <sec id="sec-4-1-1">
          <title>3.1.1. Encoder only: BERT (base)</title>
          <p>
            Some systems are encoder-only. These systems are specialised in contextual encoding and are often
called base models. They can ‘understand’ and encode input sentences. An encoder model is
trained using data, pattern 1a. It is often connected to other systems, such as a classification
system, pattern 3a (see Figure 3B), to be useful for tasks other than encoding input sentences.
An example of this is BERT [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ]. Encoders are transformer models, but not generative models.
          </p>
        </sec>
        <sec id="sec-4-1-2">
          <title>3.1.2. Decoder only: GPT</title>
          <p>
            Other transformer-based systems have decoder-only architectures. This approach is
complementary to the encoder-only paradigm, but structurally different [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]: an encoder processes the
input data (in this case text) and transforms it into a different, machine-interpretable
representation, often a vector representation. A decoder-only system, on the other hand, decodes the
input data directly into the desired representation (text or images), without first transforming
it into a higher, more abstract representation. Examples of this are generative models from the
GPT family [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ].
          </p>
          <p>In the Boxology, both encoders and decoders have a similar representation. For generative
models from the GPT family, we suggest pattern 3c (see Figure 2), which is a combination of 1a
and 2e, as presented in Figure 1: data is used to train a decoder model, which, unlike other
transformers, does not also take an encoder as input. This decoder model can be used to
deduce output data from input data directly.</p>
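As a toy sketch of what deducing output data from input data means for a decoder-only model (our own illustration, not an actual GPT implementation), the decoder repeatedly predicts the next token until an end-of-sequence token appears:

```python
def greedy_decode(model, tokens, max_new=10, eos=0):
    # Pattern 2e sketch: the decoder maps input data (a token sequence)
    # directly to output data by repeatedly predicting the next token.
    # `model` is assumed to return one score per vocabulary token.
    for _ in range(max_new):
        scores = model(tokens)
        next_token = max(range(len(scores)), key=scores.__getitem__)
        tokens = tokens + [next_token]
        if next_token == eos:
            break
    return tokens

# Toy 'model' over a 4-token vocabulary: always favours (last token + 1) mod 4.
toy = lambda toks: [1 if i == (toks[-1] + 1) % 4 else 0 for i in range(4)]
out = greedy_decode(toy, [1], max_new=5)
# out == [1, 2, 3, 0]: the model continues the sequence until it emits eos (0).
```

The point of the sketch is that no encoder stage or intermediate symbolic representation is involved: data goes in, data comes out, which is exactly what pattern 2e captures.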
          <p>
            Decoder-only architectures may be further divided into causal decoder architectures and
prefix decoder architectures. Causal decoder architectures, such as GPT [
            <xref ref-type="bibr" rid="ref14 ref22">22, 14</xref>
            ] and BLOOMZ
[
            <xref ref-type="bibr" rid="ref23">23</xref>
            ], use only unidirectional attention to the input sequence by using a specific mask. Prefix
decoder architectures, such as PaLM [
            <xref ref-type="bibr" rid="ref24">24</xref>
            ], use bidirectional attention for tokens in the prefix
while maintaining unidirectional attention for generating subsequent tokens. Both architectures
follow the elementary pattern 2e.
          </p>
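The difference between the two mask types can be illustrated with a small NumPy sketch (a simplified illustration of attention masking, not code from any of the cited systems): a causal mask lets each token attend only to itself and earlier tokens, while a prefix mask additionally allows full bidirectional attention within the prefix.

```python
import numpy as np

def causal_mask(seq_len):
    # 1 = attention allowed; token i may attend to tokens j <= i.
    return np.tril(np.ones((seq_len, seq_len), dtype=int))

def prefix_mask(seq_len, prefix_len):
    # Bidirectional attention inside the prefix, causal afterwards.
    mask = causal_mask(seq_len)
    mask[:prefix_len, :prefix_len] = 1
    return mask

m = prefix_mask(5, 3)
# Tokens 0-2 (the prefix) see each other in both directions;
# tokens 3-4 remain strictly causal.
```

In both cases the generation step is still pattern 2e; the mask only changes which parts of the input sequence each position may condition on.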
        </sec>
        <sec id="sec-4-1-3">
          <title>3.1.3. Prompts and Instructions</title>
          <p>
            One of the main differences between current LLMs and earlier BERT or other transformer
models is that the model is fine-tuned on instructions [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]. Multi-task fine-tuning, or instruction
tuning, is currently often done using a collection of datasets phrased as instructions, to improve
model performance and generalisation to unseen tasks [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ]. The original model is often referred
to as a foundation model [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ], whereas the fine-tuned model is an adjusted model. In the Boxology,
we represent this adjusted model as a separate model, as we did with the encoder and decoder
models in Figure 3, but now stacking two decoder models. This instruction tuning also follows
pattern 1a, but the data is different as it also contains instructions.
          </p>
          <p>Next to instruction tuning, LLMs can also be adapted by in-context learning. Here, examples
are used as part of the prompt to give context for the answers to the instructions. In this case
the model weights are not changed. This optimises the performance of models on different
tasks [26], but does not need as much training data as training a model from scratch. These
prompts can include a few (training) examples of the input and output (few-shot) or no examples
(zero-shot). These few-shot examples do not train the foundation or instruction model, and
therefore we model them as input data that is used to deduce data (text), which is pattern 2e.
Assistants or GPTs could, however, be seen as a new model, especially if they perform other
tasks, such as Retrieval Augmented Generation (RAG).</p>
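Because few-shot examples enter the system only through the prompt, they are input data flowing into pattern 2e rather than training data. A minimal sketch of such prompt assembly (the format and wording are our own and purely illustrative):

```python
def build_few_shot_prompt(instruction, examples, query):
    # Few-shot examples are prepended as plain text; the model's weights
    # are untouched, so in Boxology terms everything here is input data
    # for pattern 2e (data and model deduce data).
    parts = [instruction]
    for inp, out in examples:
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("Great movie!", "positive"), ("Terrible plot.", "negative")],
    "I loved the soundtrack.")
```

Passing an empty example list yields the zero-shot variant of the same pattern; either way, only the prompt changes, never the model.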
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Actor Interaction</title>
        <p>
          Actors play a large role in the current generative models. In the original paper by van Bekkum
et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], patterns using actors are underspecified. On the one hand, actors often create data,
not only in the interaction with an agent that uses generative models, but also in common
Machine Learning approaches. Many of the created textual datasets are written, pre-processed
and labelled by actors. A first proposed pattern is pattern 1e, in which an actor creates data.
The second proposed pattern is pattern 1f, in which an actor generates a label, or annotates
data. Both patterns are depicted in Figure 1.
        </p>
        <p>Generative models are often not used only once. With the current chat functions, actors
interact with the model multiple times. The main difference with other Machine Learning
models, in which data is also inputted and symbols outputted, is that there the input data is
usually not dependent on the output for the previous data point. With conversational
generative models, however, prompts can relate to the previous response. Currently, recurrent or
iterative behaviour is not yet part of the pattern concepts.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Design Patterns for Generative Neuro-Symbolic AI</title>
      <p>In this section, we describe and explore several papers that use generative models in a
Neuro-Symbolic system. The papers are selected because they represent a diverse set of possibilities
for using a generative model: at the start of the system, in the middle and at the end, but also
acting as a fluent language interface or a formal language interface. We also include ChatGPT,
which is the most prominent generative AI system and, although mainly data-driven, includes a
symbolic component in the reward-modelling part of the training phase.</p>
      <sec id="sec-5-1">
        <title>4.1. (Training of) ChatGPT</title>
        <p>
          ChatGPT is an application of the foundational model GPT3 [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], and later GPT4 [27]. It is
further trained to serve as an assistant. The architecture of the training phases
is represented in Figure 4. The foundation model GPT3 is used as a basis for further training
(1a). Instructions and answers are used to train what will become ChatGPT. Then, based on
new prompts, the model generates a response (3c).
        </p>
        <p>To further train ChatGPT to give the desired responses, a reward model is added. The reward
model is a separate model which can judge whether a response is a good one, given the instructions.
The reward model is trained on human annotations of multiple answers to instructions. To train
the reward model, the model trained on instructions is asked to output multiple answers. These
answers are then ranked by annotators to generate a training set for the reward model (1f). The
reward model is trained to compare answers of ChatGPT and return their score (3a). This is
then used in a loop with ChatGPT to improve the instruction answering process. As can be
seen, we have adapted the Boxology patterns to accept multiple inputs.</p>
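The annotation step (1f) can be sketched as follows: the annotators' ranking of candidate answers is turned into pairwise preference examples on which the reward model is trained. This is a schematic illustration; the actual data format used for ChatGPT is not public.

```python
from itertools import combinations

def ranking_to_pairs(prompt, ranked_answers):
    # ranked_answers is ordered best-first by the human annotator
    # (pattern 1f: an actor labels data). Each ordered pair
    # (better, worse) becomes one training example for the reward model.
    return [(prompt, better, worse)
            for better, worse in combinations(ranked_answers, 2)]

pairs = ranking_to_pairs("Explain gravity.",
                         ["clear answer", "okay answer", "poor answer"])
# 3 ranked answers yield 3 preference pairs.
```

A single ranking of n answers thus yields n(n-1)/2 pairwise comparisons, which is why ranking is a data-efficient form of annotation for reward modelling.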
        <p>When applying ChatGPT in a pipeline, it suffices to show only pattern 3c, the block containing
ChatGPT, and 1e to show the user writing the prompt.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. KnowGL</title>
        <p>Figure 5 shows KnowGL Parser [28], a NeSy system combining a generative module and symbolic
methods. The KnowGL Parser can be used to automatically extract knowledge graphs from
collections of documents. It is based on BART-large, which has an encoder-decoder architecture.
The encoder receives a sentence (1a) and the decoder generates a list of ‘subject, relation, object’
(3c). These are then parsed (transformed) in preparation for the next step, fact ranking (1d). Here
a ranked list is created of distinct facts and their scores (2b). In the final step the generated
facts are linked to Wikidata. This is done using a mapping of labels to Wikidata IDs (2b). In the
case that the generative model has created a new entity, type or relation label that is not in
Wikidata, it returns ‘null’.</p>
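The parsing and linking steps can be sketched as follows (a schematic illustration of the behaviour described above, not the real KnowGL implementation; in particular, the triple separator format is our assumption): generated 'subject, relation, object' strings are parsed into triples, and each label is mapped to a Wikidata ID, with 'null' returned for labels that have no Wikidata entry.

```python
def parse_triples(decoder_output):
    # Assumption for this sketch: the decoder emits facts as
    # "(subject|relation|object)" strings.
    triples = []
    for chunk in decoder_output.split(")"):
        chunk = chunk.strip(" (,")
        if chunk:
            subj, rel, obj = (part.strip() for part in chunk.split("|"))
            triples.append((subj, rel, obj))
    return triples

def link_to_wikidata(triples, label_to_id):
    # Labels unknown to Wikidata (e.g. invented by the generative
    # model) map to 'null', as described for the KnowGL Parser.
    return [tuple(label_to_id.get(label, "null") for label in triple)
            for triple in triples]

# Tiny hypothetical label-to-ID mapping for illustration.
wikidata = {"Leipzig": "Q2079", "Germany": "Q183", "country": "P17"}
triples = parse_triples("(Leipzig|country|Germany)")
linked = link_to_wikidata(triples, wikidata)
```

The parse step corresponds to the transformation (1d) and the linking step to the deduction over the semantic model (2b) in the pattern.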
      </sec>
      <sec id="sec-5-3">
        <title>4.3. KnowBERT</title>
        <p>
          While knowledge is mostly injected to statistical generative models either during the input
or during the output stage, also approaches to inject knowledge inside the model have been
proposed. A prominent example is KnowBERT, a modified variant of the transformer
architecture BERT [29]. Although not a generative model, it stands out for its fusion of contextual and
graph representations, attention-enhanced entity spanned knowledge infusion, and flexibility in
injecting multiple Knowledge Graphs at various model levels. By integrating so-called
Knowledge Attention and Recontextualization (KAR) layers [30], graph entity embeddings are utilized
that are processed through an attention mechanism to enhance entity span embeddings. This
happens in later layers of the model to stabilize training but may potentially also used to inject
knowledge at earlier stages [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The Boxology pattern for KnowBERT is depicted in Figure 6.
        </p>
      </sec>
      <sec id="sec-5-4">
        <title>4.4. Mathematical Conjecturing and LLMs</title>
        <p>The system proposed by Johansson and Smallbone [31] assigns the generative task of discovering
mathematical conjectures to an LLM (3c), while the results can be checked afterwards using
a symbolic theorem prover or counter-example finder (2b). The system is prompted with a
formal theory (e.g. a sort function), and has the LLM generate lemmas from the theory. These
generated lemmas are transformed from data to symbol and can then be used by the semantic
model(s). The pattern is depicted in Figure 7. The approach taken in Yang et al. [32] is also
captured by this pattern. The proposed system uses an LLM component to produce Prolog code
(3c) and a symbolic inference engine to produce answers and reasoning traces by executing the
aforementioned code (1d, 2b).</p>
      </sec>
      <sec id="sec-5-5">
        <title>4.5. GENOME</title>
        <p>Generative Neuro-Symbolic Visual Reasoning by Growing and Reusing Modules (GENOME)
[33] focuses on the task of generative software module learning: an LLM generates
signatures (input/output) and reasoning steps, then an LLM creates the software module
based on those and evaluates the module on test cases.</p>
        <p>The system consists of three stages: module initialization, module generation, and module
execution. The design pattern is depicted in Figure 8. First, an LLM assesses a visual-language
question and outputs new module signatures and operation steps as a response to the query
(3c), if current modules cannot provide an adequate response. In the next step, the LLM creates
a module (software code) based on the signature/test case (3c). Finally, the module is executed
by passing it a visual query (2a).</p>
      </sec>
      <sec id="sec-5-6">
        <title>4.6. Logic-LM</title>
        <p>Logic-LM [34] integrates LLMs with symbolic solvers to improve logical problem-solving. The
pattern is depicted in Figure 9: the system utilizes LLMs to translate a problem stated in natural
language into a symbolic formulation (3c). In the next step, a symbolic reasoner
performs logical inference on the formulated problem (1d, 2b, 1d). Finally, an LLM interprets
the results and outputs natural language (3c). The LLM thus functions as a fluent language
interface (both on input and output) to a symbolic reasoner component.</p>
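The flow described above can be summarised as three composed stages. A minimal sketch with stubbed components follows; the function names and the stub logic are our own illustration, not code from the Logic-LM paper — in the real system, the first and last stages would call an LLM and the middle stage a symbolic solver.

```python
def nl_to_symbolic(question):
    # Stage 1 (pattern 3c): an LLM would translate the natural-language
    # question into a symbolic formulation; stubbed with a fixed example.
    return ("mortal(X) :- human(X).", "human(socrates).", "?- mortal(socrates).")

def symbolic_reasoner(program):
    # Stage 2 (1d, 2b): a symbolic solver performs logical inference;
    # stubbed to return the entailed answer for the example above.
    return {"mortal(socrates)": True}

def symbolic_to_nl(result):
    # Stage 3 (3c): an LLM would verbalise the solver output.
    fact, holds = next(iter(result.items()))
    return f"Yes, {fact} follows." if holds else f"No, {fact} does not follow."

def logic_lm_pipeline(question):
    # The LLM acts as a fluent language interface on both sides of the
    # symbolic reasoner, exactly as in the pattern of Figure 9.
    return symbolic_to_nl(symbolic_reasoner(nl_to_symbolic(question)))

answer = logic_lm_pipeline("Is Socrates mortal?")
```

The composition makes the division of labour explicit: the statistical components handle only translation between natural and formal language, while all logical inference happens in the symbolic component.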
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion and Future Work</title>
      <p>
        Generative AI is currently a major technology with many applications, and combining
data-driven approaches with knowledge-based techniques is a promising development. In
this paper, we propose new design patterns for modular generative Neuro-Symbolic systems to
be included into the design pattern approach for Neuro-Symbolic systems as proposed by van
Bekkum et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. We show how the composition of elementary patterns can be used to describe
generative models, and we explore several specific generative models, such as ChatGPT, as well
as several generative NeSy papers, such as KnowGL, GENOME and Logic-LM.
      </p>
      <p>We acknowledge that this is only the first step in a more elaborate exploration on generative
design patterns and the description of generative Neuro-Symbolic architectures. In future work,
we would like to validate our proposals for extending the Boxology by applying them to more
examples from additional papers. In addition, we expect to further extend and deepen the
Boxology itself. In this paper, it became clear that the temporal or iterative aspect is not yet
visualised well; the naming and formalisation of the Boxology also deserve attention, including the do’s
and don’ts: which pattern combinations are allowed and which are not? The importance of
modelling datasets for generative AI may be taken into account in future specifications of
particular subtypes of Instances and Models in the taxonomy. Additionally, the use of graphical
tools for software development is well-known from the Unified Modelling Language (UML) and
visual programming tools, such as LabView or Scratch. We are mostly concerned with graphical
representations of design patterns for system design and documentation, but the promise of
templates, low-code or no-code development is appealing for the future.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>We would like to thank the TNO project GRAIL for their financial support, as well as Frank van
Harmelen and Annette ten Teije for their feedback. We would also like to thank Daan Di Scala
for his contribution to the KnowGL pattern.</p>
      <p>J. Bohg, A. Bosselut, E. Brunskill, et al., On the opportunities and risks of foundation models, arXiv:2108.07258 (2021).</p>
      <p>[26] Y. Liu, H. He, T. Han, X. Zhang, M. Liu, J. Tian, Y. Zhang, J. Wang, X. Gao, T. Zhong, et al., Understanding LLMs: A comprehensive overview from training to inference, arXiv:2401.02038 (2024).</p>
      <p>[27] T. Wu, S. He, J. Liu, S. Sun, K. Liu, Q.-L. Han, Y. Tang, A brief overview of ChatGPT: The history, status quo and potential future development, 2023.</p>
      <p>[28] G. Rossiello, M. F. M. Chowdhury, N. Mihindukulasooriya, O. Cornec, A. M. Gliozzo, KnowGL: Knowledge generation and linking from text, in: AAAI, 2023, pp. 16476–16478.</p>
      <p>[29] M. E. Peters, M. Neumann, R. L. Logan IV, R. Schwartz, V. Joshi, S. Singh, N. A. Smith, Knowledge enhanced contextual word representations, arXiv:1909.04164 (2019).</p>
      <p>[30] I. Balažević, C. Allen, T. M. Hospedales, TuckER: Tensor factorization for knowledge graph completion, arXiv:1901.09590 (2019).</p>
      <p>[31] M. Johansson, N. Smallbone, Exploring mathematical conjecturing with large language models, Proceedings of NeSy (2023).</p>
      <p>[32] S. Yang, X. Li, L. Cui, L. Bing, W. Lam, Neuro-symbolic integration brings causal and reliable reasoning proofs, 2023. arXiv:2311.09802.</p>
      <p>[33] Z. Chen, R. Sun, W. Liu, Y. Hong, C. Gan, GENOME: Generative neuro-symbolic visual reasoning by growing and reusing modules, 2023. arXiv:2311.04901.</p>
      <p>[34] L. Pan, A. Albalak, X. Wang, W. Y. Wang, Logic-LM: Empowering large language models with symbolic solvers for faithful logical reasoning, arXiv:2305.12295 (2023).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Betker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Goh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brooks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          , et al.,
          <article-title>Improving image generation with better captions</article-title>
          ,
          <source>Computer Science</source>
          <volume>2</volume>
          (
          <year>2023</year>
          )
          <fpage>8</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rombach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Blattmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Esser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ommer</surname>
          </string-name>
          ,
          <article-title>High-resolution image synthesis with latent diffusion models</article-title>
          ,
          <year>2021</year>
          . arXiv:2112.10752.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ravaut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Joty</surname>
          </string-name>
          ,
          <article-title>ChatGPT's one-year anniversary: Are open-source large language models catching up?</article-title>
          ,
          <source>arXiv:2311.16989</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Frieske</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ishii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. J.</given-names>
            <surname>Bang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Madotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <article-title>Survey of hallucination in natural language generation</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>55</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <article-title>Explainability for large language models: A survey</article-title>
          ,
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Colon-Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Havasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Huggins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Breazeal</surname>
          </string-name>
          ,
          <article-title>Combining pre-trained language models and structured knowledge</article-title>
          ,
          <source>arXiv preprint arXiv:2101.12294</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhatia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arnold</surname>
          </string-name>
          ,
          <article-title>Knowledge enhanced pretrained language models: A comprehensive survey</article-title>
          ,
          <source>arXiv:2110.08455</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Van Harmelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>ten Teije</surname>
          </string-name>
          ,
          <article-title>A boxology of design patterns for hybrid learning and reasoning systems</article-title>
          ,
          <source>Journal of Web Engineering</source>
          <volume>18</volume>
          (
          <year>2019</year>
          )
          <fpage>97</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>van Bekkum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Boer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Meyer-Vitali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>ten Teije</surname>
          </string-name>
          ,
          <article-title>Modular design patterns for hybrid learning and reasoning systems: a taxonomy, patterns and use cases</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>51</volume>
          (
          <year>2021</year>
          )
          <fpage>6528</fpage>
          -
          <lpage>6546</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Meyer-Vitali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Mulder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H. T.</given-names>
            <surname>de Boer</surname>
          </string-name>
          ,
          <article-title>Modular design patterns for hybrid actors</article-title>
          ,
          <year>2021</year>
          . arXiv:2109.09331.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mossakowski</surname>
          </string-name>
          ,
          <article-title>Modular design patterns for neural-symbolic integration: refinement and combination</article-title>
          ,
          <source>arXiv:2206.04724</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Breit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Waltersdorfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Ekaputra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sabou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Iana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Portisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Revenko</surname>
          </string-name>
          , A. t. Teije, et al.,
          <article-title>Combining machine learning and semantic web: A systematic mapping study</article-title>
          ,
          <source>ACM Computing Surveys</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          , et al.,
          <article-title>Sora: A review on background, technology, limitations, and opportunities of large vision models</article-title>
          ,
          <source>arXiv:2402.17177</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Lachaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lacroix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rozière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hambro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Azhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , E. Grave, G. Lample,
          <article-title>Llama: Open and efficient foundation language models</article-title>
          ,
          <year>2023</year>
          . arXiv:2302.13971.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pichai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hassabis</surname>
          </string-name>
          ,
          <article-title>Introducing Gemini: our largest and most capable AI model</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Erhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vincent</surname>
          </string-name>
          ,
          <article-title>Why does unsupervised pre-training help deep learning?</article-title>
          ,
          <source>in: Proceedings of the thirteenth international conference on artificial intelligence and statistics, JMLR Workshop and Conference Proceedings</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>201</fpage>
          -
          <lpage>208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sulem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P. B.</given-names>
            <surname>Veyseh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sainz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Heintz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>Recent advances in natural language processing via large pre-trained language models: A survey</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>56</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Longpre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zoph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fedus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dehghani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brahma</surname>
          </string-name>
          , et al.,
          <article-title>Scaling instruction-finetuned language models</article-title>
          ,
          <source>arXiv:2210.11416</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <year>2019</year>
          . arXiv:1810.04805.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <source>OpenAI blog 1</source>
          (
          <year>2019</year>
          )
          <fpage>9</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>N.</given-names>
            <surname>Muennighoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sutawika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Biderman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Bari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-X.</given-names>
            <surname>Yong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schoelkopf</surname>
          </string-name>
          , et al.,
          <article-title>Crosslingual generalization through multitask finetuning</article-title>
          ,
          <source>arXiv:2211.01786</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chowdhery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Narang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gehrmann</surname>
          </string-name>
          , et al.,
          <article-title>Palm: Scaling language modeling with pathways</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>24</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bommasani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Hudson</surname>
          </string-name>
          , E. Adeli,
          <string-name>
            <given-names>R.</given-names>
            <surname>Altman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Arora</surname>
          </string-name>
          , S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, et al.,
          <article-title>On the opportunities and risks of foundation models</article-title>
          ,
          <source>arXiv:2108.07258</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26] Y. Liu, H. He, T. Han, X. Zhang, M. Liu, J. Tian, Y. Zhang, J. Wang, X. Gao, T. Zhong, et al.,
          <article-title>Understanding LLMs: A comprehensive overview from training to inference</article-title>
          ,
          <source>arXiv:2401.02038</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27] T. Wu, S. He, J. Liu, S. Sun, K. Liu, Q.-L. Han, Y. Tang,
          <article-title>A brief overview of ChatGPT: The history, status quo and potential future development</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28] G. Rossiello, M. F. M. Chowdhury, N. Mihindukulasooriya, O. Cornec, A. M. Gliozzo,
          <article-title>KnowGL: Knowledge generation and linking from text</article-title>
          , in:
          <source>AAAI</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>16476</fpage>
          -
          <lpage>16478</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29] M. E. Peters, M. Neumann, R. L. Logan IV, R. Schwartz, V. Joshi, S. Singh, N. A. Smith,
          <article-title>Knowledge enhanced contextual word representations</article-title>
          ,
          <source>arXiv:1909.04164</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30] I. Balažević, C. Allen, T. M. Hospedales,
          <article-title>TuckER: Tensor factorization for knowledge graph completion</article-title>
          ,
          <source>arXiv:1901.09590</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31] M. Johansson, N. Smallbone,
          <article-title>Exploring mathematical conjecturing with large language models</article-title>
          ,
          <source>Proceedings of NeSy</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32] S. Yang, X. Li, L. Cui, L. Bing, W. Lam,
          <article-title>Neuro-symbolic integration brings causal and reliable reasoning proofs</article-title>
          ,
          <year>2023</year>
          . arXiv:2311.09802.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33] Z. Chen, R. Sun, W. Liu, Y. Hong, C. Gan,
          <article-title>GENOME: Generative neuro-symbolic visual reasoning by growing and reusing modules</article-title>
          ,
          <year>2023</year>
          . arXiv:2311.04901.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34] L. Pan, A. Albalak, X. Wang, W. Y. Wang,
          <article-title>Logic-LM: Empowering large language models with symbolic solvers for faithful logical reasoning</article-title>
          ,
          <source>arXiv:2305.12295</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>