<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Model transformations using LLMs out-of-the-box: can accidental complexity be reduced?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriel Kazai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ronnie Agyeiwaa Osei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessio Bucaioni</string-name>
          <email>alessio.bucaioni@mdu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Cicchetti</string-name>
          <email>antonio.cicchetti@mdu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Joint Proceedings of the STAF 2025 Workshops: OCL, OOPSLE, LLM4SE, ICMM, AgileMDE, AI4DPS, and TTC</institution>
          ,
          <addr-line>Koblenz</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Mälardalen University</institution>
          ,
          <addr-line>Box 883, 721 23 Västerås</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>1</volume>
      <fpage>0</fpage>
      <lpage>13</lpage>
      <kwd-group>
        <kwd>Model-driven engineering</kwd>
        <kwd>model transformation</kwd>
        <kwd>accidental complexity</kwd>
        <kwd>large language models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Model-Driven Engineering (MDE) advances software engineering by shifting the focus from coding
to modelling [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It relies on two pillars: models, which are well-defined abstractions of reality [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
and model transformations, which automate model manipulation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Models simplify complexity
by emphasizing relevant details [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], while transformations enhance automation by synchronizing
development stages, generating code, and enabling early validation [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. MDE provides powerful
      </p>
      <p>ISSN 1613-0073</p>
      <p>To address this problem, this paper investigates the use of LLMs for performing model transformations
out-of-the-box. Specifically, we conducted a systematic literature review and designed an experiment
to evaluate the precision of ChatGPT-4 in translating UML class diagram models into corresponding
Java programs. Our study is supported by an experimental pipeline that automates data collection,
transformation execution, and result analysis. It is important to note that this work focuses on LLMs’
ability to perform model transformations out-of-the-box, not on code generation. Translating UML
diagrams into Java programs was used solely to establish the ground truth, as described in Section 4.
Our findings show a cumulative success rate of 94% for transformed models out of 99 input cases, with
most generation errors being resolved during the process. However, the experiment also highlights
significant issues when dealing with complex models, for which the cumulative success rate drops to
only 17%. By providing a systematic approach and a publicly available replication package in Section A,
we enable the research community to experiment with ChatGPT-4 in additional transformation tasks.
Furthermore, our experimental pipeline can be refined to support alternative LLMs, modelling languages,
or datasets.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Research Process</title>
      <sec id="sec-2-1">
        <title>2.1. Systematic literature review</title>
        <p>
          We conducted an automated search of peer-reviewed literature across four major scientific databases:
IEEE Xplore, ACM Digital Library, SCOPUS, and Web of Science. To ensure both rigour and inclusiveness,
we used the search string: ("llm" OR "large language model*") AND ("mde" OR "model-driven engineering"),
querying all fields without applying filters. This search yielded 35 potential studies (our review does
not include any work published after May 2024). After removing non-research articles and duplicates,
we identified 10 primary studies (Table 1). These were analysed
using the guidelines of Cruzes et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. While we omit details for brevity, all search and selection data,
along with the full list of primary studies, are available in our public replication package in Section A.
A discussion of the selected studies is presented in Section 3.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Experiment</title>
        <p>
          Given the absence of prior research on using LLMs to mitigate accidental complexity in model
transformation processes, we designed and conducted an experiment to explore this hypothesis. Inspired
by recent studies on LLMs for code generation [
          <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
          ], we developed a semi-automated experimental
pipeline to streamline data collection, execution, and analysis.
        </p>
        <p>Dataset. The dataset needed to be independent, publicly available, and contain a large number of
UML class diagram models. To satisfy these criteria, we selected the Lindholmen dataset (http://models-db.com), curated by</p>
        <sec id="sec-2-2-1">
          <title>Experimental setup</title>
          <p>the University of Rostock. This dataset links to GitHub repositories that use UML, offering access to
over 93,000 potential UML-related artefacts (these files may include various UML diagrams, not just
class diagrams, and some may contain artefacts such as images rather than serialised UML class
diagrams).</p>
          <p>Ground truth. To assess ChatGPT-4’s performance in translating UML class diagrams into Java
programs, we required a reliable ground truth dataset. For this, we leveraged the code generation
capabilities of Modelio, a widely recognized modelling tool
(https://www.modeliosoft.com/en/products/modelio-sd-java.html). Our choice was driven by two key
requirements: handling XMI files and automatically generating Java models. We evaluated
alternative tools such as Astah, Papyrus, ArgoUML, and Visual Paradigm, but each presented
limitations: Astah’s free version lacked XMI import support, Papyrus required file renaming and
additional steps for Java generation, ArgoUML failed to import XMI files, and Visual Paradigm lacked
Java model generation capabilities. Using Modelio ensured that the Java programs serving as
benchmarks were accurate translations of the UML class diagrams, mitigating potential threats to
construct and internal validity.</p>
          <p>Automation. The experimental pipeline was designed to minimize manual intervention. To achieve
this, we integrated various tools and technologies to automate key steps.</p>
          <p>Python scripts handled multiple automation tasks, Modelio was used for generating ground truth
models, AutoHotKey (https://www.autohotkey.com) automated repetitive keyboard and mouse
interactions, and Beyond Compare (https://www.scootersoftware.com) facilitated automated file
comparisons. This automation streamlined the workflow, ensuring consistency and efficiency in
executing the experiment.</p>
          <p>Figure 2: Simplified overview of the experimental pipeline.</p>
          <p>
            Reproducibility and Verifiability. To ensure that our experimental pipeline and results can be
independently replicated and verified, we provide a public replication package containing all artefacts
used in this work in Section A. Figure 2 presents a simplified overview of our experimental pipeline.
The process starts with a dataset of UML class diagram models, which are input into ChatGPT-4 for
translation into Java programs. The generated Java programs are then compared against ground truth
Java models produced by Modelio. Identified discrepancies are used to iteratively refine the
ChatGPT-4 output, with a maximum of two re-prompting cycles, as previous studies suggest that additional
iterations beyond the third yield minimal improvements [
            <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
            ]. Section 4 provides a detailed discussion
on the pipeline’s definition and execution.
          </p>
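          <p>The bounded re-prompting cycle described above can be sketched as follows. This is an illustrative sketch, not the pipeline's actual code: the generate and find_errors callables are hypothetical stand-ins for the ChatGPT-4 call and the comparison-based error analysis, which are tool-assisted steps in the real pipeline.</p>

```python
# Sketch of the bounded re-prompting loop: one initial generation attempt
# plus at most two refinement cycles (three attempts in total).
# `generate` and `find_errors` are hypothetical stand-ins for the ChatGPT-4
# call and the Beyond Compare-based error analysis.

MAX_ATTEMPTS = 3  # initial generation + two re-prompting cycles

def transform_with_retries(xmi_model, generate, find_errors):
    """Return (java_program, attempts_used, remaining_errors)."""
    prompt = ("Generate the Java program corresponding to this UML class diagram:\n"
              + xmi_model)
    java, errors = "", []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        java = generate(prompt)
        errors = find_errors(java)
        if not errors:
            return java, attempt, []
        # Refine the prompt with the discrepancies found in the comparison,
        # mirroring the refined-prompt wording used in the experiment.
        prompt = ("In the previous response, the following errors were discovered. "
                  + " ".join(errors)
                  + " Regenerate the response fixing the errors above.")
    return java, MAX_ATTEMPTS, errors
```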
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Threats to validity</title>
        <p>
          To mitigate threats to conclusion validity, we meticulously followed well-defined research processes
and provided a public replication package to ensure reproducibility. A key limitation is the statistical
validity of our dataset, which includes 99 models. However, the scarcity of large, high-quality datasets
for software engineering research remains a well-known challenge, and previous studies have often
relied on even smaller datasets [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Potential threats to internal validity stem from the use of Modelio
to establish ground truth. However, this risk is minimal, as Modelio is an industry-standard tool for
MDE and Java. Our selection was driven by two core requirements: XMI file handling and automatic
Java model generation. Alternative tools, such as Astah, Papyrus, ArgoUML, and Visual Paradigm,
had limitations preventing their integration into our pipeline. Additionally, while class diagrams may
not fully represent all transformation scenarios, they are fundamental to modelling, as they define a
system’s structural backbone [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. Construct validity threats relate to model selection and experimental
design. Although we ensured variation in structural complexity, our dataset may not fully reflect
the complexity of real-world transformations. Furthermore, since ChatGPT-4’s training data is not
publicly available, there is a potential risk that it may have been trained on similar UML models,
which could influence results. Our study intentionally adopted a zero-shot approach to evaluate LLMs’
out-of-the-box capabilities, prioritising simplicity to minimise accidental complexity, though more
advanced prompting strategies (e.g., Chain of Thought, Tree of Thought) could improve performance.
External validity may be affected by the modelling and programming languages used. The dataset’s size
also poses a limitation, as curating high-quality models required a labour-intensive process involving
ChatGPT-4, Modelio, and the Lindholmen dataset. Despite these constraints, we aimed to enhance
generalisability by including models with varying structural complexities.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <p>This section reviews the primary studies identified through our SLR. While an exhaustive analysis is
beyond the scope of this paper, Table 1 classifies these studies based on their research focus. Notably,
no prior research has explored the use of LLMs to reduce the accidental complexity of model
transformation processes, establishing our study as a pioneering contribution. Additionally, despite the
increasing number of studies on LLM-based code generation, generating code from natural language is
fundamentally different from performing model transformations out-of-the-box. Consequently, we do
not include such works in this review.</p>
      <sec id="sec-3-1">
        <title>3.1. Large Language Models and Model-Driven Engineering</title>
        <p>
          Cañizares et al. explored chatbot design measurement and classification, introducing a suite of metrics
and clustering methods [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Their tool, Asymob, facilitates the translation of chatbot platforms
into a neutral notation, improving comparability. Oakes et al. examined domain-specific machine
learning workflows and identified six key challenges in workflow development [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Their study
highlighted gaps in tool support and recommended future research to reduce accidental complexity,
aligning with our study’s focus. Rajbhoj et al. investigated systematic prompting strategies for software
development life cycle tasks, validating their approach using ChatGPT [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. Their results show that
generative AI can significantly reduce skill barriers in MDE, supporting our premise that LLMs can
simplify complex transformation tasks. Tamenaoul et al. developed a meta-model for user prompts,
formalizing several prompt engineering patterns [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Similarly, Clarisó et al. proposed Impromptu, a
domain-specific language for platform-independent prompt generation and adaptation [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. Qasse et
al. explored chatbots as an interactive alternative for MDE, developing a framework that generates
platform-independent code from conversational inputs, with a focus on smart contracts [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Their
findings suggest that chatbot-based development can improve accessibility in software engineering.
Petrović et al. investigated ChatGPT for automating smart contract generation, proposing a
model-driven framework for treating smart contract creation as an interactive dialogue [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. Their evaluation
highlighted ChatGPT’s adaptability but also noted challenges such as increased response times and
costs. Arulmohan et al. examined LLMs for extracting domain models from textual documents like
product backlogs [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. They compared GPT-3.5 against a state-of-the-practice tool and a CRF-based
NLP approach, finding that while GPT-3.5 outperformed standard tools, the CRF approach achieved
higher accuracy with minimal training. Chen et al. developed a framework for automated taxonomy
construction, comparing LLM prompting with fine-tuning [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. Their results showed that prompting
often outperforms fine-tuning, particularly for smaller datasets, though fine-tuning allows easier
post-processing. Chen et al. also explored the automation of domain modelling using LLMs [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ].
While GPT-3.5 and GPT-4 demonstrated strong understanding capabilities, they struggled with full
automation, achieving only moderate F1 scores in class, attribute, and relationship generation. This
complements our study, as it focuses on automating one pillar of MDE—modelling—while we focus on
model transformations. In addition to the works identified through our opportunistic SLR, recent studies
have begun exploring the use of LLMs for model transformation tasks, for example, the transformation
of UML state diagrams into Rebeca models using a few-shot learning approach [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Simplifying Model Transformations</title>
        <p>
          Over the years, many studies have aimed to make model transformations more accessible to users
unfamiliar with transformation languages and related technologies. One of the most notable approaches
is Varró’s transformation by example method [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. This method generalizes XML transformation
techniques for model transformations, deriving transformation rules from initial interrelated source
and target models. These rules can be refined iteratively by adding more source-target model pairs,
eliminating the need to learn a dedicated transformation language. Building on Varró’s work, Kappel
et al. surveyed early Model Transformation By-Example approaches [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. Kessentini et al. extended
this method with an optimization-based approach, using search-based algorithms (particle swarm
optimization and simulated annealing) to determine the best transformation fragment combinations [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ].
Among AI-driven solutions, Burgueño et al. proposed a neural network-based model transformation
approach [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. Their encoder-decoder LSTM architecture with an attention mechanism learns
transformation patterns from input-output examples, automatically generating transformation outputs once
trained.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental pipeline</title>
      <p>This section presents the experimental pipeline, aligning with the requirements outlined in Section 2. We
detail its configuration, including the tools, technologies, and artefacts used, all of which are available
in our public replication package in Section A. The pipeline consists of four main phases: cleaning,
ground truth generation, generation, and comparison. Figure 3 provides a comprehensive overview,
with execution flow indicated by black-circled numbers. We initiated our experimental pipeline with the
publicly available Lindholmen dataset, which contained 93,608 files, including 3,722 XMI UML diagrams.
To ensure data quality and manageability, we performed a thorough cleaning process (black-circled 1 in
Figure 3). This involved removing duplicates, inaccessible files, incompatible character sets, and files
exceeding 200 KB—beyond the input limit of ChatGPT-4’s web version. We also excluded corrupted
files that could not be imported into Modelio, as they would prevent ground truth generation. After
this screening, 99 usable XMI files remained, forming our initial dataset of textual UML class diagram
representations. We imported the XMI files into Modelio to generate the corresponding Java programs,
serving as the ground truth (black-circled 2 in Figure 3). During this process, some elements—such as
names or attributes—were occasionally missing due to incomplete XMI data or Modelio’s interpretation.
In such cases, we inserted custom strings to ensure Java grammar compliance. Once exported, Modelio
generated each class as a separate file. To streamline comparisons, we used a Python script to merge
these files into a single Java file per XMI instance, preserving their content.</p>
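      <p>The cleaning criteria above can be sketched as a short filter. This is an illustrative sketch under our own assumptions, not the script from the replication package: duplicates are detected by content hash, the 200 KB cap mirrors the ChatGPT-4 web input limit, and UTF-8 decodability stands in for the incompatible-character-set check.</p>

```python
# Illustrative sketch of the dataset cleaning step: drop oversize files,
# files with incompatible character sets, and duplicates. The corrupted-file
# check (files Modelio cannot import) is a manual step and is omitted here.
import hashlib
from pathlib import Path

MAX_SIZE = 200 * 1024  # 200 KB, ChatGPT-4 web input limit

def clean_dataset(paths):
    kept, seen = [], set()
    for path in paths:
        data = Path(path).read_bytes()
        if len(data) > MAX_SIZE:
            continue  # exceeds the input limit
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            continue  # incompatible character set
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # duplicate content
        seen.add(digest)
        kept.append(path)
    return kept
```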
      <p>Notably, this Java output serves as a model for the eventual implementation code, which would be
further refined or manually written in later stages.</p>
      <sec id="sec-4-2">
        <title>Generation and comparison</title>
        <p>
          To minimize threats to validity, we fed each cleaned XMI file into ChatGPT-4’s web interface
(black-circled 3 in Figure 3), together with the prompt: Generate the Java program corresponding
to the following UML class diagram serialised using XMI. Do not implement get
and set functions. Do not add any comments. To automate this process, we used
AutoHotKey for interacting with ChatGPT-4
via pre-programmed inputs and formatting responses. AutoHotKey also facilitated storing the
generated outputs in our replication package. After generating the Java programs, we compared
the ChatGPT-4-generated Java programs with the ground truth, Java programs created by Modelio
(black-circled 4 in Figure 3). It is important to clarify that we do not compare ChatGPT-4 with Modelio
regarding the transformation process. Instead, we used Modelio solely to generate the ground truth
Java models. Before this comparison, we took additional steps to format the programs and eliminate
impurities that might lead to false positives. For example, we removed IDs that Modelio adds to
the Java programs using a Python script or fixed the formatting using AutoHotKey. Additionally,
we standardised the formatting by employing the Language Support for Java extension developed
by Red Hat in Visual Studio Code, which helped us eliminate extra spaces and correct indentation
errors. For the comparison, we employed Beyond Compare, a tool that facilitates side-by-side
comparison of two files. It reads the files and highlights the differences in a detailed comparison
report. After generating the comparison report with Beyond Compare, we manually analysed it
(black-circled 5 in Figure 3) to filter out irrelevant differences, such as capitalization or ordering
variations, while identifying substantial discrepancies like missing classes, incorrect attributes, and
erroneous cardinality. We logged all significant differences in a separate error log (black-circled
6 in Figure 3) and used this to create a refined prompt for a subsequent ChatGPT-4 generation
attempt (black-circled 7 in Figure 3) similar to this: In the previous response, the following
errors were discovered. The attribute readWriteSingleValuedEnumerationAttribute
is missing an enum instance. The function opParameters has the wrong return type.
Regenerate the response fixing the errors above. This cycle was repeated up to two times,
as previous research indicates that additional attempts rarely produce further improvements [
          <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
          ].
        </p>
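        <p>The normalisation-before-comparison step can be sketched as follows. This is a minimal sketch under our own assumptions: the @objid marker is a hypothetical stand-in for the identifiers Modelio embeds, and a difflib-based diff stands in for Beyond Compare; the actual pipeline used a Python script, AutoHotKey, and Red Hat's Java extension for Visual Studio Code.</p>

```python
# Sketch of the pre-comparison normalisation: strip tool-inserted ID
# comments and standardise whitespace so the diff only surfaces
# substantive discrepancies (missing classes, wrong attributes, etc.).
import difflib
import re

def normalise(java_source):
    """Drop hypothetical `// @objid ...` ID comments and collapse whitespace."""
    lines = []
    for line in java_source.splitlines():
        line = re.sub(r"//\s*@objid\s+\S+", "", line)  # tool-inserted ID comment
        line = re.sub(r"\s+", " ", line).strip()       # standardise whitespace
        if line:
            lines.append(line)
    return "\n".join(lines)

def significant_differences(generated, ground_truth):
    """Return only the added/removed lines of a unified diff of the two programs."""
    a = normalise(generated).splitlines()
    b = normalise(ground_truth).splitlines()
    return [d for d in difflib.unified_diff(a, b, lineterm="")
            if d[:1] in "+-" and d[:3] not in ("+++", "---")]
```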
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>This section presents our study’s findings, starting with the success rate of the ChatGPT-4 transformation
process and its variation with the structural complexity of UML class diagram models. Additionally, we
identify and discuss the most common errors encountered during transformation.</p>
      <sec id="sec-5-1">
        <title>5.1. Success rate</title>
        <p>In our experiment, we implemented the semi-automatic pipeline described in Section 2 and applied it
to 99 UML class diagram models, allowing up to three iterations. Table 2 summarises the outcomes,
reporting the absolute number of successfully and unsuccessfully transformed models, the single success
rate per iteration, and the cumulative success rate (total percentage of models successfully transformed
up to the given iteration). In the first iteration, 67 out of 99 models were successfully transformed (67%
single and cumulative success rate). The second iteration transformed 21 additional models (66% single
success rate), raising the cumulative success rate to 89%. In the third and final iteration, 5 more models
were transformed (45% single success rate), bringing the cumulative success rate to 94%. Table 2 also
notes that 6 models remained untransformed after three iterations.</p>
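        <p>As an arithmetic check, the single and cumulative rates above follow from the per-iteration success counts (67, 21, and 5 of 99 models); small deviations from the reported percentages are due to rounding.</p>

```python
# Compute the single (per-iteration, over still-unsolved models) and
# cumulative (over all models) success rates from the raw counts in Table 2.

def success_rates(total, per_iteration):
    """Return one (single %, cumulative %) pair per iteration."""
    remaining, cumulative, rows = total, 0, []
    for succeeded in per_iteration:
        single = 100 * succeeded / remaining  # rate among still-unsolved models
        cumulative += succeeded
        rows.append((single, 100 * cumulative / total))
        remaining -= succeeded
    return rows

rows = success_rates(99, [67, 21, 5])
```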
        <p>
          We categorized the XMI files by structural complexity to evaluate ChatGPT-4’s performance
across different model types. First, we quantified complexity by counting the various structural
elements—Classes, Primitive Types, Enumerations, Interfaces, and Associations—in each XMI file and
summing them. This metric revealed a wide range of complexity, from a few elements to over 80.
To reduce categorization bias, we
applied the K-means clustering
algorithm [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] with k = 3 based on
the total number of elements. This yielded three clusters: low complexity models with up to 9 elements
(57 models), medium complexity models with 10 to 36 elements (36 models), and high complexity
models with more than 36 elements (6 models).
        </p>
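        <p>The categorisation step can be sketched as follows. This is an illustrative sketch under our own assumptions, not the scripts from the replication package: the xmi:type values below are simplified (real XMI exporters vary), and a deterministic one-dimensional k-means stands in for the clustering step.</p>

```python
# Count the structural elements used as the complexity metric, then cluster
# the totals with a simple 1-D k-means (k = 3). Initial centroids are spread
# across the sorted values to keep the sketch deterministic.
import xml.etree.ElementTree as ET

COUNTED = ("Class", "PrimitiveType", "Enumeration", "Interface", "Association")

def complexity(xmi_text):
    """Sum the counted structural elements in one XMI document."""
    root = ET.fromstring(xmi_text)
    count = 0
    for el in root.iter():
        for attr, value in el.attrib.items():
            if attr.endswith("type") and value.split(":")[-1] in COUNTED:
                count += 1
                break
    return count

def one_d_kmeans(values, k=3, iterations=100):
    """Cluster scalar complexity scores; returns (centroids, clusters)."""
    vals = sorted(values)
    centroids = [vals[i * (len(vals) - 1) // (k - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters
```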
        <p>Figures 4 and 5 show the transformation success rates for each cluster. (Figure 4: Bar chart showing
the total number of models per cluster, the absolute number of models successfully transformed in each
iteration, and the absolute number of failed models.) In the low complexity group, 47 out of 57 models
(82%) were successfully transformed in the first iteration. A further 9 models succeeded in the second
iteration (90% success for that round), boosting the
cumulative rate to 98%, and the final model succeeded in the third iteration, reaching a 100% cumulative
success rate. In the medium complexity group, 19 of 36 models (53%) transformed in the first iteration.
An additional 12 models succeeded in the second iteration (71% for that round), raising the cumulative
rate to 87%, and 4 more models succeeded in the third iteration (80% for that round), with a final
cumulative success rate of 97%. One model, however, failed to transform after three iterations. For the
high complexity group, only 1 of 6 models (17%) transformed in the first iteration. Due to this low rate,
we conducted further analysis, which indicated that a high number of associations—specifically, more
than 40—was correlated with transformation failures, while the number of classes showed no direct
correlation.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Errors</title>
        <p>During the three executions, we collected all significant differences that emerged from comparing
the ChatGPT-4-generated Java programs with the ground truth Java programs created by Modelio, as
described in Section 4. We then categorised these differences into error types, which are summarised in
Table 3, along with the count of individual occurrences.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>
        Choosing UML and Java notations may be seen as restrictive, particularly since our case focuses
on model-to-text transformations [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. While Czarnecki and Helsen treated model-to-model and
model-to-text transformations uniformly, differentiating them mainly by the availability of mature tool
support [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ], focusing on a single transformation type may limit the broader applicability of our findings.
However, UML class diagrams hold a special role in modelling as the most widely used and structurally
significant diagrams [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ], suggesting a degree of generalizability. Additionally, their selection was
instrumental in ensuring a well-sized dataset and ground truth models, reducing potential threats to
validity (see Section 2). The lack of large-scale benchmark datasets for software engineering and MDE
research remains a well-documented challenge [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ].
      </p>
      <p>
        Our experimental pipeline, based on prior studies [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ], establishes ground-truth models using
Modelio, an industry-standard tool, but allows for substitution with other tools if needed. The choice of
a zero-shot learning strategy may be viewed as a limitation. While advanced prompting techniques like
Chain of Thought and Tree of Thought could improve reasoning, we prioritized a zero-shot approach
to assess LLMs’ out-of-the-box capabilities, minimizing additional complexity. Few-shot learning could
further improve success rates, particularly for models with more than 36 elements. Additionally, our
experiment shows that just 12 error types emerged during generation, with four accounting for the
majority. Addressing these through refined prompting could improve both single and cumulative
success rates. Our decision to limit the process to three iterations aligns with prior research [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ].
Our findings suggest that ChatGPT-4 can help reduce accidental complexity in model transformation
by eliminating the need for transformation languages and related tools. However, it has limitations
in handling complex models with many associations. Notably, even for a widely used language like
Java—on which ChatGPT-4 has likely been trained—generating fully correct target models remains
challenging.
      </p>
      <p>
        As a pioneering study on LLM-driven model transformation, some aspects beyond this work warrant
further investigation. Transformation testing remains an open research area, particularly beyond
functional testing [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ]. Our black-box testing approach, based on an oracle, could be extended with
mutation techniques to improve evaluation. More advanced checks, such as semantic correctness,
would require explainability features within LLMs.
      </p>
      <p>Our findings have promising implications for MDE and automated software engineering. They show
the potential to simplify model transformations, making them more accessible to non-experts. Notably,
all generated target models conformed to the target metamodel, and in some cases, ChatGPT-4 correctly
inferred missing information, such as attribute names, improving resilience against domain evolution.
However, LLMs are not yet mature enough for unsupervised transformation tasks, particularly with
structurally complex models. While they significantly reduce accidental complexity, mechanisms are
needed to check for errors and omissions in generated results. Few-shot and multi-shot learning may
improve performance for complex models, but systematic research is required to characterize and
address potential generation issues. Interestingly, adopting a model transformation chain, as in this
study, may simplify such analysis compared to tasks like generating code from natural language, as
intermediate steps provide better traceability of patterns.</p>
      <p>In conclusion, LLMs like ChatGPT-4 present a novel alternative to traditional DSL-based model
transformations. While DSLs ensure precise, deterministic, and reproducible transformations with
mature debugging tools, they require significant expertise and upfront effort. In contrast, LLMs
lower entry barriers, enabling natural language interaction and flexibility, but they lack the precision,
transparency, and scalability of DSLs, particularly for complex models. LLMs excel in adaptability,
handling evolving requirements and inferring missing details, yet their black-box nature complicates
error tracing. DSLs remain preferable for high-precision, repeatable transformations. The choice
between LLMs and DSLs depends on the use case: LLMs enable rapid prototyping and reduce accidental
complexity, while DSLs offer fine-grained control. Future research should explore hybrid approaches,
integrating DSL precision with the accessibility and adaptability of LLMs.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion and Future Work</title>
      <p>Many studies have aimed to reduce the accidental complexity in model transformation processes, yet
no research prior to April 2024 has systematically explored the use of LLMs for performing model
transformations out of the box. This work investigates ChatGPT-4’s potential to address this challenge.
We conducted a systematic literature review and designed an experiment to assess ChatGPT-4’s
effectiveness in automating model transformations. Using a semi-automated pipeline, we applied ChatGPT-4
to 99 UML class diagram models, generating Java programs and comparing them against ground truth
programs from a state-of-the-art modeling tool. Our findings indicate a cumulative success rate of
94% after three iterations, with most errors resolved. However, complex models remained a challenge,
achieving a cumulative success rate of only 17%.</p>
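      <p>The cumulative success rate above can be computed as follows; this minimal sketch assumes each retry iteration yields the set of model identifiers transformed correctly, and the example numbers are illustrative rather than the study's data.</p>

```python
def cumulative_success_rate(successes_per_iteration, total_models):
    """Cumulative success rate after each retry iteration: a model counts as a
    success once any iteration has produced a correct transformation for it."""
    succeeded = set()
    rates = []
    for iteration_ids in successes_per_iteration:
        succeeded |= set(iteration_ids)
        rates.append(len(succeeded) / total_models)
    return rates

# Hypothetical illustration with 10 models over three iterations:
# iteration 1 solves models 1-6, iteration 2 adds 7-8, iteration 3 adds 9.
print(cumulative_success_rate([{1, 2, 3, 4, 5, 6}, {7, 8}, {9}], 10))
# → [0.6, 0.8, 0.9]
```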
      <p>
        Future research should explore LLMs in different transformation scenarios, including model-to-model
and model-to-text transformations. Expanding the dataset with more diverse models and leveraging
few-shot and multi-shot learning strategies could improve success rates, particularly for complex models.
Additionally, developing advanced error handling and correction mechanisms [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ] will be essential to
enhancing accuracy and reliability.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This research work has been funded by Vinnova through the iSecure (202301899) and AIDA (202402068)
projects, and by the KDT Joint Undertaking through the MATISSE project (101140216).</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT-4 to support grammar and
spelling checking. After using these tool(s)/service(s), the author(s) reviewed and edited
the content as needed and take(s) full responsibility for the publication’s content.</p>
      <sec id="sec-9-1">
        <title>Replication Package</title>
        <p>The replication package is available at: ModelTransformationWithLLMs-FD3F/</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          , et al.,
          <article-title>Model-driven engineering</article-title>
          ,
          <source>Computer, IEEE Computer Society</source>
          <volume>39</volume>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bézivin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Gerbé</surname>
          </string-name>
          ,
          <article-title>Towards a precise definition of the omg/mda framework</article-title>
          ,
          <source>in: Proceedings of the 16th Annual International Conference on Automated Software Engineering (ASE 2001)</source>
          , IEEE,
          <year>2001</year>
          , pp.
          <fpage>273</fpage>
          -
          <lpage>280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sendall</surname>
          </string-name>
          , W. Kozaczynski,
          <article-title>Model transformation: The heart and soul of model-driven software development</article-title>
          ,
          <source>IEEE software 20</source>
          (
          <year>2003</year>
          )
          <fpage>42</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bucaioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cicchetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ciccozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mubeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sjödin</surname>
          </string-name>
          ,
          <article-title>A metamodel for the rubus component model: extensions for timing and model transformation from east-adl</article-title>
          ,
          <source>IEEE Access</source>
          <volume>5</volume>
          (
          <year>2016</year>
          )
          <fpage>9005</fpage>
          -
          <lpage>9020</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bucaioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mubeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cicchetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sjödin</surname>
          </string-name>
          ,
          <article-title>Exploring timing model extractions at east-adl design-level using model transformations</article-title>
          ,
          <source>in: 2015 12th international conference on information technology-new generations, IEEE</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>595</fpage>
          -
          <lpage>600</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Liebel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Marko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tichy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Leitner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hansson</surname>
          </string-name>
          ,
          <article-title>Assessing the state-of-practice of model-based engineering in the embedded systems domain</article-title>
          ,
          <source>in: Model-Driven Engineering Languages and Systems: 17th International Conference, MODELS 2014, Valencia, Spain, September 28-October 3, 2014. Proceedings 17</source>
          , Springer,
          <year>2014</year>
          , pp.
          <fpage>166</fpage>
          -
          <lpage>182</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Mohagheghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gilani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stefanescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Nordmoen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fritzsche</surname>
          </string-name>
          ,
          <article-title>Where does model-driven engineering help? experiences from three industrial cases</article-title>
          ,
          <source>Software &amp; Systems Modeling</source>
          <volume>12</volume>
          (
          <year>2013</year>
          )
          <fpage>619</fpage>
          -
          <lpage>639</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bucaioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cicchetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ciccozzi</surname>
          </string-name>
          ,
          <article-title>Modelling in low-code development: a multi-vocal systematic review</article-title>
          ,
          <source>Software and Systems Modeling</source>
          <volume>21</volume>
          (
          <year>2022</year>
          )
          <fpage>1959</fpage>
          -
          <lpage>1981</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Di Ruscio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kolovos</surname>
          </string-name>
          , J. de Lara,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pierantonio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wimmer</surname>
          </string-name>
          ,
          <article-title>Low-code development and model-driven engineering: Two sides of the same coin?</article-title>
          ,
          <source>Software and Systems Modeling</source>
          <volume>21</volume>
          (
          <year>2022</year>
          )
          <fpage>437</fpage>
          -
          <lpage>446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Crnkovic</surname>
          </string-name>
          ,
          <article-title>Constructive research and info-computational knowledge generation</article-title>
          ,
          <source>in: Model-Based Reasoning in Science and Technology</source>
          , Springer,
          <year>2010</year>
          , pp.
          <fpage>359</fpage>
          -
          <lpage>380</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lukka</surname>
          </string-name>
          ,
          <article-title>The constructive research approach</article-title>
          , in: Case Study Research in Logistics.
          <source>Publications of the Turku School of Economics and Business Administration, Series B</source>
          <volume>1</volume>
          (
          <year>2003</year>
          )
          <fpage>83</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kitchenham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Brereton</surname>
          </string-name>
          ,
          <article-title>A systematic review of systematic review process research in software engineering</article-title>
          ,
          <source>Information and Software Technology</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Cruzes</surname>
          </string-name>
          , T. Dyba,
          <article-title>Recommended steps for thematic synthesis in software engineering</article-title>
          ,
          <source>in: Proceedings of ESEM</source>
          , IEEE,
          <year>2011</year>
          , pp.
          <fpage>275</fpage>
          -
          <lpage>284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lertbanjongngam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chinthanet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ishio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. G.</given-names>
            <surname>Kula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Leelaprute</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Manaskasemsak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rungsawang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Matsumoto</surname>
          </string-name>
          ,
          <article-title>An empirical evaluation of competitive programming ai: A case study of alphacode</article-title>
          , arXiv preprint arXiv:2208.08603,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bucaioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ekedahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Helander</surname>
          </string-name>
          , P. T. Nguyen,
          <article-title>Programming with chatgpt: How far can we go?</article-title>
          ,
          <source>Machine Learning with Applications</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>100526</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sahraoui</surname>
          </string-name>
          ,
          <article-title>Towards automatically extracting uml class diagrams from natural language specifications</article-title>
          ,
          <source>in: Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>396</fpage>
          -
          <lpage>403</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>B.</given-names>
            <surname>Combemale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Rumpe</surname>
          </string-name>
          ,
          <article-title>Model-based code generation works: But how far does it go?-on the role of the generator</article-title>
          ,
          <source>Software and Systems Modeling</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Canizares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>López-Morales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pérez-Soler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Guerra</surname>
          </string-name>
          , J. de Lara,
          <article-title>Measuring and clustering heterogeneous chatbot designs</article-title>
          ,
          <source>ACM Transactions on Software Engineering and Methodology</source>
          <volume>33</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>B. J.</given-names>
            <surname>Oakes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Famelis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sahraoui</surname>
          </string-name>
          ,
          <article-title>Building domain-specific machine learning workflows: A conceptual framework for the state of the practice</article-title>
          ,
          <source>ACM Transactions on Software Engineering and Methodology</source>
          <volume>33</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rajbhoj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Somase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kulkarni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kulkarni</surname>
          </string-name>
          ,
          <article-title>Accelerating software development using generative ai: Chatgpt case study</article-title>
          ,
          <source>in: Proceedings of the 17th Innovations in Software Engineering Conference</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>H.</given-names>
            <surname>Tamenaoul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Hamlaoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nassar</surname>
          </string-name>
          ,
          <article-title>Prompt engineering: User prompt meta model for gpt based models</article-title>
          ,
          <source>in: The International Conference on Artificial Intelligence and Smart Environment</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>428</fpage>
          -
          <lpage>433</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>I.</given-names>
            <surname>Qasse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. þór</given-names>
            <surname>Jónsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Khomh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hamdaqa</surname>
          </string-name>
          ,
          <article-title>Chat2code: A chatbot for model specification and code generation, the case of smart contracts</article-title>
          ,
          <source>in: 2023 IEEE International Conference on Software Services Engineering (SSE)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>50</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>N.</given-names>
            <surname>Petrović</surname>
          </string-name>
          , I. Al-Azzoni,
          <article-title>Model-driven smart contract generation leveraging chatgpt</article-title>
          ,
          <source>in: International Conference On Systems Engineering</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>387</fpage>
          -
          <lpage>396</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>R.</given-names>
            <surname>Clarisó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cabot</surname>
          </string-name>
          ,
          <article-title>Model-driven prompt engineering</article-title>
          ,
          <source>in: 2023 ACM/IEEE 26th International Conference on Model Driven Engineering Languages and Systems (MODELS)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>47</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>S.</given-names>
            <surname>Arulmohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-J.</given-names>
            <surname>Meurs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mosser</surname>
          </string-name>
          ,
          <article-title>Extracting domain models from textual requirements in the era of large language models</article-title>
          ,
          <source>in: 2023 ACM/IEEE International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>580</fpage>
          -
          <lpage>587</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Varró</surname>
          </string-name>
          ,
          <article-title>Prompting or fine-tuning? a comparative study of large language models for taxonomy construction</article-title>
          ,
          <source>in: 2023 ACM/IEEE International Conference on Model Driven Engineering Languages and Systems Companion (MODELS-C)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>588</fpage>
          -
          <lpage>596</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A. H.</given-names>
            <surname>López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mussbacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Varró</surname>
          </string-name>
          ,
          <article-title>Automated domain modeling with large language models: A comparative study</article-title>
          ,
          <source>in: 2023 ACM/IEEE 26th International Conference on Model Driven Engineering Languages and Systems (MODELS)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>162</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Moezkarimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Eriksson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Johansson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bucaioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sirjani</surname>
          </string-name>
          ,
          <article-title>Harnessing chatgpt for model transformation in software architecture: From uml state diagrams to rebeca models for formal verification</article-title>
          , in: 4th International Workshop of Model-Driven Engineering for Software Architecture. URL: http://www.es.mdu.se/publications/7130-.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>D.</given-names>
            <surname>Varró</surname>
          </string-name>
          ,
          <article-title>Model transformation by example</article-title>
          ,
          <source>in: Model Driven Engineering Languages and Systems: 9th International Conference, MoDELS 2006, Genova, Italy, October 1-6, 2006. Proceedings 9</source>
          , Springer,
          <year>2006</year>
          , pp.
          <fpage>410</fpage>
          -
          <lpage>424</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kappel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Langer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Retschitzegger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Schwinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wimmer</surname>
          </string-name>
          ,
          <article-title>Model transformation by-example: a survey of the first wave</article-title>
          ,
          <source>in: Conceptual Modelling and Its Theoretical Foundations: Essays Dedicated to Bernhard Thalheim on the Occasion of His 60th Birthday</source>
          (
          <year>2012</year>
          )
          <fpage>197</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kessentini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sahraoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Boukadoum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. B.</given-names>
            <surname>Omar</surname>
          </string-name>
          ,
          <article-title>Search-based model transformation by example</article-title>
          ,
          <source>Software &amp; Systems Modeling</source>
          <volume>11</volume>
          (
          <year>2012</year>
          )
          <fpage>209</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>L.</given-names>
            <surname>Burgueño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gérard</surname>
          </string-name>
          ,
          <article-title>A generic LSTM neural network architecture to infer heterogeneous model transformations</article-title>
          ,
          <source>Software and Systems Modeling</source>
          <volume>21</volume>
          (
          <year>2022</year>
          )
          <fpage>139</fpage>
          -
          <lpage>156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>A.</given-names>
            <surname>Likas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vlassis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <article-title>The global k-means clustering algorithm</article-title>
          ,
          <source>Pattern Recognition</source>
          <volume>36</volume>
          (
          <year>2003</year>
          )
          <fpage>451</fpage>
          -
          <lpage>461</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>K.</given-names>
            <surname>Czarnecki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Helsen</surname>
          </string-name>
          , et al.,
          <article-title>Classification of model transformation approaches</article-title>
          ,
          <source>in: Proceedings of the 2nd OOPSLA Workshop on Generative Techniques in the Context of the Model Driven Architecture</source>
          , volume
          <volume>45</volume>
          , USA,
          <year>2003</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>G.</given-names>
            <surname>Reggio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leotta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ricca</surname>
          </string-name>
          ,
          <article-title>Who knows/uses what of the UML: A personal opinion survey</article-title>
          ,
          <source>in: Model-Driven Engineering Languages and Systems: 17th International Conference, MODELS</source>
          <year>2014</year>
          , Valencia, Spain,
          <source>September 28-October 3</source>
          ,
          <year>2014</year>
          . Proceedings 17, Springer,
          <year>2014</year>
          , pp.
          <fpage>149</fpage>
          -
          <lpage>165</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bozyigit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bardakci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khalilipour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Challenger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ramackers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ö.</given-names>
            <surname>Babur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Chaudron</surname>
          </string-name>
          ,
          <article-title>Generating domain models from natural language text using NLP: a benchmark dataset and experimental comparison of tools</article-title>
          ,
          <source>Software and Systems Modeling</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>J.</given-names>
            <surname>Troya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Segura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Burgueño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wimmer</surname>
          </string-name>
          ,
          <article-title>Model transformation testing and debugging: A survey</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>39</lpage>
          . URL: http://dx.doi.org/10.1145/3523056. doi:10.1145/3523056.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bucaioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gualandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Toma</surname>
          </string-name>
          ,
          <article-title>Benchmarking large language models for autonomous run-time error repair: Toward self-healing software systems</article-title>
          ,
          <source>in: International Conference on Evaluation and Assessment in Software Engineering</source>
          ,
          <year>2025</year>
          . URL: http://www.es.mdu.se/publications/7185-.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>