<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>DPS, and TTC. Koblenz, Germany,
June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Weixing Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Regina Hebig</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Strüber</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chalmers University of Technology and University of Gothenburg</institution>
          ,
          <addr-line>Hörselgången 5, 417 56 Göteborg</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Radboud University</institution>
          ,
          <addr-line>Toernooiveld 212, 6525 EC Nijmegen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universität Rostock</institution>
          ,
          <addr-line>Albert-Einstein-Straße 22, 18059 Rostock</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>1</volume>
      <fpage>0</fpage>
      <lpage>13</lpage>
      <abstract>
<p>Software languages evolve over time for various reasons, such as the addition of new features. When the language's grammar definition evolves, textual instances that originally conformed to the grammar become outdated. For DSLs in a model-driven engineering context, there exists a plethora of techniques to co-evolve models with the evolving metamodel. However, these techniques are not geared to support DSLs with a textual syntax: applying them to textual language definitions and instances may lead to the loss of information from the original instances, such as comments and layout information, which are valuable for software comprehension and maintenance. This study explores the potential of Large Language Model (LLM)-based solutions for achieving grammar and instance co-evolution, with attention to their ability to preserve auxiliary information when directly processing textual instances. By applying two advanced language models, Claude-3.5 and GPT-4o, and conducting experiments across seven case languages, we evaluated the feasibility and limitations of this approach. Our results indicate that the considered LLMs are well able to migrate textual instances in small-scale cases with limited instance size, which are representative of a subset of cases encountered in practice. In addition, we observe significant challenges with the scalability of LLM-based solutions to larger instances, leading to insights that are useful for informing future research.</p>
      </abstract>
      <kwd-group>
<kwd>Co-Evolution</kwd>
        <kwd>textual DSLs</kwd>
        <kwd>LLM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Domain-specific languages (DSLs) are useful tools to describe and solve problems in a specific application
domain. As domain knowledge evolves and requirements change, DSLs often need to evolve accordingly
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For example, features may be added and existing functionality may be adjusted, leading to a need
to update the definition of the DSL, to introduce new language constructs and modify existing ones.
When the definition of a DSL evolves, existing instances face challenges: they may contain constructs
that no longer conform to the new definition and require appropriate modification, or they may need
additions to support newly introduced required language elements [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ]. While the model-driven
engineering community has developed numerous approaches for metamodel-instance co-evolution [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
these works are generally focused on metamodel-based language definitions, usually in the context of
graphical DSLs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In practice, there is an ongoing trend towards textual DSLs, which can emulate the
look and feel of familiar general-purpose languages and are easy to integrate with standard developer
tools for versioning, differencing, and merging. These textual DSLs, developed in frameworks like Xtext,
Langium, and textX, are technically defined through grammars and instantiated as textual instances.
      </p>
      <p>
        Dedicated approaches to co-evolving textual instances are scarce. One possible way to address
co-evolution of textual instances is by using the available metamodel-based approaches. To that end, the
original instance needs to be parsed into the form of a model and transformed back into textual form after
the model is co-evolved. However, this approach leads to information loss: during the transformation
process between textual instance and model, auxiliary information in the original instances, such as
code comments and formatting styles, cannot be retained [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ][
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. While this information does not affect
program functionality, it serves a critical purpose during tasks such as code maintenance, debugging,
and understanding design intent [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Hence, there is arguably a need to preserve such information
during the co-evolution of instances.
      </p>
      <p>
        In recent years, Large Language Models (LLMs) have demonstrated exceptional capabilities in code
understanding, transformation, and generation [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These models not only perform well at tasks
that require understanding of code structure, but also capture contextual information like comments. As
such, they seem particularly well-suited for addressing the co-evolution problem for textual languages.
      </p>
      <p>
        In this paper, we investigate the use of LLMs to support the co-evolution of grammar definitions and
instances for textual DSLs. We focus on grammar definitions and instances developed using Xtext [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a
framework that is rooted in the Eclipse ecosystem and is particularly widely used in the MDE community.
We harness our recently published dataset on Xtext-based language evolution cases retrieved from
GitHub [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which allowed us to identify a collection of critical real-life cases in which automated
support for co-evolution would be desirable. We address and answer the following
research questions:
RQ1: How can we use LLMs to automate the co-evolution of instances with evolving Xtext grammars?
      </p>
      <p>To address this RQ, we explored an approach that uses LLMs to analyze differences between original
and evolved grammars, and generate evolved instances that conform to the new grammar while
preserving auxiliary information. We implemented this approach using GPT-4o and Claude-3.5, with a
dedicated prompt and automated workflow.</p>
      <p>We evaluate the two LLM-based solutions on seven case languages, focusing on the following research
questions: RQ2: How does instance size affect the capability of LLMs to produce correct solutions
when co-evolving textual instances? RQ3: How does instance size affect LLMs’ capability in preserving
auxiliary information during DSL co-evolution?</p>
      <p>The contribution of this paper is an exploration of the potential of LLMs for the co-evolution of
grammar and instances of DSLs, including auxiliary information in instances such as comments. We
evaluated capability differences between two mainstream LLMs in DSL co-evolution tasks and analyzed
the advantages and limitations of purely LLM-based DSL co-evolution methods, providing direction for
future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem Description</title>
      <p>
        Xtext is an Eclipse-based framework for developing languages (https://eclipse.dev/Xtext/index.html). In Xtext projects, there is a close
relationship between grammar and metamodel: when the grammar evolves, a corresponding new
metamodel can be automatically generated from the new grammar, and vice versa. In our previous
work, we showed how the customized adaptations in the original grammar can be preserved [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ][
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to
complete the grammar co-evolution. However, after the grammar evolves, the instances that originally
conformed to it may no longer conform to it. It would be possible to use existing techniques to co-evolve
models with evolving metamodels. For that, the textual instance can be parsed using the original
grammar to gain the model representation (i.e., a .xmi file) that conforms to the original metamodel.
Then existing model migration techniques (such as EMFMigrate [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]) can be used to transform this
model into a model that conforms to the new metamodel. Finally, we can transform the migrated model
back into a textual instance that conforms to the new grammar.
      </p>
      <p>However, in this process, the auxiliary information (e.g., comments and code formatting) in the
original instance is discarded during the transformations. As an example, there is a “15 minutes tutorial”
(https://eclipse.dev/Xtext/documentation/102_domainmodelwalkthrough.html) on the official Xtext
website which provides an example called Domainmodel, which includes two versions of grammar and
their corresponding instances, where the second version adds five grammar rules. Consider a slightly
modified version of the instance from the tutorial example with auxiliary information in different
places in the instance, shown in Listing 1. Line 10 is empty, which, in the
case of entities with many more attributes, is a useful way to group them. The definition of instance
HasAuthor has been compressed from originally four lines to a single line (line 14) by removing
whitespace, making the overall instance more compact and easier to overview. Line 19 uses comments
as a way to discard a part from the instance that might potentially be included again at a later time
(commenting out). Line 24 contains an additional comment in a style commonly used to add rationale
and context to individual statements. More subtly, lines 9 and 11 use a different type of indentation
than the rest of the instance, based on tabs, which could be the result of an ongoing manual review and
refactoring.</p>
      <p>Listing 1: Instance conforming to grammar before evolution.
/**
 * This is the example before the evolution.
 * This is the header.
 * */
datatype String

/* this is the first comment, added by me */
entity Blog {
	title: String

	many posts: Post
}

entity HasAuthor { author: String }

entity Post extends HasAuthor {
    title: String
    content: String
    //many comment: Comment
    many comments: Comment
}

entity Comment extends HasAuthor {
    content: String // this is the second comment
}</p>
      <p>Listing 2: Co-evolved instance following grammar after evolution. Layout and comments are lost.
datatype String;
entity Blog {
    title: String,
    many posts: Post
}
entity HasAuthor {
    author: String
}
entity Post extends HasAuthor {
    title: String,
    content: String,
    many comments: Comment
}
entity Comment extends HasAuthor {
    content: String
}</p>
      <p>Traditional co-evolution approaches in the MDE sphere focus on co-evolution on the abstract syntax
level, i.e., the impact of new and changed meta-model classes and relationships to model elements.
Concrete syntax information without an abstract syntax counterpart – that is, auxiliary information
such as comments and whitespace – is not covered, and hence lost in the process. This is illustrated by
Listing 2, showing the result of applying such an approach to the example instance, which leads to the loss
of all comments and formatting information.</p>
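To make this loss concrete, the following toy Python sketch (our illustration, not the implementation of any specific migration tool) parses a Domainmodel-like instance into an abstract model and serializes it back. Because line-level comments and layout never enter the model, they cannot be restored on the way back:

```python
import re


def parse(instance: str) -> list:
    """Toy parser: builds an abstract model from a Domainmodel-like
    instance, discarding line-level comments and layout, as an
    abstract-syntax round-trip typically does."""
    model, current = [], None
    for raw in instance.splitlines():
        # Strip line comments and block comments that fit on one line.
        line = re.sub(r"//.*|/\*.*?\*/", "", raw).strip()
        if not line:
            continue  # blank lines carry no abstract-syntax information
        if line.startswith("datatype"):
            model.append({"kind": "datatype", "name": line.split()[1]})
        elif line.startswith("entity"):
            current = {"kind": "entity", "name": line.split()[1], "features": []}
            model.append(current)
        elif line != "}" and current is not None:
            name, type_ = line.split(":", 1)
            current["features"].append((name.strip(), type_.strip()))
    return model


def serialize(model: list) -> str:
    """Pretty-print the model with a fixed layout: the original comments,
    tabs, and blank lines are gone, because the model never held them."""
    out = []
    for elem in model:
        if elem["kind"] == "datatype":
            out.append("datatype " + elem["name"])
        else:
            out.append("entity " + elem["name"] + " {")
            out.extend("    {}: {}".format(n, t) for n, t in elem["features"])
            out.append("}")
    return "\n".join(out)
```

Running `serialize(parse(...))` on a fragment of Listing 1 yields a uniformly indented instance with all comments and blank lines removed, mirroring the effect shown in Listing 2.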
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <p>
        DSL co-evolution. Hebig et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] present a survey of approaches to co-evolve models with evolving
metamodels, summarizing a multitude of approaches ranging from languages specialized to the
automated generation of model transformations for dealing with non-breaking changes [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], to approaches
that allow the definition of migration strategies, such as EMFMigrate[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and Epsilon Flock [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
Tolvanen et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] proposed a framework for evaluating tool support for DSL co-evolution, but their work primarily
focuses on graphical DSLs and evaluates tool-level support capabilities. In contrast, our work focuses on
textual DSLs, exploring the potential of LLMs as a novel technical approach for textual DSL evolution
challenges, particularly emphasizing the preservation of auxiliary information such as comments and
layout during co-evolution.
      </p>
      <p>
        Application of LLM in MDE. Di Rocco et al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] conducted a systematic review of LLM applications
in Model-Driven Engineering (MDE). They analyzed the current state of LLM applications in tasks
such as model completion, generation, and evolution and proposed a technical framework to guide
LLM adoption in MDE. Although their research focuses on model-level evolution, their methodological
framework, particularly the insights into prompt engineering design, provides a valuable reference
for our handling of textual instance evolution. Kebaili et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] explored the use of Large Language
Models to address the co-evolution of code impacted by metamodel evolution. They proposed a prompt
engineering-based approach which achieved an accuracy of 88.7%, reaching 95.2% for complex change
scenarios across seven Eclipse projects. While their work focuses on co-evolution between metamodels
and generated code, our study addresses co-evolution between grammar definitions and textual instances.
Although the research objectives differ, their results confirm the potential of LLMs in handling software
artifact evolution problems, which aligns with our findings.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>The research methodology consists of three major steps: First, we selected seven diverse DSLs as
evaluation objects. We then designed an LLM-based method to co-evolve grammar and instances and
implemented it as two solutions based on two LLMs (i.e., Claude-3.5 and GPT-4o). Third, we applied
these two solutions to the selected case languages and analyzed the results to evaluate their co-evolution
potential. The following subsections detail each step.</p>
      <sec id="sec-4-1">
        <title>4.1. Case Language Selection</title>
        <p>
          We searched for case languages from an available dataset [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], since this dataset is specifically dedicated
to Xtext-based DSLs. We limited our selection to repositories that contain both Xtext files and instance
files. From the commit records of a repository, we can see that the contained Xtext files and instance
files may have many commits. For example, in the language elite-se.xtext.languages.plantuml, the
grammar file PlantUML.xtext has 63 commits. For each language, we picked the grammar from
the most recent commit, ensuring that there is an instance that complies with it; otherwise, we
looked for the grammar in earlier commits. The grammar found this way is the evolved grammar.
Then, we continued to look in earlier commits for a grammar that differs from this version and has
instances that comply with it. We identified seven case languages from the
dataset. Their basic information is shown in Table 1, and the grammars before/after evolution and the
instances found in their repositories that comply with these grammars are shown in Table 2.
        </p>
        <p>In Table 2, Grammar 1 is a grammar with an earlier commit time, which is regarded as the grammar
before the evolution, while Grammar 2 is a grammar with a later commit time, which is referred to
as the evolved grammar. Similarly, Instance 1 is an instance with an earlier commit time, which is the
object of LLM evolution operation (we call it the original instance), while Instance 2 is an instance
with a later commit time, which conforms to the evolved grammar, but may not be an evolved version
of Instance 1, because the author of the instance may add or delete content irrelevant to the evolution.
Instance 2 serves only as a reference. We will discuss this situation in the discussion section.</p>
        <p>Notes for Table 2: (1) To shorten the table, we abbreviate the language name “elite-se.xtext.languages.plantuml” to “plantuml”. (2) “Date” = the commit date of the grammar file. (3) “ID” = the commit ID of the grammar file’s commit.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Solution Design</title>
        <p>To implement and evaluate this approach, we selected two mainstream LLMs: Claude-3.5 and GPT-4o,
given their demonstrated capabilities in code understanding and generation tasks. We developed
Python-based automation scripts that interact with these models through their respective APIs, using
specially designed prompts to guide the LLMs in analyzing grammar differences and performing instance
evolution. The Python-based scripts can be found in the supplemental materials at https://osf.io/dhjw2/.
The approach takes three inputs: the original grammar, an instance conforming to it (i.e., the instance to
be evolved), and the evolved grammar. The scripts handle input file processing, manage communication
with the LLMs, and save the generated evolved instances. The following section details our prompting
strategy, which is critical to achieving successful co-evolution.</p>
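The overall workflow can be sketched as follows. This is a minimal illustration, not the exact published scripts: `ask_llm` stands in for the model-specific API call (to Claude-3.5 or GPT-4o), and the file handling and placeholder-tag layout are our assumptions:

```python
from pathlib import Path


def co_evolve(grammar_1: str, grammar_2: str, instance_1: str, ask_llm) -> str:
    """Assemble the co-evolution prompt from the three inputs and query
    the LLM. `ask_llm` abstracts over the concrete API client; the tag
    layout mirrors the placeholders used in the prompting text."""
    prompt = (
        f"<GRAMMAR_1>\n{grammar_1}\n</GRAMMAR_1>\n"
        f"<GRAMMAR_2>\n{grammar_2}\n</GRAMMAR_2>\n"
        f"<INSTANCE_1>\n{instance_1}\n</INSTANCE_1>\n"
        "Analyze the differences between the two grammars and evolve "
        "<INSTANCE_1> so that it conforms to <GRAMMAR_2>, preserving all "
        "comments and formatting."
    )
    return ask_llm(prompt)


def run(grammar_1_path, grammar_2_path, instance_path, out_path, ask_llm):
    """Read the three input files, query the LLM, and save the evolved
    instance to `out_path`."""
    evolved = co_evolve(
        Path(grammar_1_path).read_text(),
        Path(grammar_2_path).read_text(),
        Path(instance_path).read_text(),
        ask_llm,
    )
    Path(out_path).write_text(evolved)
    return evolved
```

Passing the API call in as a callable keeps the workflow identical for both LLM solutions; only the `ask_llm` implementation differs per model.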
        <p>Prompt Optimization. To obtain a prompt that can effectively guide the LLM to co-evolve an instance,
we started with an initial prompt that we iteratively refined in the two LLM solutions based on the
example Domainmodel in the official “15 minutes tutorial” of Xtext. In this example, we made some
changes to the instance before evolution and the grammar after evolution. The changes made to the
instance before evolution have been introduced in Section 2, and we will introduce the changes to
the grammar after evolution below. In each iteration, we used the prompt to drive the LLM to evolve
the instance and then observed whether there were problems with the output instance, e.g., incorrectly
modified elements. If there were problems with the output instance, we adjusted and optimized the
prompt according to the problem and entered the next iteration.</p>
        <p>LLMs are generally afected by non-determinism, which we need to account for when evaluating the
capability of the resulting approach. To this end, we repeated the co-evolution. When the instance was
correctly co-evolved, we performed nine more co-evolution runs with the same prompt. Considering
the uncertainty of LLM outputs, we decided that when at least six of the ten runs output good instances,
we would use this version of the prompt as the final version. An instance is considered good if it follows
the evolved grammar and retains auxiliary information. The grammar before evolution is shown in
Listing 3, which contains five grammar rules.</p>
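The acceptance criterion above can be sketched as follows; the function and parameter names are ours, not taken from the paper's scripts:

```python
def accept_prompt_version(generate, is_good, runs: int = 10, threshold: int = 6) -> bool:
    """Run the co-evolution `runs` times with one prompt version and
    accept it if at least `threshold` outputs are judged good, i.e.,
    they conform to the evolved grammar and retain auxiliary
    information. `generate` performs one LLM run; `is_good` is the
    (here abstracted) manual judgment."""
    good_runs = sum(1 for _ in range(runs) if is_good(generate()))
    return good_runs >= threshold
```

If a prompt version is rejected, it is adjusted based on the observed problems and the loop is repeated with the next version.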
        <p>The same tutorial provides an evolved grammar. This evolution adds five new grammar rules,
but does not involve any changes to the symbols in the grammar. To make the evolution changes
also reflected in symbol changes, we made two modifications to the evolved version: 1) in the
Entity rule, we added commas (‘,’) to the attribute features to separate them, instead of just spaces; 2)
we added a semicolon (‘;’) as a terminator at the end of the DataType rule. In addition, we also deliberately added
an optional attribute called default in the grammar rule Feature, which means that LLMs need to
identify more changes in the grammar. The final evolved grammar is shown in Listing 4.</p>
        <p>The tutorial also provides an instance that conforms to the grammar before the evolution. But as we
mentioned in Section 2, we added comments and format information to the instance before evolution
(as shown in Listing 1) to evaluate whether the solution can preserve auxiliary information during
co-evolution. We added four comments, one of which is changed from a normal instance line. In
addition, we added an empty line and two tabs at different locations and put a multi-line code block
into one line. Under the guidance of the prompting text, LLMs are expected to correctly identify this
formatting information and replay it in the evolved instance.</p>
        <sec id="sec-4-2-1">
          <title>Listing 3: The grammar of Domainmodel before the evolution.</title>
          <p>...
Domainmodel:
    (elements+=Type)*;
Type:
    DataType | Entity;
DataType:
    'datatype' name=ID;
Entity:
    'entity' name=ID ('extends' superType=[Entity])? '{'
        (features+=Feature)*
    '}';
Feature:
    (many?='many')? name=ID ':' type=[Type];</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>Listing 4: The grammar of Domainmodel after the evolution.</title>
          <p>...
Domainmodel: (elements+=AbstractElement)*;
PackageDeclaration:
    'package' name=QualifiedName '{'
        (elements+=AbstractElement)* '}';
AbstractElement:
    PackageDeclaration | Type | Import;
QualifiedName: ID ('.' ID)*;
Import:
    'import' importedNamespace=QualifiedNameWithWildcard;
QualifiedNameWithWildcard: QualifiedName '.*'?;
Type: DataType | Entity;
DataType: 'datatype' name=ID ';';
Entity:
    'entity' name=ID ('extends' superType=[Entity|QualifiedName])? '{'
        (features+=Feature (',' features+=Feature)*)? '}';
Feature:
    (many?='many')? name=ID ':' type=[Type|QualifiedName] ('(' default=ID ')')?;</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Evaluation</title>
        <p>In Step 3, we executed the solutions and evaluated the results. We applied the optimized Python scripts
and prompt text to the seven case languages, using the two models, Claude-3.5 and GPT-4o, to generate two
evolved instances for each language. To comprehensively evaluate the quality of the evolved
instances, we established a multi-dimensional set of evaluation metrics covering three key aspects:
grammar correctness, evolution accuracy, and auxiliary information retention:
Grammar Correctness Metrics: We consider one metric, #LineErr: The number of lines containing
grammar errors in the evolved instance. This metric measures the conformity of the instance generated
by LLM with the evolved grammar (Grammar 2). A value of “0” indicates full conformity with the new
grammar, and a larger value indicates a lower degree of grammar conformity.</p>
        <p>Evolution Accuracy Metrics: We consider two metrics: (i.) #LineEvl: Count of lines of instance 1
that required change and are correctly evolved. This metric reflects the ability of LLM to correctly
identify and process the grammar elements that need to be changed. (ii.) #LineEvlWrg: Count of lines
of instance 1 that are lost (or incorrectly evolved). This metric measures the number of lines that were
incorrectly modified, missed necessary modifications, or introduced unnecessary modifications during
the evolution process.</p>
        <p>Auxiliary Information Preservation Metrics: We consider four metrics: (i.) #LineCmtLost: Count of
lines of instance 1 with comments that are lost. (ii.) #LineCmtSave: Count of lines of instance 1 with
comments that are maintained. (iii.) #LineFmtLost: Count of lines of instance 1 with format information
that is lost. (iv.) #LineFmtSave: Count of lines of instance 1 with format information that is maintained.</p>
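Although we collected these counts manually, a comment-preservation check of this kind could in principle be automated. The following sketch is our illustration, with assumed function names; it counts the original comment lines whose comment text survives in the evolved instance:

```python
def comment_preservation(original: str, evolved: str, marker: str = "//"):
    """Compute counts in the spirit of #LineCmtSave and #LineCmtLost
    (illustrative only; the paper's counts were collected manually).
    A comment line of the original counts as preserved if its comment
    text reappears anywhere in the evolved instance."""
    saved = lost = 0
    for line in original.splitlines():
        if marker in line:
            comment = line.split(marker, 1)[1].strip()
            if comment and comment in evolved:
                saved += 1
            else:
                lost += 1
    return saved, lost
```

An analogous check for format information would compare indentation and blank lines per line, which is harder to automate reliably when the evolution legitimately inserts or removes lines.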
        <p>All Python scripts, prompts, grammars, original instances, and LLM-generated instances are available
in our supporting materials at https://osf.io/dhjw2/.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We now present the final version of the used prompt, and the results obtained from applying our
LLM-based solutions to the seven cases.</p>
      <p>Finalization of Prompting Text. Following the method described in Step 2, we obtained a version of
the prompting text after two rounds of “adjustment-verification” on the example Domainmodel, through
which we obtained instances from the LLMs (i.e., GPT-4o and Claude-3.5) that complied with the evolved
grammar. We verified this version of the prompt text 10 times for both LLM solutions, and in most of
these ten times, the instances obtained complied with the evolved grammar. Therefore, we decided to
adopt it as the final version of the prompting text, as follows:</p>
      <p>Final prompt: &lt;GRAMMAR_1&gt; is the initial grammar of the DSL. We evolved it to get &lt;GRAMMAR_2&gt;.
&lt;INSTANCE_1&gt; was originally a text instance that followed &lt;GRAMMAR_1&gt;. Now I want you to analyze the
differences between the two versions of the grammar and, based on these differences, modify &lt;INSTANCE_1&gt;
and get &lt;INSTANCE_2&gt;, which will follow &lt;GRAMMAR_2&gt;. Please address the following things:
1. When evolving the instance, please do not omit any mandatory elements, such as characters enclosed by
single quotes.
2. If &lt;GRAMMAR_2&gt; adds a new grammar rule or a new attribute that is optional or in an “OR” relationship
(i.e., |), then please do not instantiate it.
3. Do not miss or add any auxiliary information in the instance, e.g., comments, formats (white space, indents,
tabs, empty lines, etc.).</p>
      <p>Compared to the first version, the final version of the prompting text explicitly adds three items that
the LLM needs to address. These are based on the problems we encountered in the process of optimizing the
prompting text. The first item was added because the LLM ignored the symbol ‘,’, which is a mandatory element.
The second was added because the LLM would actively instantiate the grammar rule “PackageDeclaration”,
which is a newly added optional rule. The rationale behind this instruction is to ensure minimal changes
to the instance, modifying only what is necessary to maintain conformance with the evolved grammar.
The third item was added because the LLM partially ignored comments.</p>
      <p>Co-evolution in Seven Case Languages (RQ1). For each case language, we obtained a generated
instance through the two Python scripts and the prompting text, and we repeated this generation
operation ten times. We manually compared these ten generated instances with instance 1 and grammar
2 one by one to collect data, and then averaged the data. An overview of the results from Claude-3.5 is
presented in Table 3, while the results for GPT-4o are presented in Table 4. In addition, we summed
the average values of each metric vertically, leading to a total count shown in the last row. Based on
the results presented in Table 3 and 4, we can now address our first research question regarding how
LLMs can be used to automate the co-evolution process.</p>
      <p>Answer to RQ1: We developed two automated solutions based on Claude-3.5 and GPT-4o, each
consisting of a dedicated Python script and an optimized prompting text. These solutions take the
grammar before evolution, the evolved grammar, and the instance that conforms to the grammar
before the evolution, then use LLMs to analyze grammar differences and generate an evolved
instance that conforms to the evolved grammar.</p>
      <p>Correctness evaluation (RQ2). To address RQ2, on the capability of LLMs to produce correct solutions
for the underlying co-evolution task, we consider three metrics related to correctness: #LineErr (i.e.,
count of lines with grammar errors in the evolved instance), #LineEvl (i.e., count of lines of instance
1 that are correctly evolved), and #LineEvlWrg (i.e., count of lines of instance 1 that are missing or
incorrectly evolved in the evolved instance).</p>
      <p>We found that in two DSLs, “smart-dsl” and “mongoBeans”, the two LLMs made no mistakes in
performing the co-evolution of the instance. In both cases, all lines that needed to change were evolved
correctly (see #LineEvl and #LineEvlWrg). The resulting instances included no grammatical errors and,
thus, conformed to the new grammar (#LineErr).</p>
      <p>In the two cases “xtext-orm” and “xtext-dnn”, Claude-3.5 performed better than GPT-4o. Claude-3.5
performed all 10 runs to co-evolve the instance of “xtext-orm” correctly, and only evolved one line
incorrectly during one of the 10 runs to co-evolve the instance of “xtext-dnn”. Fortunately, the result
of this evolution operation still conforms to the evolved grammar. Note that an incorrectly evolved
line can still conform to the grammar, e.g., when a line is substituted by an empty line. For “xtext-dnn”,
GPT-4o produced on average 1.7 lines that were erroneous regarding the evolved grammar and the
same number of lines that were evolved wrongly. However, GPT-4o performed less well for “xtext-orm”,
where it produced on average 13.9 erroneous lines regarding the evolved grammar and 12.3 lines that
were evolved incorrectly. Note that an incorrectly evolved line can cause grammatical issues in other
lines that have not been changed at all, e.g., when a closing bracket is removed, lines that follow might
not be parsed as intended. For the language “elite-se.xtext.languages.plantuml”, Claude-3.5 managed
to always create an instance that conforms to the evolved grammar. However, on average 0.6 lines
of the instance were evolved incorrectly over the 10 runs. Here, GPT-4o evolved slightly fewer lines
incorrectly (0.5 lines on average), but also failed to systematically create instances that conform to the
evolved grammar (with an average of 2.8 erroneous lines).</p>
      <p>However, both LLMs made mistakes when performing the co-evolution of the instances for the
languages “isis-script” and “CheckerDSL”. For example, in the co-evolution of “isis-script”, Claude-3.5
evolved on average 23.1 lines incorrectly, while GPT-4o evolved on average 29.5 lines incorrectly.
In the co-evolution of “CheckerDSL”, GPT-4o produced better results than Claude-3.5, at least when
looking at our metrics. Here, Claude-3.5 evolved more than 30 lines incorrectly each time it co-evolved
“CheckerDSL” (on average 57.9 lines), producing on average 22 lines that do not conform to the evolved
grammar. We compared the instances generated by Claude-3.5 in ten runs with instance 1. We
found that in eight of the ten runs, the evolved instance generated by Claude-3.5 for “CheckerDSL”
only contained about the first 140 lines, while instance 1 had 173 lines, i.e., about the last 30 lines were
directly discarded by mistake. In the other two runs, Claude-3.5 did not successfully generate an evolved
instance, but only output suggestions on how to evolve the instance.</p>
      <p>Faced with such differences between the different DSLs, we looked further into the size of the instances
in those languages. We found that three languages, “CheckerDSL”, “isis-script”, and “xtext-orm”, have
larger instances. Across the evolution results of the two LLMs on the seven case languages, the errors
also mainly appear in these three languages. Thus, the size of the instances seems to be a factor that
might affect the performance of the LLMs when executing the co-evolution.</p>
      <p>Answer to RQ2: The LLMs performed excellently on small instances (such as those of
“smartdsl” and “mongoBeans”), correctly executing all necessary evolution operations. However, their
performance degraded significantly when co-evolving larger instances (such as those of “isis-script”
and “CheckerDSL”). We conclude that the instance size affects the correctness of LLM-generated
solutions.</p>
      <p>Support for auxiliary information (RQ3). We consider how well the LLMs performed in preserving
auxiliary information, in terms of the metrics #LineCmtLost and #LineCmtSave (i.e., the count of lines
of instance 1 with comments that are lost and retained, respectively), as well as #LineFmtLost and
#LineFmtSave (i.e., the count of lines of instance 1 with format information that is lost and retained, respectively).
Preservation of Comments. Only the instances of “xtext-orm”, “mongoBeans”, and “CheckerDSL”
contained comments. In the co-evolution of grammar and instances of the languages “mongoBeans” and
“xtext-orm”, both LLMs successfully preserved all comments (only in one co-evolution run of
“xtext-orm” did GPT-4o lose a comment). However, when evolving “CheckerDSL”, GPT-4o retained comments
better than Claude-3.5. As mentioned in Section 5, Claude-3.5 did not generate an evolved instance in
two of the ten evolution runs for “CheckerDSL”. As a side effect, all comments in instance 1 were lost
because no evolved instance was generated.</p>
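      <p>To make these metrics concrete, the following minimal sketch shows how counts such as #LineCmtSave and #LineCmtLost could be computed by comparing the comment-bearing lines of instance 1 against the evolved instance. The “//” comment marker and the function name are illustrative assumptions, not part of the paper's actual tooling.</p>

```python
# Illustrative sketch (not the paper's actual tooling): count which
# comment-bearing lines of instance 1 survive in the evolved instance.
# The "//" comment marker is an assumption about the DSL's syntax.

def comment_metrics(original_lines, evolved_lines, marker="//"):
    # Normalized comment lines present in the evolved instance.
    evolved_comments = {ln.strip() for ln in evolved_lines if marker in ln}
    saved = sum(1 for ln in original_lines
                if marker in ln and ln.strip() in evolved_comments)
    lost = sum(1 for ln in original_lines
               if marker in ln and ln.strip() not in evolved_comments)
    return {"#LineCmtSave": saved, "#LineCmtLost": lost}
```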
      <p>Preservation of Formats. According to the data, GPT-4o lost only a few lines of formatting information when
performing the co-evolution. However, when we inspected the instance files, we found that this
was not the full picture. In some co-evolution runs, GPT-4o did not perform the evolution that should
have occurred, but directly copied the text content of instance 1 into the new instance. For
example, in seven out of ten co-evolution runs for “CheckerDSL”, GPT-4o directly copied all
the text content of instance 1. That is, the text of the entire instance was not modified, so the original
formatting information was completely carried over into the evolved instance. We found that for case languages
whose instances have a small number of lines of text (e.g., “mongoBeans”), the LLMs preserved formatting
information well during co-evolution.</p>
      <p>Answer to RQ3: In terms of preserving auxiliary information, both LLMs performed well in
maintaining comments and formatting information when handling small instances. However,
GPT-4o would sometimes directly copy the original instance without performing the necessary evolution
operations. For larger instances (such as the instance of “CheckerDSL”), Claude-3.5 exhibited output
truncation issues, resulting in the loss of comments and formatting information. This indicates
that the LLMs’ capability to preserve auxiliary information is significantly affected by instance size.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>In general, the initial results of both LLMs are promising, as they indicate that, at least for small DSLs,
evolution steps, and instances, co-evolution while maintaining auxiliary information is possible. In
the following, we discuss some open issues and threats to validity.</p>
      <p>
        Scalability. Our results indicate that the good initial results do not scale well for larger textual
instances, larger grammars, and more significant changes between grammar versions. While this
does not invalidate LLMs’ applicability to practical cases as long as they remain small—DSLs are often
conceived as "small languages" for dedicated tasks [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]—applying an LLM-based solution to large
practical cases is currently challenging. Future work should explore approaches to improve scalability,
including techniques for handling larger inputs and potentially generating migration programs rather
than performing direct co-evolution. As more powerful LLMs become available, their capability to
handle larger instances during co-evolution will likely increase.
      </p>
      <p>Non-grammar-driven Instance Changes. While we identified instances in repositories that conform
to evolved grammars, these instances often contain changes unrelated to grammar evolution. For
example, in “isis-script”, numerous action objects were deleted between versions, though this was not
required for grammar conformance. This makes it difficult to use real updated instances as a baseline
for evaluating whether LLMs perform co-evolution similarly to human developers. Future work should
explore ways to distinguish between pure co-evolution changes and other modifications developers
make during language evolution.</p>
      <p>
        Variations in Migration Strategies. Research on meta-model to model co-evolution has shown that
there is sometimes more than just one possible and valid outcome of a migration [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The same is likely
true for textual instances. Therefore, we would like to address this issue in future work, developing an
approach to consider different migration strategies. Alternatively, expanding on the idea of using LLMs
to generate migration programs, one could also use LLMs to generate configurable migration code.</p>
      <p>Threats to Validity. The primary limitation of this study is the relatively small set of case languages and
their associated instances, which may not fully represent the diversity of DSLs in practice. Additionally,
the instances found in repositories were likely designed for demonstration purposes rather than
production use, potentially overestimating LLMs’ performance on real-world cases. Finally, Claude-3.5’s
consistent output truncation for “CheckerDSL” (approximately 170 lines) prevented us from fully
evaluating its capabilities on larger instances, suggesting that our findings on scalability should be
interpreted cautiously until further investigation with improved techniques for handling larger inputs.
We derived our prompt from a single case language, which may lead to overfitting. To mitigate this
threat, we applied our approach to seven diverse case languages selected from different domains
with varying complexity levels, and performed ten runs for each case language to account for LLM
output variability. The observed performance variations across these cases provide insights into the
generalizability limitations of our approach. In future work, employing cross-validation techniques
using multiple representative DSLs during prompt development could further reduce this threat.
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>This paper explored the use of LLMs (Claude-3.5 and GPT-4o) to support co-evolution between DSL
grammar definitions and instances. Experiments on seven case languages showed that LLMs can
effectively handle co-evolution tasks while preserving comments and formatting, especially for smaller
instances. However, performance declined for larger, more complex languages and major grammar
changes, with issues such as truncated outputs and inconsistent handling.</p>
      <p>Future work includes extending our study to graphical DSLs, comparing LLM-based and manual
co-evolution to quantify potential benefits in practical scenarios, and refining prompt engineering
strategies (e.g., specificity of terms and grammar rule ordering). Additionally, we intend to develop an
enhanced evaluation framework with precise quantitative metrics. Moreover, we hope to systematically
evaluate the quality of newly added information in co-evolved instances, providing deeper insights into
LLMs’ capabilities for supporting comprehensive DSL evolution.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Grammarly to support the improvement of
grammar and spelling checking. After using these tool(s)/service(s), the author(s) reviewed and edited
the content as needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Lämmel</surname>
          </string-name>
          ,
          <source>Software Languages</source>
          , Springer,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Azvine</surname>
          </string-name>
          ,
          <article-title>Evolution of fuzzy grammars to aid instance matching</article-title>
          , in: 2006
          <source>International Symposium on Evolving Fuzzy Systems</source>
          , IEEE,
          <year>2006</year>
          , pp.
          <fpage>163</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Azvine</surname>
          </string-name>
          ,
          <article-title>Incremental evolution of fuzzy grammar fragments to enhance instance matching and text mining</article-title>
          ,
          <source>IEEE Transactions on Fuzzy Systems</source>
          <volume>16</volume>
          (
          <year>2008</year>
          )
          <fpage>1425</fpage>
          -
          <lpage>1438</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vaupel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Strüber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rieger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Taentzer</surname>
          </string-name>
          ,
          <article-title>Agile bottom-up development of domain-specific ides for model-driven development</article-title>
          , in: Workshop on Flexible Model Driven Engineering, CEUR-WS.org,
          <year>2015</year>
          , pp.
          <fpage>12</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hebig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Khelladi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bendraou</surname>
          </string-name>
          ,
          <article-title>Approaches to co-evolution of metamodels and models: A survey</article-title>
          ,
          <source>IEEE Transactions on Software Engineering</source>
          <volume>43</volume>
          (
          <year>2016</year>
          )
          <fpage>396</fpage>
          -
          <lpage>414</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Latifaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ciccozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohlin</surname>
          </string-name>
          , E. Posse,
          <article-title>Towards automated support for blended modelling of uml-rt embedded software architectures</article-title>
          .,
          <source>in: ECSA (Companion)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Holtmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Steghöfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Exploiting meta-model structures in the generation of xtext editors</article-title>
          .,
          <source>in: MODELSWARD</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>218</fpage>
          -
          <lpage>225</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liping</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Fengrong</surname>
          </string-name>
          ,
          <article-title>A survey on research of code comment</article-title>
          ,
          <source>in: Proceedings of the 2019 3rd International Conference on Management Engineering, Software Engineering and Service Sciences</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>51</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Macvean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Hellendoorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vasilescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Myers</surname>
          </string-name>
          ,
          <article-title>Using an llm to help with code understanding</article-title>
          ,
          <source>in: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Generalization or memorization: Data contamination and trustworthy evaluation for large language models</article-title>
          ,
          <source>arXiv preprint arXiv:2402.15938</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Bettini</surname>
          </string-name>
          ,
          <article-title>Implementing domain-specific languages with Xtext and Xtend</article-title>
          , Packt,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , D. Strüber,
          <article-title>Tales from 1002 repositories: Development and evolution of xtext-based dsls on github</article-title>
          ,
          <source>in: 2024 50th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>172</fpage>
          -
          <lpage>179</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Holtmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Strüber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hebig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Steghöfer</surname>
          </string-name>
          ,
          <article-title>Supporting meta-model-based language evolution and rapid prototyping with automated grammar transformation</article-title>
          ,
          <source>Journal of Systems and Software</source>
          <volume>214</volume>
          (
          <year>2024</year>
          )
          <fpage>112069</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Steghöfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hebig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Strüber</surname>
          </string-name>
          ,
          <article-title>A rapid prototyping language workbench for textual dsls based on xtext: Vision and progress</article-title>
          ,
          <source>arXiv preprint arXiv:2309.04347</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wagelaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Iovino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Di</given-names>
            <surname>Ruscio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pierantonio</surname>
          </string-name>
          ,
          <article-title>Translational semantics of a co-evolution specific language with the emf transformation virtual machine</article-title>
          ,
          <source>in: Theory and Practice of Model Transformations: 5th International Conference, ICMT 2012</source>
          , Prague, Czech Republic, May
          <volume>28</volume>
          -29,
          <year>2012</year>
          . Proceedings 5, Springer,
          <year>2012</year>
          , pp.
          <fpage>192</fpage>
          -
          <lpage>207</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cicchetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Di</given-names>
            <surname>Ruscio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Eramo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pierantonio</surname>
          </string-name>
          ,
          <article-title>Automating co-evolution in model-driven engineering, in: 2008 12th International IEEE enterprise distributed object computing conference</article-title>
          , IEEE,
          <year>2008</year>
          , pp.
          <fpage>222</fpage>
          -
          <lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Rose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Kolovos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. F.</given-names>
            <surname>Paige</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Polack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Poulding</surname>
          </string-name>
          ,
          <article-title>Epsilon flock: a model migration language</article-title>
          ,
          <source>Software &amp; Systems Modeling</source>
          <volume>13</volume>
          (
          <year>2014</year>
          )
          <fpage>735</fpage>
          -
          <lpage>755</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Tolvanen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Di</given-names>
            <surname>Rocco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pierantonio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tinella</surname>
          </string-name>
          ,
          <article-title>A framework for evaluating tool support for co-evolution of modeling languages, tools and models</article-title>
          ,
          <source>Software and Systems Modeling</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J. Di</given-names>
            <surname>Rocco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Di</given-names>
            <surname>Ruscio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Sipio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rubei</surname>
          </string-name>
          ,
          <article-title>On the use of large language models in model-driven engineering</article-title>
          ,
          <source>Software and Systems Modeling</source>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Z. K.</given-names>
            <surname>Kebaili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Khelladi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Acher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Barais</surname>
          </string-name>
          ,
          <article-title>An empirical study on leveraging llms for metamodels and code co-evolution</article-title>
          ,
          <source>in: European Conference on Modelling Foundations and Applications (ECMFA</source>
          <year>2024</year>
          ), volume
          <volume>23</volume>
          ,
          <source>Journal of Object Technology</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>van Deursen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Klint</surname>
          </string-name>
          ,
          <article-title>Little languages: little maintenance?</article-title>
          ,
          <source>Journal of Software Maintenance: Research and Practice</source>
          <volume>10</volume>
          (
          <year>1998</year>
          )
          <fpage>75</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>