<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Y. Zhu);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Jianing Sun∗, Jiahui Wang, Yuyan Zhu, Xingyu Li, Ying Xie and Jiaxin Chen</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Chongqing University</institution>
          ,
          <addr-line>400044 Chongqing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>QuASoQ 2024:12</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>9</fpage>
      <lpage>0009</lpage>
      <abstract>
        <p>This paper presents a novel method for generating automated test scripts for Domain-Specific Languages (DSLs) in software testing, particularly for the automotive industry. It emphasizes the growing importance of software testing in ensuring product quality amid IT advancements. The paper reviews software testing's evolution, modern processes, and the role of Large Language Models (LLMs). It highlights DSLs' significance and uses the automotive sector to show how LLMs can automate test script generation. Tests indicate that in cases with a small sample size, the effectiveness of prompt engineering is superior to model fine-tuning. The proposed method thus relies on well-designed prompts to direct LLMs to produce accurate scripts. The generation system's overview is discussed, along with an evaluation of the scripts' quality using metrics like Levenshtein Distance. Results indicate that LLMs boost test automation, defect detection, and software reliability. Future work will optimize these tools for higher testing automation levels.</p>
      </abstract>
      <kwd-group>
        <kwd>software testing</kwd>
        <kwd>domain-specific languages</kwd>
        <kwd>large language models</kwd>
        <kwd>Levenshtein Distance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Software testing is a key component in ensuring the
quality and reliability of software products. In the
rapidly developing information technology era, software
has become an indispensable part of our daily life and
work. With the increasing complexity and
diversification of software functions, the importance of
software testing has become increasingly prominent.
Software testing is a series of processes designed to
check that a software product meets specified
requirements and ensures its quality. It not only helps
developers to find and fix defects, but also greatly
enhances system security, especially in fields with high
software safety requirements such as automotive and
aviation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <sec id="sec-1-1">
        <title>1.1. A Brief History of Software Testing</title>
        <p>
          The origins of software testing date back to the 1950s,
focusing initially on debugging to identify and rectify
software faults [
          <xref ref-type="bibr" rid="ref3">2</xref>
          ][
          <xref ref-type="bibr" rid="ref4">3</xref>
          ][
          <xref ref-type="bibr" rid="ref5">4</xref>
          ]. As software complexity grew,
the need for independent testing organizations became
apparent. In 1957, Charles Baker first defined program
testing, in his review of the book Digital Computer
Programming by Dan McCracken, separating it from
debugging. Bill Hetzel formalized software testing as a
concept at the University of North Carolina in 1972,
establishing it to ensure a program performs as intended
[
          <xref ref-type="bibr" rid="ref6">5</xref>
          ]. Glenford J. Myers further refined this in 1979,
describing testing as executing a program to uncover
errors [
          <xref ref-type="bibr" rid="ref7">6</xref>
          ].
        </p>
        <p>
          By 1983, IEEE had standardized software testing,
defining it as a process, manual or automated, to verify
system requirements [
          <xref ref-type="bibr" rid="ref8">7</xref>
          ]. The 1990s brought agile
methodologies, integrating testing and development and
encouraging tester involvement from the earliest
development stages [
          <xref ref-type="bibr" rid="ref9">8</xref>
          ]. In the 21st century, testing has
continued to advance, with a growing focus on exploratory
testing, which relies on the tester's own initiative. The era
of AI and big data has intensified scrutiny of software
testing. Although the field still relies largely on
20th-century methods, future innovations may substantially
change testing practices [
          <xref ref-type="bibr" rid="ref10">9</xref>
          ].
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Modern Approaches</title>
        <p>The modern software testing process is crucial for
ensuring software quality and functionality. It starts
with requirement analysis, followed by developing a test
plan, designing test cases, and preparing test data
(Figure 1). The test environment is set up, tests are
executed and recorded, and defects are tracked.
Regression and performance testing are conducted,
along with security and system testing. Acceptance
testing confirms business requirements are met. Test
reports summarize results, and evaluations identify
process improvements.</p>
        <p>Techniques like automated testing, Continuous
Integration (CI), and Continuous Delivery (CD) enhance
testing efficiency. Agile testing fosters collaboration
between testers and developers. Performance, security,
and mobile application testing ensure software
reliability across different aspects. Cloud testing
leverages cloud resources for extensive testing.</p>
        <p>AI-based testing leverages machine learning to
automate software testing processes, enhancing
efficiency and accuracy. It encompasses exploratory
testing to identify issues without fixed test cases,
ensuring broader coverage. Model-driven testing and
testability design optimize test case generation and
software sustainability. Additionally, managing test data
and adopting strategies such as shift-left and shift-right
testing further refine the development cycle.
These dynamic approaches adapt to different
methodologies, ensuring consistent software quality
throughout the testing process.</p>
      </sec>
      <sec id="sec-1-3">
        <title>1.3. Large Language Models</title>
        <p>Large Language Models (LLMs) are cutting-edge AI
specialized in natural language understanding and
generation. Trained on extensive datasets and
employing neural networks like Transformers, they
capture linguistic subtleties and perform a variety of
language tasks such as categorization, analysis,
translation, and question answering. They discern nuances,
generate realistic text, and can be retrained as language
use evolves, though their training and deployment raise
concerns over data privacy and ethics.</p>
        <p>
          LLMs have significantly impacted sectors like smart
offices, travel, e-commerce, and government by
enhancing efficiency and personalization. They aid in
document summarization, provide travel advice, and
improve user engagement. In software development,
tools like GitHub Copilot demonstrate their advantage by
assisting in coding tasks [
          <xref ref-type="bibr" rid="ref11">10</xref>
          ].
        </p>
        <p>LLMs also support software testing by automating
tasks, detecting defects, and improving reliability. They
improve fuzz and unit testing by creating test cases and
suggesting fixes. Research shows significant benefits in
test coverage and error detection. Future efforts will
focus on optimizing testing tools and techniques.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Domain-Specific Software Testing</title>
      <sec id="sec-3-1">
        <title>2.1. Domain-Specific Languages</title>
        <p>
          Domain-specific languages (DSLs) are specialized
languages designed for particular domains or tasks,
offering simplified syntax for ease of use by domain
experts [
          <xref ref-type="bibr" rid="ref12">11</xref>
          ]. They can be integrated with
general-purpose languages (GPLs) like Java and C++, enhancing
development efficiency through tool support such as
analyzers and compilers. DSLs are crucial in various
industries, for example, HTML in web development and
SQL in databases. They automate tasks like API
documentation and legal document generation,
improving efficiency and reducing errors. DSLs also
facilitate team collaboration by allowing non-technical
members to express requirements in a natural-language-like
format.
        </p>
        <p>For software development, DSLs boost efficiency by
simplifying complex representations and promoting
code reusability. They accelerate prototyping and
iteration, integrating seamlessly into existing tools and
workflows. Testing DSL-developed programs requires a
detailed plan with automated test scripts for regression
testing. Test cases must be readable, and test data should
reflect the domain specifics to ensure comprehensive
coverage and identify defects. Maintainability of test
cases and DSL is essential for ongoing development
success.</p>
      </sec>
      <sec id="sec-3-3">
        <title>2.2. Software Testing using DSL from the Automobile Industry</title>
        <p>This section addresses the critical need for rigorous
testing in passenger car product development, ensuring
quality and performance meet standards.</p>
        <p>Traditionally, automotive testing relies on manual,
labor-intensive translation of requirements into test
cases and scripts, causing significant strain on resources.
To streamline this process and integrate Continuous
Testing/Continuous Delivery
(CT/CD) pipelines, the industry is moving towards
automated test development.</p>
        <p>The automotive sector is seeking an AI-driven
solution to automate test script generation for its
proprietary DSLs, which are integral to testing a range of
automotive systems. The prevailing manual approach is
inefficient, error-prone, and variable in code quality, and
it yields insufficient test coverage. A tool built on large
language models has the potential to automate this
process, improving efficiency, reducing errors, and
keeping code quality uniform across the software
development lifecycle.</p>
      </sec>
      <sec id="sec-3-4">
        <title>2.3. Sample Data</title>
        <p>The dataset comprises 51 samples (Table 1), each
representing a ground-truth mapping from a test case to a
test script in a particular DSL format.</p>
        <p>For privacy protection purposes, all information,
program code and data in this paper have been
anonymized.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Approach</title>
      <p>Table 1 excerpt (test script)
Signals.Check(signals=[Gears_GearsStatus],
values=[Gears_GearsStatus_shift], waiting_time=100)
Gear.Shift('drive')
Signals.Check(signals=[Gears_GearsStatus],
values=[Gears_GearsStatus_shift], waiting_time=100)
Gear.Shift('reverse')
current_speed = car.get_speed()
current_gear = car.get_gear_status()
Self.assertNotEqual(current_gear, 'park')
car.stop()
car.shift_gear('park')
current_gear = car.get_gear_status()</p>
      <p>Broadly speaking, two strategies are available for
supplying an LLM with domain knowledge: prompt
engineering, which is particularly effective for small
datasets, and model fine-tuning, which is better suited to
larger volumes of data. Because our data are scarce and
carry inherent uncertainties, we evaluated both prompt
engineering and LLM fine-tuning. The comparison showed
that, with the limited data currently available, prompt
engineering is the slightly superior approach.</p>
      <p>Consequently, we chose prompt engineering for the
automated generation of test scripts, since it delivered
better results under our data scarcity. With carefully
designed prompts, we aim to offset the limited data and
improve the performance and reliability of the test script
generation process.</p>
      <p>An integral element of our approach is the selection
of the foundational Large Language Model. We assessed
ChatGLM3, Llama3, and Qwen2 and, after comparing them,
determined that Llama3's generative capabilities align
most closely with our requirements. Hence, we chose
Llama3 as the underlying LLM for this study.</p>
      <sec id="sec-4-2">
        <title>3.1. Prompt to Make Precise Test Script Generation</title>
        <p>Through iterative refinement, we arrived at the
prompt for generating test scripts shown in the example.
This refinement helps the model produce outputs that are
accurate and meet our objectives.</p>
        <p>Our prompt is divided into four key components
(Table 2). First, the list of all sample mappings,
excluding the one currently being generated, provides
in-context examples. Second, the specific test case
focuses the model on the test purpose, yielding targeted
test scripts. Third, clear instructions in plain English
tell the LLM what to do. Lastly, we impose constraints on
the output so that the LLM returns only the precise,
relevant test script.</p>
        <p>Prompt for test script generation (pseudo code)
dataFrame = All sample mappings except the one
which is being generated
testCase = one test case which is being processed
instruction = “Above is a list of test cases and
corresponding test scripts, assembled in Json format.
Please generate test script for the following test case:”
condition = “Please export generated test script only,
no leading text, no leading new lines.”
prompt = dataFrame + CRLF + instruction + testCase
+ condition</p>
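        <p>The pseudo code above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in our system: the sample layout (keys test_case and test_script), the function name build_prompt, and the line breaks placed between the instruction, the test case, and the condition are assumptions made for the example.</p>
        <preformat>
```python
import json

# Instruction and output constraint quoted from the pseudo code.
INSTRUCTION = (
    "Above is a list of test cases and corresponding test scripts, "
    "assembled in Json format. "
    "Please generate test script for the following test case:"
)
CONDITION = (
    "Please export generated test script only, "
    "no leading text, no leading new lines."
)
CRLF = "\r\n"


def build_prompt(samples, index):
    """Assemble the prompt for samples[index], leaving that sample out of the examples."""
    # Assumed sample layout: {"test_case": ..., "test_script": ...}.
    examples = [s for i, s in enumerate(samples) if i != index]
    data_frame = json.dumps(examples, indent=2, ensure_ascii=False)
    test_case = samples[index]["test_case"]
    # Separators between the last three parts are an assumption of this sketch.
    return CRLF.join([data_frame, INSTRUCTION, test_case, CONDITION])


# Two made-up samples, for illustration only.
samples = [
    {"test_case": "Shift to drive", "test_script": "Gear.Shift('drive')"},
    {"test_case": "Shift to reverse", "test_script": "Gear.Shift('reverse')"},
]
prompt = build_prompt(samples, 0)
```
        </preformat>
        <p>Because the sample under generation is excluded from the examples, its ground truth script never appears in the prompt, so each of the samples can be evaluated in a leave-one-out fashion.</p>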
        <p>This structured approach not only boosts script
accuracy but also enhances the efficiency of our testing
process, bringing us closer to our goal of fully automated
AI-driven test script generation.</p>
      </sec>
      <sec id="sec-4-4">
        <title>3.2. Test Script Generation System Overview</title>
        <p>The test script generation system, illustrated in Figure
3, converts input test cases into executable scripts in the
partner's DSL, which are then used to verify product
functionality. It applies the prompt-based method
described in Section 3.1. Experts evaluate the generated
scripts for accuracy and reliability and correct them as
needed.</p>
        <p>Validated scripts are executed and stored, so that
they can serve as examples in future prompts and improve
script generation over time. This cycle of evaluation and
learning improves script quality and reduces manual
creation, aiming for an automated, self-improving
system that streamlines software testing. As the store of
validated scripts grows, prompts carry more examples,
which is expected to improve the autonomy of script
generation.</p>
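        <p>The generate, review, and store cycle described above might be organized as in the following sketch. All names here (ScriptStore, process, and the generate and review callbacks) are hypothetical and stand in for the LLM call and the expert evaluation; the sketch only illustrates how validated scripts flow back into the pool of examples.</p>
        <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class ScriptStore:
    """Hypothetical in-memory store of validated test-case/test-script pairs."""
    mappings: list = field(default_factory=list)

    def add_validated(self, test_case, test_script):
        # Only scripts accepted (or corrected) by an expert are stored.
        self.mappings.append({"test_case": test_case, "test_script": test_script})


def process(test_case, store, generate, review):
    """Generate a script, have it reviewed, and store the validated result."""
    draft = generate(test_case, store.mappings)  # LLM call using stored examples
    validated = review(test_case, draft)         # expert accepts or corrects the draft
    store.add_validated(test_case, validated)
    return validated


# Stand-ins for the LLM and the expert, for illustration only.
def fake_generate(test_case, examples):
    return "Gear.Shift('drive')"


def fake_review(test_case, draft):
    return draft


store = ScriptStore()
result = process("Shift to drive", store, fake_generate, fake_review)
```
        </preformat>
        <p>Each processed test case adds one validated mapping to the store, so later prompts can draw on a larger set of examples.</p>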
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Result and Evaluation</title>
      <sec id="sec-5-1">
        <title>4.1. Evaluation Metric</title>
        <p>
          This paper employs the Levenshtein Distance [
          <xref ref-type="bibr" rid="ref13">12</xref>
          ] to
evaluate the textual accuracy of our language model,
providing an objective measure of how closely
generated text matches the ground truth. This edit
distance metric, devised by Vladimir Levenshtein,
quantifies the minimum number of single-character
edits required to transform one string into another,
offering insights into model performance. It plays a
crucial role in fields like Natural Language Processing,
where it assesses text similarity, and Bioinformatics,
where it indicates genetic relatedness. Despite its higher
computational demands for longer strings, our use of
dynamic programming makes it an efficient tool for our
analysis. The Levenshtein Distance aids in refining our
model, ensuring that the text generation is both accurate
and reliable.
        </p>
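        <p>The Levenshtein Distance between two arbitrary strings <italic>a</italic> and <italic>b</italic>, of lengths |<italic>a</italic>| and |<italic>b</italic>| respectively, is formally defined as
          <disp-formula>
            <tex-math><![CDATA[
\operatorname{lev}(a, b) =
\begin{cases}
|a| & \text{if } |b| = 0, \\
|b| & \text{if } |a| = 0, \\
\operatorname{lev}\bigl(\operatorname{tail}(a), \operatorname{tail}(b)\bigr) & \text{if } \operatorname{head}(a) = \operatorname{head}(b), \\
1 + \min
\begin{cases}
\operatorname{lev}\bigl(\operatorname{tail}(a), b\bigr) \\
\operatorname{lev}\bigl(a, \operatorname{tail}(b)\bigr) \\
\operatorname{lev}\bigl(\operatorname{tail}(a), \operatorname{tail}(b)\bigr)
\end{cases} & \text{otherwise,}
\end{cases}
]]></tex-math>
          </disp-formula>
        </p>
        <p>where tail(<italic>x</italic>) of any string <italic>x</italic> of length <italic>n</italic> is the substring of <italic>x</italic> without the first character, i.e. tail(<italic>x</italic><sub>1</sub><italic>x</italic><sub>2</sub>⋯<italic>x</italic><sub><italic>n</italic></sub>) = <italic>x</italic><sub>2</sub>⋯<italic>x</italic><sub><italic>n</italic></sub>, and head(<italic>x</italic>) is the first character of <italic>x</italic>, i.e. head(<italic>x</italic><sub>1</sub><italic>x</italic><sub>2</sub>⋯<italic>x</italic><sub><italic>n</italic></sub>) = <italic>x</italic><sub>1</sub>.</p>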
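        <p>For illustration, the distance can be computed with the standard two-row dynamic programming scheme sketched below. This is a generic textbook formulation, not necessarily the code used in our experiments; it works over prefixes instead of the recursive tail form of the definition, but yields the same value.</p>
        <preformat>
```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    # Row for the empty prefix of a: turning "" into b[:j] takes j insertions.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # keep or substitute
            ))
        previous = current
    return previous[-1]


# Comparing a generated script with its ground truth (made-up strings).
distance = levenshtein("Gear.Shift('drive')", "Gear.Shift('drive')")
```
        </preformat>
        <p>A distance of 0, as in the usage line above, means that the generated script matches the ground truth exactly. The running time grows with the product of the two string lengths, while only two rows of the table are kept in memory.</p>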
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Result and Discussion</title>
        <p>Using the Levenshtein Distance and the test script
generation method described above, we assessed the
output across all 51 data samples. The prompts we
designed proved highly stable when used with Llama3: for
a given data sample, that is, for the same prompt, the
model reliably produced identical results in every run.
This consistency is a significant advantage for test
script generation and a key factor in the reproducibility
of our experiments. It allows us to attribute any
variation in the output to changes in the input data or to
the model's fine-tuning rather than to instability of the
model itself, which makes evaluations of the model's
performance more meaningful and informs our continuing
efforts to improve it. It also makes the generation
process dependable, giving our partners and users a tool
they can trust to deliver consistent results.</p>
        <p>The test results for all 51 samples are displayed in
the horizontal bar chart in Figure 4. The red bars show
the text lengths of the ground truth test scripts, which
represent the ideal output and serve as the benchmark
against which our system is measured. The pink bars show
the lengths of the test scripts generated by our system.
Most importantly, the blue bars show the Levenshtein
Distance for each sample, that is, the minimum number of
single-character edits required to transform the
generated test script into the ground truth test script.
A shorter blue bar indicates a higher degree of
similarity between the generated script and the ground
truth, which is the goal of our system.</p>
        <p>As observed from Figure 4, the system currently exhibits a
noticeable margin of error. This finding is further
accentuated and clarified in the subsequent statistical
box plot in Figure 5, which provides a more detailed
visualization of the distribution of errors across our
dataset. It is apparent that our dataset, comprising a
mere 51 samples, is significantly limited for a deep
learning initiative. The consensus in the field is that a
larger dataset is often necessary to train models to
achieve higher accuracy and reliability.</p>
        <p>However, it is remarkable to note that despite this
constraint, our system has produced flawless results in
six instances where the generated test scripts matched
the ground truth perfectly. This achievement is
particularly impressive given the small sample size and
serves as a testament to the potential of our approach
using prompt engineering with large language models.
The fact that our system was able to generate scripts
indistinguishable from the ground truth in these cases
suggests that with further optimization and a more
extensive dataset, we could see a substantial
improvement in the system's overall performance.</p>
        <p>This early success with a limited dataset is not just
encouraging; it also validates the feasibility of our
methodological approach. It indicates that our system
has the innate capacity to learn and produce
high-quality outputs, even when faced with data scarcity. As
we continue to expand our dataset and refine our
models, we are confident that the performance will see
a marked enhancement, further solidifying the
effectiveness of our AI-driven test script generation
system in the field of software testing.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion and Future Work</title>
      <p>This research highlights the potential of LLMs to
improve software testing efficiency, particularly in the
automotive sector. Our findings indicate that prompt
engineering slightly outperforms model fine-tuning on
small datasets. The Levenshtein Distance served as an
objective metric for script accuracy. Notably, Llama3
demonstrated remarkable consistency, producing identical
output for identical prompts. Even with a limited dataset
of 51 samples, our system reproduced the ground truth
exactly in six cases, showcasing LLMs' potential in
software testing.</p>
      <p>Our study introduces a novel approach to DSL
testing, with a user-friendly web application for our test
script generation system, enhancing accessibility and
testing efficiency. Future work includes expanding our
dataset to improve script performance and integrating
the system into CI/CD pipelines for real-time testing.
Ethical considerations and model transparency will also
be prioritized. In conclusion, our research establishes
LLMs as a viable solution for automating DSL test script
generation, laying the groundwork for future
advancements in AI-assisted software testing.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Awedikian</surname>
          </string-name>
          , Roy, and Bernard Yannou. “
          <article-title>Design of a Validation Test Process of an Automotive Software</article-title>
          .”
          <source>International Journal on Interactive Design and Manufacturing (IJIDeM)</source>
          <volume>4</volume>
          , no.
          <issue>4</issue>
          (November 1
          ,
          <year>2010</year>
          ):
          <fpage>259</fpage>
          -
          <lpage>68</lpage>
          . https://doi.org/10.1007/s12008-010-0108-2.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Campbell</surname>
          </string-name>
          , Robert V. D. “Evolution of Automatic Computation.”
          <source>In Proceedings of the 1952 ACM National Meeting (Pittsburgh)</source>
          ,
          <fpage>29</fpage>
          -
          <lpage>32</lpage>
          . ACM '52. New York, NY, USA: Association for Computing Machinery,
          <year>1952</year>
          . https://doi.org/10.1145/609784.609786.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Orden</surname>
          </string-name>
          , Alex. “
          <article-title>Solution of Systems of Linear Inequalities on a Digital Computer</article-title>
          .”
          <source>In Proceedings of the 1952 ACM National Meeting (Pittsburgh)</source>
          ,
          <fpage>91</fpage>
          -
          <lpage>95</lpage>
          . ACM '52. New York, NY, USA: Association for Computing Machinery,
          <year>1952</year>
          . https://doi.org/10.1145/609784.609793.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Demuth</surname>
            ,
            <given-names>Howard B.</given-names>
          </string-name>
          , John B. Jackson, Edmund Klein, N. Metropolis, Walter Orvedahl, and James H. Richardson
          . “MANIAC.”
          <source>In Proceedings of the 1952 ACM National Meeting (Toronto)</source>
          ,
          <fpage>13</fpage>
          -
          <lpage>16</lpage>
          . ACM '52. New York, NY, USA: Association for Computing Machinery,
          <year>1952</year>
          . https://doi.org/10.1145/800259.808982.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hetzel</surname>
          </string-name>
          , William C.
          <article-title>Program Test Methods</article-title>
          . Prentice-Hall,
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Myers</surname>
            ,
            <given-names>Glenford J.</given-names>
          </string-name>
          , Corey Sandler, and Tom Badgett.
          <source>The Art of Software Testing</source>
          . John Wiley &amp; Sons,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7] “
          <article-title>IEEE Standard for Software Test Documentation</article-title>
          .” Accessed September 17,
          <year>2024</year>
          . https://standards.ieee.org/ieee/829/1217/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>James.</given-names>
          </string-name>
          <article-title>Rapid Application Development</article-title>
          . Macmillan Publishing Company,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Khaliq</surname>
          </string-name>
          , Zubair, Sheikh Umar Farooq, and Dawood Ashraf Khan. “
          <article-title>Artificial Intelligence in Software Testing: Impact, Problems, Challenges and Prospect</article-title>
          .”
          <source>arXiv</source>
          , January 14,
          <year>2022</year>
          . https://doi.org/10.48550/arXiv.2201.05371.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Schäfer</surname>
          </string-name>
          , Max, Sarah Nadi, Aryaz Eghbali, and Frank Tip.
          “
          <article-title>An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation</article-title>
          .”
          <source>IEEE Transactions on Software Engineering</source>
          <volume>50</volume>
          , no.
          <issue>1</issue>
          (January
          <year>2024</year>
          ):
          <fpage>85</fpage>
          -
          <lpage>105</lpage>
          . https://doi.org/10.1109/TSE.2023.3334955.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11] “
          <source>Domain Specific Languages</source>
          .” Accessed September 17,
          <year>2024</year>
          . https://martinfowler.com/books/dsl.html.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Levenshtein</surname>
            ,
            <given-names>Vladimir I.</given-names>
          </string-name>
          “
          <article-title>Двоичные Коды с Исправлением Выпадений, Вставок и Замещений Символов [Binary Codes Capable of Correcting Deletions, Insertions, and Reversals]</article-title>
          .”
          <source>Soviet Physics Doklady</source>
          <volume>163</volume>
          , no.
          <issue>4</issue>
          (
          <year>February 1966</year>
          ):
          <fpage>845</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>