<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SMT</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>On Writing SMT-LIB Scripts: Metrics and a New Dataset</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Soaibuzzaman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Oliver Ringert</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bauhaus-University Weimar</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>23</volume>
      <fpage>10</fpage>
      <lpage>11</lpage>
      <abstract>
        <p>Satisfiability Modulo Theory (SMT) checking is concerned with checking the satisfiability of first-order formulas with respect to some background theories. The SMT-LIB format is a standardized language for scripts expressing SMT problems. Popular datasets of SMT-LIB scripts have been collected for benchmarking SMT solvers. Rather than focusing on evaluating SMT solvers, our work focuses on exploring how novice users write SMT-LIB scripts. We present a dataset of SMT-LIB scripts with fine-grained editing paths. The dataset consists of 2,415 editing paths with a total of 18,133 SMT-LIB scripts. All scripts were collected from a web-based interface for the Z3 SMT solver in educational settings. We analyze the dataset in terms of sizes of scripts, errors users make, similarities of consecutive scripts, editing distances, and edit steps required to fix errors. We make the dataset and the code for computing our metrics available for future research on language design, tool support, and teaching materials.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;SMT-LIB</kwd>
        <kwd>evolution</kwd>
        <kwd>dataset</kwd>
        <kwd>novices</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Satisfiability Modulo Theory (SMT) checking [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is concerned with checking the satisfiability of
firstorder formulas with respect to some background theories. SMT solvers [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ] are tools for reasoning
about SMT problems, e.g., for checking satisfiability or generating unsatisfiability proofs.
      </p>
      <p>
        The SMT-LIB format is a standardized language [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] for scripts expressing SMT problems. The format
of the language was designed as a standard input formal for SMT solvers and is continuously growing
to foster and reflect advances in SMT. The SMT-LIB syntax has inspired various other languages from
diferent domains, e.g., verification [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], modeling [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], and syntax-guided synthesis [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The SMT-LIB
standard is supported by most SMT solvers and API layers for using SMT solvers [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ].
      </p>
      <p>
        Popular datasets of SMT-LIB scripts [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ], containing more than 400,000 SMT-LIB scripts, have
been collected for benchmarking SMT solvers [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In addition to performance, many works address the
correctness and quality of SMT solvers through testing on the input- [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] or API-level [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Much fewer
works deal with the analysis of SMT-LIB scripts themselves. Some exceptions are tools and libraries
for validating and translating scripts [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ] or tools for debugging these [
        <xref ref-type="bibr" rid="ref19 ref20">19, 20</xref>
        ]1. Dolmen [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] ofers
parsing, validation, and type-checking capabilities, enabling real-time error detection in editors such
as VS Code and Emacs. However, we are not aware of any tools ofering contextual autocompletion,
refactoring support, or complex dependency resolution. Similarly, we are not aware of any studies on
quality aspects nor experiences of users writing SMT-LIB scripts.
      </p>
      <p>
        Thus, rather than focusing on evaluating SMT solvers, our work focuses on exploring how novice
users write SMT-LIB scripts. We present the FMPsmt dataset of SMT-LIB scripts with fine-grained editing
paths, i.e., sequences of SMT-LIB scripts that were consecutively written and analyzed by (novice)
users. The dataset consists of 2,415 editing paths with a total of 18,133 SMT-LIB scripts. All scripts were
collected from a web-based interface for the Z3 SMT solver [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] in educational settings. Similar datasets
exist for other specification languages, e.g., the Alloy language [
        <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
        ].
      </p>
      <p>
        We analyze the dataset in terms of sizes of scripts, errors users make, similarities of consecutive scripts,
editing distances, and edit steps required to fix errors. Our analyses are inspired by similar analyses
conducted on the aforementioned Alloy datasets [
        <xref ref-type="bibr" rid="ref22 ref23">23, 22</xref>
        ]. We make the new FMPsmt dataset [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and
the code for computing our metrics [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] available for future research on language design, tool support,
and teaching materials.
      </p>
      <p>The remainder of this work is structured as follows. Section 2 briefly presents the foundations of our
work. Section 3 lists our research questions, Sect. 4 presents our dataset and data processing. Section 5
presents the evaluation of our research questions on the datasets. We conclude in Sect. 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>
        We now give a brief overview of the SMT-LIB language specification and the Formal Methods Playground.
The SMT-LIB standard [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] defines a textual scripting language for sorted first-order logic terms and
formulas and commands to interact with SMT solvers, e.g., for creating assertions, checking their validity,
and inspecting generated models and proofs. The standard also provides languages for specifying logics
and background theories less relevant our analysis presented here. An example SMT-LIB script is shown
in Listing 1. It encodes part of the Agatha puzzle [26, P.55] consisting of various statements relating
Agatha, her butler, and Charles to determine who killed Agatha.
      </p>
      <p>Listing 1: Excerpt of an example SMT-LIB script from the FMPsmt dataset</p>
      <p>SMT-LIB scripts are sequences of commands, e.g., the set-logic command selecting the logic of
uninterpreted functions in Listing 1, l. 1, the command assert for asserting formulas in Listing 1,
ll. 9-15, or the command check-sat for checking satisfiability of the asserted formulas. Additional
commands with prefix declare- introduce datatypes (Listing 1, l. 2), constants, and functions (Listing 1,
ll. 3-6). The command get-model (Listing 1, ll. 17), if successful, retrieves an interpretation of constants
and functions satisfying all asserted formulas and presents them as function definitions, e.g., the
interpretation of constant Killer will be defined as the value Agatha, Butler, or Charles.</p>
      <sec id="sec-2-1">
        <title>2.2. Formal Methods Playground</title>
        <p>
          The Formal Methods Playground2 is a web application for writing and analyzing models, specifications,
and scripts in various modeling and specification languages. We have developed this application mainly
for teaching, e.g., slides in our Formal Methods for Software Engineering lecture [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] contain permalinks
to example scripts on the Formal Methods Playground for direct analysis in the browser. Currently,
the Formal Methods Playground supports Limboole3, Z3 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], Alloy [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ], nuXmv [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] and Spectra [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]
specifications. For analyzing SMT-LIB scripts, the input pane of the Formal Methods Playground consists
2See https://play.formal-methods.net and https://www.youtube.com/playlist?list=PLGyeoukah9NYq9ULsIuADG2r2QjX530nf
3See https://fmv.jku.at/limboole/
of an editor with syntax highlighting and basic code completion and search capabilities. The output
pane shows the output of the Z3 solver executed on the input SMT-LIB script verbatim. The analysis of
SMT-LIB scripts is usually performed locally in the users’ browser using Web Assembly [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] without
any timeout or as a fall-back on our server with a timeout of 60s. We have added basic language support,
e.g., syntax checking and referencing constant and function symbols, as an optional feature in February
2025 (we have not evaluated the use and efect of these features so far).
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Research Questions</title>
      <p>We aim to gain a better understanding of how novice users write and evolve SMT-LIB scripts.</p>
      <p>We define the following research questions:
• RQ1: What are the key characteristics of the FMPsmt dataset?
• RQ2: Where do users most commonly introduce syntactic errors in SMT-LIB scripts?
• RQ3: How do consecutive SMT-LIB scripts difer?
• RQ4: How large are the edit distances between consecutive SMT-LIB scripts?
• RQ5: How do users fix errors over multiple edit steps?</p>
    </sec>
    <sec id="sec-4">
      <title>4. Data Processing and Metrics Computation</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Data</title>
        <p>
          We analyze our proposed Formal Methods Playground SMT-LIB (FMPsmt) dataset [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. It contains
SMT-LIB scripts executed on the Formal Methods Playground from November 2023 to May 2025.
Students at Bauhaus-University Weimar use this platform as part of our Formal Methods for Software
Engineering [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] module. Students taking this module are mainly from the international programs
Digital Engineering (DE, an interdisciplinary 2-year MSc program) and Computer Science for Digital
Media (CS4DM, a 2-year MSc program). While the background of the CS4DM students is a classical BSc
degree in computer science, the second, and larger group of DE students are from diverse engineering,
e.g., civil, mechanical, or even electrical engineering, (∼ 50%) and computer science (∼ 50%) backgrounds.
Based on user activity and the models authored, at least one additional university uses the platform.
We capture scripts, timestamps, and historical derivations structured in a parent-child relationship. The
FMPsmt dataset contains 18,133 SMT-LIB scripts.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Edit Paths</title>
        <p>The FMPsmt datasets maintain records of the previous revision of each SMT script. Each revision is a
user submission i.e., the current script whenever the user executes the solver. We utilize this information
to reconstruct the edit path4, allowing us to capture the sequences of edits/submissions made by users
(these edits are typically small, see Sect. 5.4).</p>
        <p>The FMPsmt dataset consists of 2,415 unique edit paths. In particular, the top 25% of the edit paths
have a length greater than 13 for FMPsmt, with median lengths of 6. The FMPsmt dataset has 272 unique
initial scripts5 within the edit paths.</p>
        <p>Note that the number of scripts in edit paths exceeds the total number of scripts in the dataset. The
exchange of permalinks to individual scripts via the Formal Methods Playground allows the branching
of edit paths leading to common prefixes.</p>
        <sec id="sec-4-2-1">
          <title>4We adopt the terminology of [23] and [22]. 5While many edit paths start from scratch, most edit paths share initial models provided by instructors [27].</title>
          <p>Q3</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Semantic Comparison of SMT-LIB scripts</title>
        <p>
          For semantic comparison, we utilize a lightweight procedure (implemented using Z3 [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] APIs that we
make available from [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]) that verifies the semantic entailment between the assertion sets collected
from two compared scripts. This process classifies their semantic relationship as either equivalent,
incomparable, or a refinement in either direction. Specifically, we evaluate two queries: whether script
1 semantically entails script 2, and vice versa. If both hold, the scripts are semantically equivalent; if
neither does, they are incomparable; and if only one direction holds, we report a refinement relation.
        </p>
        <p>It is important to note that this semantic comparison approach is naive and has several limitations.
The comparison is oblivious to variable renaming, ignores push and pop scopes, and can misclassify
scripts that contain inconsistent or unsatisfiable assertions. Consequently, the reported relationships
are approximate and may not accurately reflect subtle semantic nuances. However, relatively small
edit distances (see Sect. 5.4), i.e., syntactically very similar scripts in edit paths, and low numbers of
incremental scripts (see Sect. 5.1) suggest a potential for the analysis on the FMPsmt dataset.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation</title>
      <p>We now present data to answer our research questions from Sect. 3. All experiments used Z3 v4.13.4
and were executed on a server with an Intel Xeon Platinum 8260 CPU @ 2.4GHz running Ubuntu 22.04.</p>
      <sec id="sec-5-1">
        <title>5.1. RQ1: Dataset Characteristics</title>
        <p>To characterize the FMPsmt dataset, we conducted a quantitative analysis based on several structural
and syntactic features. These include efective lines of code (ELOC), the frequency of common SMT-LIB
commands such as assert, declare-const, check-sat, etc., as well as the maximum nesting depth
of expressions within each script.</p>
        <p>Table 1a summarizes the distribution of key structural features in the FMPsmt dataset, highlighting
25th percentile, median, 75th percentile, and maximum values. Efective lines of code (ELOC) have a
median of 26 and a maximum of 1,531, indicating most scripts are small, but a few are much larger. The
maximum nesting depth of operators is generally low, with a median maximum nesting depth of 5,
with some outliers reaching 42, suggesting instances of highly nested formulas. The counts of diferent
commands per script increase with ELOC.</p>
        <p>
          The FMPsmt dataset comprises 1,221 (6.73%) scripts with multiple check-sat commands (incremental
scripts in the terminology of the SMT-LIB benchmark [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]). Execution times vary considerably; as
indicated in Table 1a, 75% of the scripts finish in under 0.03 seconds, though some outliers extend up to
318.27 seconds. A uniform timeout of 600 seconds was applied based on our expectation of how long
users might wait for the in-browser SMT execution on the Formal Methods Playground. Only 28 scripts
in the dataset timed out.
        </p>
        <p>The dataset spans a wide range of SMT logics—38 in total (including typos)—with QF_UF (1,819
scripts), QF_LIA (1,447 scripts), and QF_AUFLIA (401 scripts) being the most common. Note that many
scripts do not declare a logic, as Z3 does not enforce this. Less frequent logics include specialized or
rarely used ones, such as QF_SLIA, appearing only once. This variety emphasizes the dataset’s extensive
scope, encompassing both quantifier-free and quantified fragments.
50.46%
65.26%
32.92%</p>
        <p>Edit Paths (100%)
With Invalid Scripts (%)
Without Valid Scripts (%)
(b)</p>
        <p>#</p>
        <p>Table 1b summarizes the distribution of edit path lengths. The dataset comprises 2,415 distinct edit
paths, with lengths from 1 to 321 edits. The median length is 6, and the 75th percentile is 13, suggesting
that while some users finish their analyses quickly, many engage in longer editing processes. Notably,
over 58% of edit paths have five or more steps, highlighting the significant time and efort users invest
in refining their scripts.</p>
        <p>We also investigated the frequency with which users encounter syntactic errors in these paths.
Table 2b shows that 59.13% of edit paths contain at least one invalid script, suggesting users often make
erroneous iterations before reaching a correct version. Only 1.41% of paths consist entirely of invalid
scripts, indicating instances where users cannot construct a syntactically valid SMT-LIB script.</p>
        <p>Another important characteristic of the dataset is the degree of syntactic uniqueness across scripts.
As shown in Table 2a, 9150 scripts (50.46%) are syntactically unique, indicating that more than half of
the dataset comprises distinct SMT-LIB formulations rather than duplicates or near-duplicates. Among
these unique scripts, 5971 (65.26%) are syntactically correct, highlighting a reasonably high rate of
well-formed inputs within the diverse subset.</p>
        <p>In summary, the FMPsmt dataset consists of SMT-LIB scripts of mainly small to medium
sizes (median 26 ELOC) and complexities, encompassing a broad range of 38 diferent logics
(including typos). The execution times are typically brief (within tens of milliseconds), and
users often refine scripts iteratively. The dataset includes a wide range of problem instances, as
well as diverse modeling styles and patterns.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. RQ2: Where do users most commonly introduce syntactic errors in SMT-LIB scripts?</title>
        <p>Among the 18,133 SMT-LIB scripts analyzed within the FMPsmt dataset, we observed that approximately
40% contain at least one syntax error, which is distributed across various SMT commands. To analyze
common syntactic errors in SMT-LIB scripts, we developed a parser to identify and categorize errors.
For each identified syntactic error, we recorded the top-level SMT-LIB command, the absolute number
of errors, the total occurrences of the command, and calculated the relative percentage of errors for
each command.</p>
        <p>Table 3 presents the ten commands most susceptible to user-induced syntactic errors, evaluated by
both absolute and relative frequencies of errors. The assert command accounts for the highest absolute
number of errors, with 35,132 instances (11.01%), likely due to the inherent complexity of the logical
expressions it often encompasses. In contrast, the declare-const command shows 10,920 errors, but
with a lower error rate of 4.85%, indicating that it is mostly used correctly despite its frequent use.
Interestingly, the commands get-value and eval exhibit the highest relative error rates, at 30.62% and
27.96%, respectively. These commands are less common but appear to be more error-prone. However,
command
assert
declare-const
declare-fun
get-value
define-fun
get-model
declare-datatype
check-sat
eval
quantifiers</p>
        <p>abs. # error # total elements rel. % of command
note that our automated analyses of errors in SMT-LIB scripts did not distinguish between static syntax
errors, e.g., invoking command get-value for an undeclared constant, and runtime errors, e.g., invoking
command get-value without a model. We leave a more detailed investigation to future work.</p>
        <p>The absolute and relative numbers of errors in Table 3 suggest that while assert is frequently used
and therefore contributes a large raw number of errors, commands like get-value and eval might
proportionally be more challenging for novice users to implement correctly.</p>
        <p>
          Building on the analysis of error distribution across SMT-LIB commands, we investigated the specific
types of syntactic errors encountered by users. Table 4 summarizes the top 10 categories of syntactic
errors observed in the FMPsmt dataset. These errors are classified according to frequent patterns
observed in the error messages. Detailed information regarding the exact and complete error messages
for each script6 can be found in the replication package [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
        </p>
        <p>The most prevalent error, accounting for over 50% of all cases, is "Unknown constant", which typically
arises when users reference symbols that have not been previously declared. This suggests a recurring
issue with undeclared variables or scope-related oversights in SMT-LIB scripts. The following most
common categories involve incorrect or malformed declarations, such as "Invalid constant declaration"
(9.8%) and "Parsing function declaration" errors (7.16%), indicating that sort mismatches and incorrect
syntactic forms in declarations are also significant sources of failure.</p>
        <p>
          These errors highlight a common challenge that novice users encounter with sort declarations and
naming conventions in SMT-LIB. Many of these issues are avoidable and could likely be minimized
through improved language support, such as context-aware autocompletion, scoping, referencing,
validation, and informative diagnostics that guide users toward correct usage as they write. Partially,
these analyses are already supported by existing tools, e.g., Dolmen [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] supports validation and
type-checking, and evaluating their impact might be interesting future work.
        </p>
        <p>Errors such as "Logic does not support" and "Model is not available" indicate issues related to the
selected logic or solver state. If the commands are misused in a given context, these issues can also
appear as syntactic problems.</p>
        <p>In summary, syntactic errors in the FMPsmt dataset (mainly referencing unknown constants
or writing invalid declarations) are both frequent and concentrated around specific SMT-LIB
commands and constructs (mainly in assertions and declarations). Common errors include
malformed or mismatched declarations, pointing to challenges in understanding SMT-LIB’s
type and declaration system. These insights suggest a need for improved tooling and guidance,
such as contextual validation, better scoping support, and more intelligent error detection and
error messages to assist users.
6See file https://raw.githubusercontent.com/se-buw/smt-metrics/refs/tags/SMT2025/results/fmp_error_category.csv</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. RQ3: How do consecutive SMT-LIB scripts difer?</title>
        <p>To understand the evolution of SMT-LIB scripts during user editing sessions, we analyzed the diferences
between “consecutive” (i.e., directly successive) and “non-consecutive” (i.e., scripts related through longer
edit paths) script pairs along each edit path (see Sect. 4.2). Our analysis addresses both syntactic identity
and various semantic relationships, including equivalence (≡ ), incomparability (̸=), and refinement
relationships (1 ⊏ 2 and 2 ⊏ 1) as mentioned in Sect. 4.3.</p>
        <p>As shown in Table 5, a substantial number of consecutive script pairs (4,319 pairs) were syntactically
identical, which is significantly higher than for non-consecutive pairs (2,121 pairs). This suggests users
often submit the same script without modifications, e.g., when reviewing their own earlier work or
examples provided in permalinks.</p>
        <p>This analysis revealed that only 23.86% of consecutive script pairs (6,332 pairs) in the edit paths are
semantically equivalent, while over 10% (2,805 pairs) are incomparable, and roughly 11% (1,149 and 1,748
pairs) exhibit refinement relations. This indicates significant changes in specification behavior, even
among consecutive edits. Furthermore, we have identified 216 unknowns. Note that these percentages
do not add up to 100% as one syntactically incorrect script in a pair prevents our semantic comparison
of that pair.</p>
        <p>In contrast, non-consecutive script pairs on the same edit path demonstrate lower semantic
equivalence (8.04%) but higher rates of incomparability (19.48%). These trends suggest that as users make
more edits over time, scripts tend to diverge semantically, reflecting an increasing complexity in the
SMT script. Interestingly, refinement relationships remain prevalent even in non-consecutive pairs,
implying that users incrementally strengthen or weaken specifications throughout the editing process.</p>
        <p>As an example, Listings 2 and 3 demonstrate a refinement relation between two SMT-LIB scripts.
Listing 2 has a stronger specification that creates a cyclic chain of Boolean implications leading to a
contradiction. Conversely, Listing 3 asserts a weaker property that is always satisfiable. Our semantic
comparison reveals that Listing 2 ⊏ Listing 3: all models of the listing 2 (none, in this case) satisfy the
assertions in Listing 3, but not vice versa.</p>
        <p>In summary, over 4,300 consecutive pairs are syntactically identical, indicating user habits,
such as rerunning unchanged scripts. Consecutive scripts show high numbers of cases of
semantic equivalence and refinement, implying that users refine their logic incrementally. The
persistence of refinement in both consecutive and non-consecutive pairs highlights a frequent
pattern of strengthening or weakening specifications, suggesting a trial-and-error process in
developing SMT scripts.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. RQ4: How large are the edit distances between consecutive SMT-LIB scripts?</title>
        <p>
          To explore the evolution of SMT-LIB scripts across edit paths, we calculate the Levenshtein distance [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]
between successive scripts. This metric quantifies the number of single-character insertions, deletions,
or substitutions required to transform one script into another, providing a straightforward yet insightful
assessment of syntactic variation.
        </p>
        <p>The findings are summarized in Table 6, illustrating that most script edits tend to be minor. The 25th
percentile of edit distances is 6, while the median stands at 51, suggesting that users implement only
slight adjustments between successive versions in over half of the instances. These adjustments can
involve minor tweaks such as refining logical assertions, correcting constants, adding or removing a
single command, or fixing formatting issues.</p>
        <p>Nonetheless, the distribution shows a lengthy tail of significant edits. The 75th percentile sits at 315
characters, while the maximum distance reaches 38,659, indicating infrequent yet significant rewrites.
These large distances correspond to major structural changes or cases where the user replaces the
existing script entirely, potentially because they are starting anew.
In summary, the edit distances between consecutive SMT-LIB scripts indicate that the majority
of user edits are small (median 51 characters). This suggests that users generally implement
incremental adjustments such as refining assertions, correcting errors, or modifying individual
commands– rather than undertaking comprehensive overhauls of entire scripts.</p>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. RQ5: How do users fix errors over multiple edit steps?</title>
        <p>Our analysis indicates that approximately 40% of all scripts exhibit at least one syntax error (see Sect. 5.1),
while approximately 60% of the edit paths are characterized by the presence of at least one erroneous
script (see Table 1b).</p>
        <p>To quantify the efort required for error resolution, we calculated the number of consecutive edit
steps necessary to rectify specific types of issues. Table 7 presents the quartiles and maximum values
for the steps involved in correcting syntax errors and transforming an unsatisfiable (UNSAT) script into
a satisfiable (SAT) script.</p>
        <p>The data indicates that most syntax errors are corrected swiftly: both the 25th percentile (Q1) and
the median stand at 1, meaning that in at least half of the instances, users address a parse error in just
one edit step. The 75th percentile (Q3) is 3, whereas the most prolonged fix recorded took 52 edit steps.
This suggests that while fast recovery is typical, some users face considerable challenges, particularly
with longer or more intricate scripts.</p>
        <p>Similarly, the process of resolving semantic unsatisfiability, which involves changing a script from
UNSAT to SAT, also demonstrates a high degree of eficiency. The median number of steps required is 1,
with 75% of transitions occurring within two steps. This suggests that users often make precise logical
adjustments in a single step once they recognize unsatisfiability. However, some complex semantic
issues resulted in up to 58 steps for correction.</p>
        <p>These findings indicate that novice users are typically able to address syntactical issues and
advance towards satisfiability with low efort; however, they also underscore the existence of non-trivial
challenges in a minority of instances.</p>
        <p>In summary, most users resolve syntax and satisfiability issues quickly, often within a single
edit. This indicates users typically identify and fix problems with minimal efort. However,
some cases require extensive revisions, highlighting that, while rapid recovery is common, some
users face significant challenges with complex debugging tasks.</p>
      </sec>
      <sec id="sec-5-6">
        <title>5.6. Threats to Validity</title>
        <p>
          We now identify and discuss various threats to the validity of our analyses. For transparency and
reproducibility we provide the implementation of our processing in [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
        </p>
        <p>First, we assume that the use of the Formal Methods Playground across the dataset is similar to that
of our students, i.e., small models are created and analyzed. However, we have no control over how
other users use the publicly available platform, e.g., it could be used for teaching with very narrow task
definitions. To assess this uncertainty, we have assessed unique initial nodes of edit paths in Sect. 4.2,
giving some confidence in the models’ variability.</p>
        <p>Second, the permalinks provided in the lecture materials might have inadvertently encouraged
users to begin their models from these pre-existing structures. This could afect the root nodes of edit
paths, complicating the comparison of model evolution across various initial states and potentially
underestimating the diversity of evolution paths. Additionally, users may forget to reset the edit path
after completing one model and before starting another, i.e., the edit paths of the new model may
incorrectly link to the previous one. This could distort the sequence of changes and hinder our ability
to analyze the edit paths accurately.</p>
        <p>Third, the dataset captures only a small portion of the SMT-LIB standard and is primarily written
by novice users introduced to SMT-LIB through lecture materials. Any findings are restricted to
these assumptions and may not be generalized. In particular, while it might seem surprising that
unsatisfiability is often resolved in a single edit step, this could be due to the prevalence of small SMT
scripts with few assertions.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We have presented and analyzed the Formal Methods Playground SMT-LIB (FMPsmt) dataset. We
analyzed the dataset in terms of sizes of scripts, errors users make, syntactic and semantic similarity
of consecutive scripts, editing distances, and edit steps required to fix errors. Syntactic errors are
highly prevalent in writing SMT-LIB scripts, and our analysis reveals common syntax error classes that
could likely be avoided with improved tool support, such as code completion and reference and scope
checking. Interestingly, debugging unsatisfiability did not appear to present a problem to users, and
scripts returned to being satisfiable often within a single editing step.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Data Availability</title>
      <p>
        We have made the Formal Methods Playground SMT-LIB (FMPsmt) dataset publicly available on Zenodo
as [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. In addition, to support reproducibility of our metrics computations and the preprocessing, we
have made the implementation used for our analyses available in a GitHub repository as [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>
        We want to acknowledge Salar Kalantari’s MSc thesis [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] on a preliminary analysis (inspiring Table 4
and including a dificulty metric, not presented here) of a smaller, earlier subset of the FMPsmt dataset.
      </p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <sec id="sec-9-1">
        <title>The authors have not employed any Generative AI tools.</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Seshia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tinelli</surname>
          </string-name>
          ,
          <article-title>Satisfiability modulo theories</article-title>
          ,
          <source>in: Handbook of Satisfiability - Second Edition</source>
          , volume
          <volume>336</volume>
          <source>of Frontiers in Artificial Intelligence and Applications</source>
          , IOS Press,
          <year>2021</year>
          , pp.
          <fpage>1267</fpage>
          -
          <lpage>1329</lpage>
          . doi:
          <volume>10</volume>
          .3233/FAIA201017.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>L. M. de Moura</surname>
            ,
            <given-names>N. S.</given-names>
          </string-name>
          <string-name>
            <surname>Bjørner</surname>
          </string-name>
          ,
          <article-title>Z3: an eficient SMT solver</article-title>
          ,
          <source>in: TACAS</source>
          <year>2008</year>
          , volume
          <volume>4963</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2008</year>
          , pp.
          <fpage>337</fpage>
          -
          <lpage>340</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>540</fpage>
          -78800-3\_
          <fpage>24</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Conway</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Deters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hadarean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jovanovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          , C. Tinelli, CVC4,
          <source>in: CAV</source>
          <year>2011</year>
          , volume
          <volume>6806</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2011</year>
          , pp.
          <fpage>171</fpage>
          -
          <lpage>177</lpage>
          . doi:
          <volume>10</volume>
          . 1007/978-3-
          <fpage>642</fpage>
          -22110-1\_
          <fpage>14</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Dutertre</surname>
          </string-name>
          , Yices
          <volume>2</volume>
          .2, in:
          <source>CAV</source>
          <year>2014</year>
          , volume
          <volume>8559</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2014</year>
          , pp.
          <fpage>737</fpage>
          -
          <lpage>744</lpage>
          . doi:
          <volume>10</volume>
          . 1007/978-3-
          <fpage>319</fpage>
          -08867-9\_
          <fpage>49</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fontaine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tinelli</surname>
          </string-name>
          ,
          <source>The SMT-LIB Standard: Version 2</source>
          .7,
          <string-name>
            <surname>Technical</surname>
            <given-names>Report</given-names>
          </string-name>
          , Department of Computer Science, The University of Iowa,
          <year>2025</year>
          . Available at https://www.SMT-LIB.org.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Griggio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tonetta</surname>
          </string-name>
          ,
          <article-title>The VMT-LIB language and tools</article-title>
          ,
          <source>in: SMT</source>
          <year>2022</year>
          , volume
          <volume>3185</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>80</fpage>
          -
          <lpage>89</lpage>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3185</volume>
          / extended9547.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vakili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Day</surname>
          </string-name>
          ,
          <article-title>Avestan: a declarative modeling language based on SMT-LIB</article-title>
          ,
          <source>in: MiSE</source>
          <year>2012</year>
          , IEEE Computer Society,
          <year>2012</year>
          , pp.
          <fpage>36</fpage>
          -
          <lpage>42</lpage>
          . doi:
          <volume>10</volume>
          .1109/MISE.
          <year>2012</year>
          .
          <volume>6226012</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Day</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vakili</surname>
          </string-name>
          ,
          <article-title>Representing hierarchical state machine models in SMT-LIB</article-title>
          ,
          <source>in: MiSE@ICSE</source>
          <year>2016</year>
          , ACM,
          <year>2016</year>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>73</lpage>
          . doi:
          <volume>10</volume>
          .1145/2896982.2896990.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Padhi</surname>
          </string-name>
          , E. Polgreen,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raghothaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Reynolds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Udupa</surname>
          </string-name>
          ,
          <source>The sygus language standard version 2</source>
          .1, CoRR abs/2312.06001 (
          <year>2021</year>
          ). arXiv:
          <volume>2312</volume>
          .
          <fpage>06001</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gario</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Micheli</surname>
          </string-name>
          ,
          <article-title>Pysmt: a solver-agnostic library for fast prototyping of smt-based algorithms</article-title>
          ,
          <source>in: SMT Workshop</source>
          <year>2015</year>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Baier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Beyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Friedberger</surname>
          </string-name>
          ,
          <article-title>Javasmt 3: Interacting with SMT solvers in java</article-title>
          ,
          <source>in: CAV2021</source>
          , volume
          <volume>12760</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>195</fpage>
          -
          <lpage>208</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -81688-9\_9.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Preiner</surname>
          </string-name>
          , H.
          <article-title>-</article-title>
          <string-name>
            <surname>J. Schurr</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Barrett</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Fontaine</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Niemetz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Tinelli</surname>
          </string-name>
          ,
          <article-title>Smt-lib release 2024 (incremental benchmarks</article-title>
          ),
          <year>2024</year>
          . doi:
          <volume>10</volume>
          .5281/zenodo.11186591.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Preiner</surname>
          </string-name>
          , H.
          <article-title>-</article-title>
          <string-name>
            <surname>J. Schurr</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Barrett</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Fontaine</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Niemetz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Tinelli</surname>
          </string-name>
          ,
          <article-title>Smt-lib release 2024 (nonincremental benchmarks</article-title>
          ),
          <year>2024</year>
          . doi:
          <volume>10</volume>
          .5281/zenodo.11061097.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Cok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stump</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <article-title>The 2013 evaluation of SMT-COMP and</article-title>
          <string-name>
            <surname>SMT-LIB</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Autom</surname>
          </string-name>
          . Reason.
          <volume>55</volume>
          (
          <year>2015</year>
          )
          <fpage>61</fpage>
          -
          <lpage>90</lpage>
          . doi:
          <volume>10</volume>
          .1007/S10817-015-9328-2.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Winterer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <article-title>On the unusual efectiveness of type-aware operator mutations for testing SMT solvers</article-title>
          ,
          <source>Proc. ACM Program. Lang</source>
          .
          <volume>4</volume>
          (
          <year>2020</year>
          )
          <volume>193</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>193</lpage>
          :
          <fpage>25</fpage>
          . doi:
          <volume>10</volume>
          .1145/3428261.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Niemetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Preiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Biere</surname>
          </string-name>
          ,
          <article-title>Model-based API testing for SMT solvers</article-title>
          ,
          <source>in: SMT</source>
          <year>2017</year>
          , volume
          <volume>1889</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>14</lpage>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-1889/paper1.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bury</surname>
          </string-name>
          ,
          <article-title>Dolmen: A validator for SMT-LIB and much more</article-title>
          ,
          <source>in: SMT</source>
          <year>2021</year>
          , volume
          <volume>2908</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Cok</surname>
          </string-name>
          , jsmtlib: Tutorial,
          <article-title>validation and adapter tools for smt-libv2</article-title>
          ,
          <source>in: NFM</source>
          <year>2011</year>
          , volume
          <volume>6617</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2011</year>
          , pp.
          <fpage>480</fpage>
          -
          <lpage>486</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>642</fpage>
          -20398-5\_
          <fpage>36</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>O.</given-names>
            <surname>Guthmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Strichman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Trostanetski</surname>
          </string-name>
          ,
          <article-title>Minimal unsatisfiable core extraction for SMT</article-title>
          ,
          <source>in: FMCAD</source>
          <year>2016</year>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>64</lpage>
          . doi:
          <volume>10</volume>
          .1109/FMCAD.
          <year>2016</year>
          .
          <volume>7886661</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kremer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Niemetz</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Preiner, ddsmt 2.0: Better delta debugging for the smt-libv2 language and friends</article-title>
          , in: A.
          <string-name>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. R. M. Leino</surname>
          </string-name>
          (Eds.),
          <source>CAV Proceedings</source>
          ,
          <string-name>
            <surname>Part</surname>
            <given-names>II</given-names>
          </string-name>
          , volume
          <volume>12760</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2021</year>
          , pp.
          <fpage>231</fpage>
          -
          <lpage>242</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -81688-9\_
          <fpage>11</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>N.</given-names>
            <surname>Macedo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cunha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pereira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Carvalho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C. R.</given-names>
            <surname>Paiva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Ramalho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <article-title>Experiences on teaching alloy with an automated assessment platform</article-title>
          ,
          <source>Sci. Comput</source>
          . Program.
          <volume>211</volume>
          (
          <year>2021</year>
          )
          <article-title>102690</article-title>
          . doi:
          <volume>10</volume>
          .1016/J.SCICO.
          <year>2021</year>
          .
          <volume>102690</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Soaibuzzaman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Kalantari</surname>
            ,
            <given-names>J. O.</given-names>
          </string-name>
          <string-name>
            <surname>Ringert</surname>
          </string-name>
          ,
          <article-title>On writing alloy models: Metrics and a new dataset</article-title>
          ,
          <source>in: ABZ</source>
          <year>2025</year>
          , volume
          <volume>15728</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2025</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>031</fpage>
          -94533-5\ _5.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jovanovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sullivan</surname>
          </string-name>
          ,
          <article-title>Right or wrong - understanding how users write software models in alloy</article-title>
          ,
          <source>in: SEFM</source>
          <year>2024</year>
          , volume
          <volume>15280</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>309</fpage>
          -
          <lpage>327</lpage>
          . doi:
          <volume>10</volume>
          .1007/ 978-3-
          <fpage>031</fpage>
          -77382-2\_
          <fpage>18</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Soaibuzzaman</surname>
            ,
            <given-names>J. O.</given-names>
          </string-name>
          <string-name>
            <surname>Ringert</surname>
          </string-name>
          ,
          <article-title>Formal methods playground smt-lib dataset</article-title>
          ,
          <year>2025</year>
          . doi:
          <volume>10</volume>
          .5281/ zenodo.15488370.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Soaibuzzaman</surname>
            ,
            <given-names>J. O.</given-names>
          </string-name>
          <string-name>
            <surname>Ringert</surname>
          </string-name>
          ,
          <article-title>Smt-lib metrics replication package</article-title>
          ,
          <year>2025</year>
          . Available from https: //github.com/se-buw/smt-metrics.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Pelletier</surname>
          </string-name>
          ,
          <article-title>Seventy-five problems for testing automatic theorem provers</article-title>
          ,
          <source>J. Autom. Reason</source>
          .
          <volume>2</volume>
          (
          <year>1986</year>
          )
          <fpage>191</fpage>
          -
          <lpage>216</lpage>
          . doi:
          <volume>10</volume>
          .1007/BF02432151.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Soaibuzzaman</surname>
            ,
            <given-names>J. O.</given-names>
          </string-name>
          <string-name>
            <surname>Ringert</surname>
          </string-name>
          ,
          <article-title>Introducing github classroom into a formal methods module</article-title>
          ,
          <source>in: FMTea</source>
          <year>2024</year>
          , volume
          <volume>14939</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>42</lpage>
          . doi:
          <volume>10</volume>
          .1007/ 978-3-
          <fpage>031</fpage>
          -71379-8\_2.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jackson</surname>
          </string-name>
          ,
          <article-title>Alloy: a language and tool for exploring software designs</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>62</volume>
          (
          <year>2019</year>
          )
          <fpage>66</fpage>
          -
          <lpage>76</lpage>
          . doi:
          <volume>10</volume>
          .1145/3338843.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cavada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dorigatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Griggio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mariotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Micheli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mover</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Roveri</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. Tonetta,</surname>
          </string-name>
          <article-title>The nuxmv symbolic model checker</article-title>
          ,
          <source>in: CAV</source>
          , volume
          <volume>8559</volume>
          <source>of LNCS</source>
          , Springer,
          <year>2014</year>
          , pp.
          <fpage>334</fpage>
          -
          <lpage>342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>S.</given-names>
            <surname>Maoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. O.</given-names>
            <surname>Ringert</surname>
          </string-name>
          ,
          <article-title>Spectra: a specification language for reactive systems</article-title>
          ,
          <source>Softw. Syst. Model</source>
          .
          <volume>20</volume>
          (
          <year>2021</year>
          )
          <fpage>1553</fpage>
          -
          <lpage>1586</lpage>
          . doi:
          <volume>10</volume>
          .1007/S10270-021-00868-Z.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Monroe</surname>
          </string-name>
          , P. de Halleux,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lerner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Bjørner</surname>
          </string-name>
          ,
          <article-title>Z3guide: A scalable, studentcentered, and extensible educational environment for logic modeling</article-title>
          ,
          <source>CoRR abs/2506</source>
          .08294 (
          <year>2025</year>
          ). doi:
          <volume>10</volume>
          .48550/ARXIV.2506.08294.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <surname>V. I. Levenshtein</surname>
          </string-name>
          ,
          <article-title>Binary codes capable of correcting deletions, insertions, and reversals</article-title>
          ,
          <source>Proceedings of the Soviet physics doklady</source>
          (
          <year>1966</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kalantari</surname>
          </string-name>
          ,
          <article-title>Exploration of Specifications written by Novices: SAT, SMT</article-title>
          , Alloy, NuSMV,
          <source>Master's thesis</source>
          , Bauhaus-University Weimar,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>