<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Operationalizing the Formalizability of Mathematics Problems for Intelligent Tutoring Systems: Taxonomy, Measurement Protocol, and Educational Impact⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksandr Yevdokymov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andriy Chukhray</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tetiana Stoliarenko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National aerospace university «Kharkiv Aviation Institute»</institution>
          ,
          <addr-line>Vadyma Manka St. 17, 61070 Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Simon Kuznets Kharkiv National University of Economics</institution>
          ,
          <addr-line>Nauky Avenue, 9A, 61166 Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Problem formalizability is examined under explicitly defined constraints of time and interaction budgets. A five-level taxonomy (F0-F4) is introduced to characterize different degrees of formalizability, together with a replicable protocol for estimating the distribution of these levels within a given course. To support practical applicability in intelligent tutoring systems (ITS), a systematic mapping is established between learner-facing mathematical interfaces and the programmatic APIs of state-of-the-art proof assistants (Lean, Coq/Rocq, Isabelle) as well as external Automated Theorem Proving (ATP) and Satisfiability Modulo Theories (SMT) back-ends. Worked mathematical examples are provided to demonstrate how this mapping can be realized in practice. The study references current manuals and standards (Isabelle2025 &amp; Sledgehammer, Coq/Rocq 8.19, Mathematics in Lean 2025, SMT-LIB 2.7, Z3, and cvc5 documentation) along with relevant benchmarks (miniF2F, LeanDojo, MATH) [1]-[6], [8]-[16]. To demonstrate feasibility, we prepared an illustrative pilot on a curated set of undergraduate-level problems drawn from typical algebra/calculus exercises. This pilot was not deployed in a live course, and its figures are demonstrative rather than course-level estimates. We supply a small reproducibility package (data, solver configurations, and proof-assistant scripts) to enable future replications and a planned in-situ study ⋆ProfIT AI'25: 5th International Workshop of IT-professionals on Artificial Intelligence, October 15-17, 2025, Liverpool, UK ⋆ You should use this document as the template for preparing your publication. We recommend using the latest version of the CEURART style. 1∗ Corresponding author. † These authors contributed equally. o.yevdokymov@khai.edu (O. Yevdokymov); achukhray@gmail.com (A. Chukhray); t_stol@ukr.net (T. Stoliarenko) 0009-0008-9687-6344 (O. Yevdokymov); 0000-0002-8075-3664 (A. Chukhray); 0000-0002-7202-316X (T. Stoliarenko)</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;intelligent tutoring systems</kwd>
        <kwd>formal proofs</kwd>
        <kwd>automated theorem proving</kwd>
        <kwd>Lean</kwd>
        <kwd>Coq/Rocq</kwd>
        <kwd>Isabelle</kwd>
        <kwd>Sledgehammer</kwd>
        <kwd>TPTP</kwd>
        <kwd>SMT-LIB 2</kwd>
        <kwd>7</kwd>
        <kwd>Z3</kwd>
        <kwd>cvc5</kwd>
        <kwd>ε-δ analysis</kwd>
        <kwd>ODEs</kwd>
        <kwd>1</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Modern proof assistants, such as Isabelle/HOL, Coq (Rocq), and Lean, in combination with
Automated Theorem Proving (ATP) and Satisfiability Modulo Theories (SMT) back-ends, are now
capable of verifying extensive areas of mathematics, supported by steadily expanding libraries and
advanced automation techniques [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]–[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Nevertheless, a systematic planning instrument is still
lacking for educators to determine what proportion of a course’s problems can be readily
formalized and proved with modest instructional support, and which problems remain resistant
under
comparable
constraints.
      </p>
      <p>This
paper
proposes a resource-sensitive
definition
of
formalizability, introduces a five-level taxonomy (F0–F4), and establishes a replicable measurement
protocol. In addition, a structured</p>
      <p>
        mapping is presented from learner-facing mathematical
interfaces to the application programming interfaces (APIs) of Isabelle/Sledgehammer, Lean, and
Rocq/Coq, together with standardized bridges to external ATP (TPTP) and SMT-LIB back-ends
(current version 2.7, released February 5, 2025) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work: Educational Uses of Proof Assistants and ATP/SMT</title>
      <p>Lean in mathematics education. A growing body of work analyzes Lean as a vehicle for teaching
proof. Thoma &amp; Iannone report an exploratory study with first-year undergraduates, observing
positive effects on students’ ability to construct proofs—even on paper—after working with Lean
[17]. Massot details didactic goals and classroom experience, including controlled natural-language
and “verbose” modes for beginners [18]. A pilot by Bottoni in a “Foundations of Mathematics”
course evaluates Lean’s impact on understanding and on organizing practical sessions [19]. Beyond
formal studies, there is a wide practice of university courses and workshops that integrate Lean
into undergraduate math curricula (e.g., “Courses using Lean” and the “Learning Mathematics with
Lean” event series) [20], [21]. Collectively, these sources underscore Lean’s strengths for instant
feedback, graded hints, and scaffolded library use, while leaving open a course-level metric of what
fraction of typical problems can be quickly formalized.</p>
      <p>Coq (Rocq) in logic/programming courses. The long-running textbook series Software
Foundations has served as a “gentle on-ramp” to Coq for logic, semantics, and algorithm
verification; a “15 years on” retrospective summarizes pedagogical principles and experience at
scale [22], [23]. Earlier case studies document teaching logic and formal methods with Coq in
classroom settings [24]. These works highlight the effectiveness of tactical learning (induction,
equality rewriting, lia/ring automation) and stepwise feedback, but they do not provide
resource-oriented per-course estimates of problem shares.</p>
      <p>Isabelle/HOL in formal methods — and even examinations. Several “from the classroom” reports
exist. Villadsen et al. describe Isabelle in two courses on logic and automated reasoning and share
organizational practices [25]. Jacobsen shows how exams in automated reasoning can be built in
Isabelle so that a large portion of grading is semi-automated [26]. Updated tutorials (e.g., Nipkow’s
Programming and Proving in Isabelle/HOL and the Isabelle tutorial) serve as pedagogical
foundations, especially for induction and equivalence transformations typically used near the start
of a course [27], [28]. In practice, automation (notably Sledgehammer) reduces manual burden;
however, none of these works quantify a course in terms of “% of problems F0/F1…” under fixed
time/hint budgets.</p>
      <p>Where SMT and ATP fit pedagogically.In deductive verificationeducation, Why3-based courses
show how to structure topics (loop invariants, ghost code, specifications) with industrial SMT
back-ends [29]. In the ACL2 community, pedagogic IDEs such as DRACULA and Proof Pad ease
students’ first steps [30], [31]. For auto-checking programming tasks, SMT-based systems (e.g.,
AutoRubric) reduce student code to solver formulas for equivalence checking against references
[32]. There are also student-oriented guides to SMT modeling and solver use [33]. The conclusion
across these lines is consistent: SMT/ATP cover QF_LIA/NRA and equality problems well (often
F0–F1 in similar to proposed scale), yet their role in a comprehensive course profile is usually
described narratively—without systematic statistics.</p>
      <p>
        Links to AI-assisted proving and benchmarks. For hint selection and advanced challenge sets,
two modern corpora are particularly relevant: LeanDojo (programmatic access to Lean proof states
with retrieval augmentation) and miniF2F (formalized Olympiad-level problems) [34], [35]. These
resources focus on proof difficulty and lemma retrieval, but do not offer course -oriented estimates
of “what fraction of real curricular problems can be quickly formalized.” That precise gap is
addressed by this paper’s methodology (UI→API mapping in §3 and measurement protocol in §6)
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]–[16].
      </p>
      <p>A complementary perspective is provided by A. Chukhray et al., where proposed that formal
verification systems (i.e., interactive proof assistants and automated theorem provers) could be
integrated into Intelligent Tutoring Systems for higher mathematics courses in order to enable the
checking of complex problems, not just routine exercises [36]. This vision directly motivates our
resource-aware taxonomy, since it raises the practical question of which types of problems are
formally checkable within reasonable budgets and how such capabilities can be embedded into
adaptive ITS workflows.
3. Interface mapping (learner math UI → prover APIs → ATP/SMT)</p>
      <sec id="sec-2-1">
        <title>3.1. Conceptual layers</title>
        <p>
          



(A) Learner UI (math layer): typed expressions, goals, and actions (Simplify, Rewrite, Prove,
Search lemma, Solve inequalities).
(B) Logical AST: typed terms, binders (∀ ,∃ ), connectives, algebraic structures (semirings,
rings, fields), vectors/matrices [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
(C) Prover interface: tactics + libraries in Lean (mathlib), Rocq/Coq, Isabelle/HOL [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]–
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
(D) External automation: Sledgehammer bridges Isabelle to ATP/SMT; direct SMT-LIB calls
target Z3/cvc5 for fragments like QF_LIA/NRA; TPTP is the de-facto ATP format [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ],
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] ,[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]–[13].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Data model and translation</title>
        <sec id="sec-2-2-1">
          <title>Let the UI grammar of terms be and formulas</title>
          <p>
            t : := x∣ c∣ f (t )∣ t ⊕ t ∣ (t , t )∣ {t ∣ ϕ (t ) }
ϕ : :=t =t ∣ t ≤ t ∣ ¬ ϕ∣ ϕ∧ ϕ∣ ϕ∨ ϕ∣ ∀ x . ϕ∣ ∃ x . ϕ .
(1)
(2)
A compiler Φ maps UI inputs to a typed AST [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] of the target system (HOL with type
classes/locales). Constraints ϕ are normalized to solver fragments (e.g., QF_LIA/NRA) when
possible; otherwise they remain interactive subgoals. The ITP kernel checks proof terms; external
tools only propose steps [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ], [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ].
          </p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>3.3. Mapping tables</title>
        <p>
          On Isabelle, Sledgehammer bridges to ATP/SMT; SMT-LIB 2.7 is the latest spec (Feb 2025); Z3/cvc5
provide proof/trace documentation [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]–[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [12], [13].
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>3.4. Benchmarks</title>
        <p>For challenge pools and evaluation in tutors and research: miniF2F (formal Olympiad-level tasks),
LeanDojo (programmatic access to Lean proof states), and MATH (text-first, step-by-step
competition problems) [14]–[16].</p>
      </sec>
      <sec id="sec-2-5">
        <title>3.5. Correctness by translation (safety)</title>
        <p>
          If the UI→AST compiler Φ preserves types and binds symbols to intended structures (e.g., ℝ as a
field), and the target ITP kernel checks the proof term π of Φ(ϕ), then the original UI statement ϕ
holds in the intended structure. External ATP/SMT calls are advisory; kernels verify [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]–[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Operational definition and the F0–F4 taxonomy</title>
      <sec id="sec-3-1">
        <title>Definition (formalizability under budgetB).</title>
        <p>A mathematics problem P is formalizable under B if there exists:
1. an adequate encoding [ [ P ]] in the target logic/system and
2. a checked derivation (by automation and/or interactive steps) within B, where
B=(time limits , interaction limits , library scope )
(3)</p>
      </sec>
      <sec id="sec-3-2">
        <title>Taxonomy (indicative budgets; tune to course level).</title>
        <p>



</p>
        <p>F0 — Fully automatic. Solved without human hints (≤ 30 s wall time). Typical: rewriting,
linear arithmetic, polynomial identities. Lean: simp, linarith, ring; Coq: lia, ring;
Isabelle: simp, linarith, algebra_simps; SMT: QF_LIA/NRA; ATP: equality +
rewriting.</p>
        <p>F1 — Minimal hints. 1–3 tactics or a short sketch (≤ 5 min). May reuse one auxiliary lemma;
at most 2 external calls (Sledgehammer/SMT).</p>
        <p>F2 — Short structured proof. 5–20 min; several steps and 1–3 lemmas using standard
libraries (analysis/algebra).</p>
        <p>F3 — Nontrivial development. 20–60 min+; needs helper lemmas/definitions; substantial
tactic/solver orchestration.</p>
        <p>F4 — Impractical under BBB. Lacks abstractions/lemmas or requires costly encodings (e.g.,
delicate geometry, compactness/existence arguments).</p>
        <p>Drivers of complexity. (i) Library coverage (definitions/theorems present?); (ii) Logic/fragment
(e.g., first-order vs. higher-order, SMT theory availability); (iii) Proof archetype (equational,
inductive, ε–δ, existence); (iv) Lemma search (availability vs. invention).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Mathematical foundations and worked examples</title>
      <p>This section presents a sequence of representative mathematical results, each formulated as a
selfcontained statement and accompanied by a proof sketch. For each result, we provide formal code
snippets in Lean, Coq (Rocq), and Isabelle/HOL, which serve to illustrate its placement within the
proposed F-level taxonomy.</p>
      <sec id="sec-4-1">
        <title>5.1. Algebraic identities (ring normalization)</title>
        <p>Theorem 4.1 (Binomial identity). ∀ x , y∈ ℝ :( x + y )2= x2+2 xy + y2
Proof sketch. Normalization in the equational theory of commutative semirings.
1. Lean:</p>
        <p>theorem binomial (x y : ℝ) : (x + y)^2 = x^2 + 2*x*y + y^2 := by ring
2. Isabelle/HOL:
3. Coq / Rocq:
lemma binomial: "(x + y)^2 = x^2 + 2*x*y + y^2"
by (simp add: algebra_simps)
From Coq Require Import Reals Psatz.</p>
        <p>Open Scope R_scope.</p>
        <p>Lemma binomial : forall x y: R, (x + y)^2 = x^2 + 2*x*y + y^2.</p>
        <p>Proof. ring. Qed.</p>
        <p>Typical F-level: F0 (library-backed normalization).</p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Linear arithmetic and transitivity</title>
        <p>Lemma 4.2. For all , , ∈ ℤ , if x ≤ y∧ y ≤ z then x ≤ z .</p>
        <p>Proof. Linear integer arithmetic.</p>
        <p>1. Lean:
import Mathlib.Data.Int.Basic
open Int
theorem le_transitive (x y z : ℤ) (h₁ : x ≤ y) (h₂ : y ≤ z) : x ≤ z := by
exact le_trans h₁ h₂
2. Coq / Rocq:</p>
        <p>From Coq Require Import ZArith Lia.</p>
        <p>Lemma le_transitive (x y z : Z) : x &lt;= y -&gt; y &lt;= z -&gt; x &lt;= z.</p>
        <p>Proof. intros; lia. Qed.</p>
        <p>Typical F-level: F0 (via lia/linarith/SMT QF_LIA).</p>
      </sec>
      <sec id="sec-4-3">
        <title>5.3. Induction and sums</title>
        <p>Proposition 4.3 (Arithmetic series). ∀ n∈ℕ :∑ k =
k=1
n
n(n+1)
2
.</p>
        <p>Proof sketch. Induction on n.</p>
        <p>Lean 4:
open Nat
theorem sum_nat (n : ℕ) :</p>
        <p>(∑ k in Finset.range (n+1), k) = n*(n+1)/2 := by
-- uses library lemmas about `Finset.range` and sums
simpa using Nat.sum_range_id
Typical F-level: F1–F2 (one-page structured proof with library lemmas).</p>
      </sec>
      <sec id="sec-4-4">
        <title>5.4. ε–δ reasoning for limits</title>
        <p>Proposition 4.4. For a , b∈ℝ , lim (b x )=b a</p>
        <p>x→a
Proof sketch.</p>
        <p>Given ε &gt;0 , choose δ =ε / max (1 ,|b|). If 0&lt;|x−a|&lt;δ then|b x−b a|=|b||x−a|&lt;ε
Typical F-level: F2–F3 (uses analysis libraries and ε–δ lemmas).</p>
      </sec>
      <sec id="sec-4-5">
        <title>5.5. Separable ODE</title>
        <p>Example 4.5. Solve y ′ =ky (constant k∈ ℝ)
Solution. y ( x )=C ekx, A formal proof uses integration facts or exponential series;
Typical F-level: F2–F3 depending on library coverage (existence/uniqueness, differentiability of ekx
).</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Measurement protocol and statistical estimators</title>
      <p>Let {Pi }in=1be atomic problems sampled from a course.</p>
      <p>Pipeline:
1. Normalization: harmonize notation; split multi-part items into atomic goals.
2. Automation first: run SMT/ATP with strict timeouts (e.g., 30 s, 120 s).
3. ITP with light hints: Lean/Coq/Isabelle with a cap on hints/tactics (e.g., ≤ 3) and external
calls.
4. Labeling: assign F0–F4; record t formalize and t prove, #hints, #external calls, tool versions.
5. Dual
annotation:
two
annotators;
reconcile;
report</p>
      <p>Cohen’s

6. Aggregation: compute per-topic %F0…%F4 and medians.</p>
      <sec id="sec-5-1">
        <title>6.1. Proportions and uncertainty</title>
        <p>p^k=
1 n
n ∑i=1 1 {label ( Pi)= Fk }
(4)</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>7. Integrating F-labels into ITS</title>
      <p>Adaptive scheduling.</p>
      <p>
        Mass practice &amp; instant feedback (F0–F1): auto-checkable pools; graded hints tied to mapped tactics
(e.g.,“Try linarith”). Strategy instruction (F2): scaffolded sketches; lemma search via apply? /
Sledgehammer[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Capstones (F3–F4): longer proofs; use miniF2F and LeanDojo tasks to assess
retrieval/lemma-search skills; compare with MATH for text-first problem difficulty [14]–[16].
      </p>
      <sec id="sec-6-1">
        <title>7.1. Feedback mapping with Reverse Projection</title>
        <p>Tactic failure → hint (UI math form).</p>
        <p>
</p>
        <p>System: linarith fails.</p>
        <p>Hint: “Try linarith after isolating linear terms.”
Math UI mapping: highlight inequality x ≤ y, y ≤ z and suggest: “Try combining inequalities
to derive x ≤ z.”
Proof state→ subgoals listed for the learner with micro-hints.</p>
      </sec>
      <sec id="sec-6-2">
        <title>7.2. Content authoring</title>
        <p>Build a bank of formalized templates aligned with weekly topics (algebra, analysis, linear algebra,
ODEs), each tagged with F-labels and example code for Lean/Coq/Isabelle.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>8. Illustrative pilot and reproducibility</title>
      <p>Scope and caveat. We conducted an illustrative pilot to exercise the labeling protocol on a curated
set of undergraduate problems. Importantly, this pilot was not run in a live course; its purpose is to
demonstrate workflow and artifacts, not to estimate course-level shares.</p>
      <p>Setup. The set comprises 12 atomic problems (algebraic identities and linear inequalities,
induction/sums, ε–δ limits, a separable-ODE pattern). Budgets were fixed as specified earlier:
automation timeouts 30 s &amp; 120 s, ITP hint cap ≤ 3, standard libraries for Lean 4 / Coq (Rocq) 8.19 /
Isabelle 2025, and Z3 / cvc5 / E / Vampire back-ends. Tool versions and commands are included in
the artifact.</p>
      <p>Annotation and reliability. Two authors independently labeled the items (F0–F4) under the same
budgets; disagreements were reconciled. Cohen’s κ = 0.78 with 83 % raw agreement (10/12),
indicating substantial agreement.
Interpretation. The F0–F1 items are routine equational/linear tasks (e.g., ring, linarith/lia;
QF_LIA in SMT). F2 typically needs short structured proofs (induction, ε–δ). F3 requires helper
lemmas/definitions (e.g., ODE facts). No F4 appeared under these budgets in this illustrative set.
These outcomes should not be read as course-level estimates.</p>
    </sec>
    <sec id="sec-8">
      <title>9. Threats to validity</title>
      <p>


</p>
      <p>Version drift. Library growth can shift some items from F2→F1 or F1→F0; always report
tool versions and re-sample each term.</p>
      <p>Annotator expertise. F-labels depend on the skill of annotators in specific ITPs; mitigate
with dual annotation and .</p>
      <p>Topic idiosyncrasies. “Elementary” geometry/olympiad tricks may be F3–F4 due to
modeling overhead.</p>
      <p>Logic mismatch. Results apply primarily to HOL-centric ITPs with ATP/SMT bridges.
10. Conclusion
This paper presents a resource-aware definition of formalizability; introduces the fiv-elevel F0–F4
taxonomy; specifies a measurement protocol with practical estimators; expands the set of
mathematical exemplars; and provides a UI-to-API mapping that grounds intelligent tutoring
system (ITS) adaptation in the concrete capabilities of modern provers. Future work includes: (i)
releasing a benchmarked corpus of course problems annotated with F-labels; (ii) conducting
multi-institutional replications; and (iii) publishing fully reproducible scripts for Isabelle/HOL,
Lean, and Coq (Rocq), together with solver configurations. This paper includes an illustrative pilot
(8) that was not deployed in a live course; the reported figures are demonstrative.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-5 Pro for draft structuring and wording.
The authors reviewed and edited the content and take full responsibility for the publication.
[12] S. Schulz, “The E Theorem Prover,” 2025. URL:
https://wwwlehre.dhbw-stuttgart.de/~sschulz/E/E.html
[13] Vampire Team, “Vampire Theorem Prover,” 2025. URL: https://vprover.github.io/
[14] K. Zheng, J. M. Han, S. Polu, et al., “miniF2F: A Cross-System Benchmark for Formal</p>
      <p>Olympiad-Level Mathematics,” arXiv:2109.00110, 2021. doi: 10.48550/arXiv.2109.00110
[15] K. Yang, et al., “LeanDojo: Theorem Proving with Retrieval-Augmented Language Models,”</p>
      <p>NeurIPS Datasets &amp; Benchmarks, 2023. doi: 10.48550/arXiv.2306.15626
[16] D. Hendrycks, et al., “Measuring Mathematical Problem Solving with the MATH Dataset,”</p>
      <p>NeurIPS Datasets &amp; Benchmarks, 2021. doi: 10.48550/arXiv.2103.03874
[17] A. Thoma and P. Iannone, “Learning about Proof with the Theorem Prover LEAN,” Int. J. Res.</p>
      <p>Undergrad. Math. Ed., 2022. doi: 10.1007/s40753-021-00140-1.
[18] P. Massot, “Teaching Mathematics Using Lean and Controlled Natural Language,” in Proc. ITP
2024, LIPIcs 309:27, 2024. doi: 10.4230/LIPIcs.ITP.2024.27.
[19] M. L. Bottoni, “Teaching ‘Foundations of Mathematics’ with the Lean Theorem Prover,”
arXiv:2501.03352, 2025.
[20] “Courses using Lean,” Lean community page, 2025.
[21] “Learning Mathematics with Lean—presentations &amp; abstracts,” 2024–2025.
[22] B. C. Pierce, “Software Foundations, 15 years on,” talk slides, Univ. of Pennsylvania, 2023.</p>
      <p>(Accessed: Sep. 2025).
[23] B. C. Pierce et al., Software Foundations book series website, 2025.
[24] M. Henz, “Teaching Experience: Logic and Formal Methods with Coq,” in LNCS, 2011, pp. 199–
215. Doi: 10.1007/978-3-642-25379-9_16.
[25] J. Villadsen, et al., “Using Isabelle in Two Courses on Logic and Automated Reasoning,”</p>
      <p>FMTea’21, 2021.
[26] F. K. Jacobsen, “On Exams with the Isabelle Proof Assistant,” 2023.
[27] T. Nipkow, Programming and Proving in Isabelle/HOL, tutorial/book, 2025.
[28] T. Nipkow, L. C. Paulson, and M. Wenzel, Isabelle: A Proof Assistant for Higher-Order Logic
(Tutorial), 2025.
[29] S. Blazy, “Teaching Deductive Verification in Why3 to Undergraduate Students,” in LNCS,
2019. doi: 10.1007/978-3-030-32441-4_4.
[30] P. G. Vaillancourt, et al., “ACL2 in DrScheme (DRACULA): A Pedagogic Framework,” 2006.
[31] C. Eggensperger, “Proof Pad: A Pedagogic ACL2 IDE,” in TFP 2013, 2013.
[32] J. Cai, “AutoRubric: Autograding Template-Based Exam Programs with SMT,” UC Berkeley</p>
      <p>Tech. Rep., 2020.
[33] “Z3Guide: A Scalable, Student-Centered, and Extensible Guide to SMT,” arXiv:2506.08294, 2025.
[34] K. Yang, et al., “LeanDojo: Theorem Proving with Retrieval-Augmented LMs,” NeurIPS Datasets
&amp; Benchmarks, 2023. doi: 10.48550/arXiv.2306.15626.
[35] K. Zheng, J. M. Han, S. Polu, et al., “miniF2F: A Cross-System Benchmark for Formal</p>
      <p>Olympiad-Level Mathematics,” arXiv:2109.00110, 2021. doi: 10.48550/arXiv.2109.00110.
[36] A. Chukhray, T. Stoliarenko, O. Yevdokymov, V. Demyanenko, “Possibilities of using
Intelligent Tutoring Systems (ITS) in Higher Mathematics Courses,” Open Information and
Computer Integrated Technologies, no. 102, pp. 92–119, Feb. 2025, doi:
10.32620/oikit.2024.102.07.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Isabelle</given-names>
            <surname>Team</surname>
          </string-name>
          , “
          <string-name>
            <surname>Documentation</surname>
          </string-name>
          (
          <issue>Isabelle2025</issue>
          ),”
          <year>2025</year>
          . Online. URL: https://isabelle.in.tum.de/documentation.html
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Blanchette</surname>
          </string-name>
          ,
          <article-title>Sledgehammer for Isabelle/HOL (User's Guide)</article-title>
          ,
          <source>Isabelle2025 Manuals</source>
          ,
          <year>2025</year>
          . Online. URL: https://isabelle.in.tum.de/dist/doc/sledgehammer.pdf
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>The</given-names>
            <surname>Rocq</surname>
          </string-name>
          /Coq Development Team,
          <source>The Coq Proof Assistant Reference Manual</source>
          ,
          <year>v8</year>
          .
          <fpage>19</fpage>
          ,
          <year>2025</year>
          . Online. URL: https://rocq-prover.
          <source>org/doc/V8</source>
          .19.0/refman/index.html
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Avigad</surname>
          </string-name>
          , P. Massot, Mathematics in Lean,
          <year>2025</year>
          . Online. URL: https://avigad.github.io/mathematics_in_lean/mathematics_in_lean.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Lean</given-names>
            <surname>Community</surname>
          </string-name>
          ,
          <source>mathlib4 Documentation</source>
          ,
          <year>2025</year>
          . Online. URL: https://leanprover-community.github.io/mathlib4_docs/index.html
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fontaine</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Tinelli</surname>
          </string-name>
          ,
          <source>The SMT-LIB Standard, Version 2</source>
          .7,
          <string-name>
            <surname>Feb</surname>
          </string-name>
          . 
          <volume>5</volume>
          , 
          <year>2025</year>
          . Online. URL: https://smt-lib.org/papers/smt-lib
          <source>-reference-v2</source>
          .
          <fpage>7</fpage>
          -r2025-
          <fpage>02</fpage>
          -05.pdf
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chukhray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dvinskykh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Narozhnyy</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Stoliarenko</surname>
          </string-name>
          ,
          <article-title>"Using an expression tree for adaptive learning</article-title>
          ,
          <source>" 2023 13th International Conference on Dependable Systems, Services and Technologies (DESSERT)</source>
          , Athens, Greece,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          , doi: 10.1109/DESSERT61349.
          <year>2023</year>
          .10416497
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Microsoft</given-names>
            <surname>Research</surname>
          </string-name>
          ,
          <source>Online Z3 Guide</source>
          ,
          <year>2025</year>
          . URL: https://microsoft.github.io/z3guide/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>[9] cvc5 Team, cvc5 Documentation (Proof Rules and Proof Production)</source>
          ,
          <fpage>2022</fpage>
          -
          <lpage>2025</lpage>
          . URL: https://cvc5.github.
          <source>io/docs/cvc5-1.0</source>
          .0/proofs/proofs.html
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Barbosa</surname>
          </string-name>
          et al.,
          <article-title>“cvc5: A Versatile and Industrial-Strength SMT Solver,”</article-title>
          <string-name>
            <surname>TACAS</surname>
          </string-name>
          ,
          <year>2022</year>
          , doi: 10.1007/978-3-
          <fpage>030</fpage>
          -99524-9_
          <fpage>24</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Sutcliffe</surname>
          </string-name>
          , “
          <article-title>The TPTP Problem Library</article-title>
          ,”
          <year>2025</year>
          . Online. URL: https://tptp.org/TPTP/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>