<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>TOLC-ASP: a tool for training students for admission tests</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Edda Dal Santo</string-name>
          <email>dalsantoedda@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Agostino Dovier</string-name>
          <email>agostino.dovier@uniud.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Talissa Dreossi</string-name>
          <email>talissa.dreossi@uniud.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>2021 Standard Deviation 11</institution>
          ,
          <addr-line>5 5,9</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli Studi di Udine, DMIF</institution>
          ,
          <addr-line>Via delle Scienze 206, 33100 Udine</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>7</volume>
      <issue>2025</issue>
      <abstract>
        <p>We present a tool that generates and administers logic tests of the same kind as those used by Italian universities in their admission tests. These tests, usually referred to as TOLC, are developed by CISIA. A set of benchmarks drawn from official repositories has been encoded in ASP and their solutions have been verified. Other tests have been derived from them and checked in ASP. The tests have then been parametrized with respect to the names of the people involved, actions, environments, and so on. In this way a number of different tests that is exponential w.r.t. the initial collection can be obtained, basically by random grounding, with their solutions verified by ASP. Furthermore, a set of explanations of why an answer is wrong has been added for didactic purposes. The tool has been tested with first-year students of the Degree in Computer Science.</p>
      </abstract>
      <kwd-group>
        <kwd>Didactics of Computer Science</kwd>
        <kwd>Logic Education</kwd>
        <kwd>ASP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The TOLC (CISIA Online Test) is a test developed by the Italian Inter-university Consortium of Integrated
Access Systems (CISIA) and used by most Italian universities to verify the minimum knowledge required for
access to degree courses and/or to guide the choice of the most promising university career. Different
types of TOLC are associated with specific study programs. We have focused on the TOLC-S, the
entrance test for scientific degree courses. It is composed of 55 questions divided into six categories:
20 on Mathematics, 15 on “Reasoning, Problems and Text Comprehension”, then 5 each on Biology,
Chemistry, Physics and Earth Sciences. The first and second groups of questions are those in which
students generally encounter the greatest difficulty, as witnessed by the report [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>[Table: columns Total, Mathematics, Reasoning and Problems, Text Comprehension, Sciences; row “N. questions” (50, 20, 10, ...), followed by the values 3,5, 4,8, 3,6, 3,3, 4,8, 3,6, 2,5, whose assignment to cells is not recoverable.]</p>
      <p>
        In particular, the logical questions are considered the hardest of all. It is not
hard to convince researchers in logic programming of the importance of having strong roots in logic.
Logical skills are essential for success in scientific studies, for understanding the rationale behind
mathematical proofs, and in general for reasoning, problem solving, and programming. Typical errors are
inverting formulas, confusing if, only if, and if and only if, misusing quantifiers, and so on. This
kind of question also appears in the other TOLC tests. CISIA offers on its website [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] teaching
material for preparing for the tests, including a text called Mentor di Logica [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] (briefly, Mentor in the
remaining part of the paper).
      </p>
      <p>This document is one of the starting points of this work, whose goal is to develop a targeted
tool that allows students to practice on this type of question. In particular, the paper aims to lay the
foundations for the future creation of a web or mobile application that represents a digital version of
the Logic Mentor.</p>
      <p>A further example of the difficulty experienced when facing logical (even simple) questions comes from the
popular Italian television game “L’eredità”, broadcast on RAI1 exactly on the deadline date of the CILC
2025 conference. The competitor, an engineer and champion of the program, received the question
reported in Figure 1. Observe that the answer was worth 40 k€. We don’t unveil his answer, but
curious readers might enjoy it on RaiPlay<sup>1</sup>.</p>
      <p>Puzzles of this kind are (currently) also a limit for Large Language Models (LLMs). Since they lack a true
logical-deductive reasoning module and rely on statistical correlations learned during
training, these tools tend to keep justifying their first choice instead of doing the (hard and
time-consuming) backtracking that would be needed.</p>
      <p>
        Answer Set Programming (ASP) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] has instead, among its main features, the capability of dealing
with logical puzzles if they are correctly encoded [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ]. Results are obtained and justified using formal logic
and rigorous deduction under the so-called stable model semantics, and they can be explained to
experts and non-experts using tools developed for this purpose [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Experiments in high schools on using
ASP for encoding and solving puzzles, even in the absence of other introductory courses, have been carried out
and have proved successful [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>The paper is organized as follows: in Section 2 we describe in more detail the main elements of
the quizzes of the TOLC. In Section 3 we report the basic ideas for the ASP encoding of families of
logical puzzles. In Section 4 we present our tool for generating TOLC-like quizzes and how it can help
test takers when they give wrong answers. In Section 5 we report some experimental results on
the use of our tool. Some side comments are reported in Section 6 and some conclusions are drawn in
Section 7.</p>
      <p><sup>1</sup> https://www.raiplay.it/programmi/leredita (2 May 2025, minute 51)</p>
    </sec>
    <sec id="sec-2">
      <title>2. TOLC Logical Puzzles</title>
      <p>
        The Mentor [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is a self-learning tool designed to help students practice multiple-choice logic questions,
although it is not a strict simulation of the TOLC test. Available in PDF format on the CISIA website
(accessible after registration), this guide features a unique interactive structure. The reader begins with
an initial question and selects an answer, being then redirected to another page of the document: if
the choice is correct, confirmation is given along with a new puzzle to solve; if incorrect, feedback is
provided and the reader is taken back to the previous question. This non-linear navigation (supported
by page references and hyperlinks) encourages active reasoning and logical thinking in a low-pressure
environment.
      </p>
      <p>The Mentor includes questions on propositional and visual logic, sometimes requiring basic mental
calculations. No advanced mathematical knowledge is needed and all texts are written in plain natural
(Italian) language. On the other hand, while no specialized preparation is required, careful reading and
interpretation are essential to avoid misunderstandings.</p>
      <p>
        The logical puzzles included in the Mentor [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] can be classified into 4 categories:
• logical (propositional/first order) puzzles,
• puzzles that require analysis of a picture or a graphical representation to obtain the solution,
• puzzles that require completing a series of tables, numbers, pairs of numbers, . . . ,
• combinatorial puzzles.
      </p>
      <p>For the purpose of the application developed here, we will focus on the first family (32 examples out of
51). We can split them into (1) puzzles with or without quantifiers, or (2) puzzles in which the
student must choose exactly one answer or must exclude exactly one answer. We call the latter puzzles
with a “dual” answer.</p>
      <p>For completeness of presentation, in Figure 2 we report two examples of puzzles of different families.</p>
      <p>In Section 3 we show how to encode these puzzles in ASP, demonstrating how to verify the existence
of a unique solution while highlighting their underlying logical structure. The encoding section may
seem naïve to CILC participants; however, we decided to write it as a tutorial for non-expert
readers.</p>
    </sec>
    <sec id="sec-3">
      <title>3. ASP Encoding of Puzzles</title>
      <p>
        Given a denumerable set of variables $\mathcal{V}$, a set of constant symbols $\mathcal{C}$, and a set of predicate symbols
$\mathcal{P}$, an atomic formula (atom) is a formula of the form $p(t_1, \dots, t_n)$, where $p \in \mathcal{P}$ and $t_1, \dots, t_n$ are
constant symbols or variables. A literal is either an atomic formula or a formula of the form not $A$
(naf-literal), where $A$ is an atomic formula. A (general) ASP rule is a formula of the form
$$H \leftarrow B_1, \dots, B_m, \mathit{not}\ C_1, \dots, \mathit{not}\ C_n \qquad (1)$$
where $H$, $B_i$, $C_j$ are atomic formulas. If $m = n = 0$, then it is said to be a fact. A program is a set of
rules. A literal, a rule, or a program is ground if it contains no variables. The semantics of a program is
based on the notion of stable model/answer set [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Given a ground program $P$, a set of atoms $S$ is an
answer set of $P$ if it is the (unique) minimum Herbrand model of the so-called “reduct” $P^S$ obtained
from $P$ by first removing all rules with not $C$ in their body where $C \in S$ and then removing the
negated literals from all remaining rules. In addition to the rules of form (1), the ASP language has been
extended over time with several constructs such as denials (rules without the head $H$), choice rules and
cardinality constraints that will be presented and used in the remaining part of this paper.
      </p>
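      <p>As an illustration of the definition above, the following minimal Python sketch enumerates the stable models of a small ground program by brute force: for each candidate set of atoms it builds the reduct and compares the candidate with the least model of the reduct. This is not how ASP solvers work internally. The tuple representation of rules, the function names and the two-rule sample program are assumptions made for this example, and denials are not handled.</p>
      <preformat><![CDATA[
```python
from itertools import chain, combinations

# A ground rule is a triple (head, positive_body, negative_body) of atom names.


def least_model(definite_rules):
    """Least Herbrand model of a negation-free ground program (fixpoint iteration)."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if head not in model and all(b in model for b in pos):
                model.add(head)
                changed = True
    return model


def reduct(program, candidate):
    """Drop rules whose negative body meets `candidate`, then drop the negative bodies."""
    return [(head, pos) for head, pos, neg in program
            if not any(c in candidate for c in neg)]


def stable_models(program):
    """All sets of atoms that coincide with the least model of their own reduct."""
    atoms = sorted({a for head, pos, neg in program for a in (head, *pos, *neg)})
    subsets = chain.from_iterable(combinations(atoms, k) for k in range(len(atoms) + 1))
    return [set(s) for s in subsets if least_model(reduct(program, set(s))) == set(s)]


# Sample program:   p :- not np.    np :- not p.
SAMPLE = [("p", (), ("np",)), ("np", (), ("p",))]
MODELS = stable_models(SAMPLE)
print(MODELS)
```
]]></preformat>
      <p>On the sample program the sketch finds exactly two stable models, one containing only np and one containing only p, which is the non-deterministic behaviour exploited in the encodings of this section.</p>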
      <sec id="sec-3-1">
        <title>3.1. Puzzles Encoding</title>
        <p>A first challenge was encoding the logical tests from the Mentor in ASP. We report here the
main ideas behind the encoding and two detailed examples.</p>
        <p>Some domain-related properties must be stated according to the example (e.g., person(a).
person(b). brother(a,b). etc). These differ from puzzle to puzzle and require a minimum
of experience in logical modeling. But this is not the difficult part.</p>
        <p>In logical tests/puzzles there are typically properties that can be either true or false, and their truth
value is constrained by subsequent hints. Moreover, the set of candidate results among which the
participant has to select one is often structured as a (Boolean or first-order) sentence. In the next
paragraphs we try to generalize some lessons learned in the encoding of TOLC problems.</p>
        <p>Non-deterministic Choice. If $p$ is one of the properties of interest, a choice rule can be written as
{ p }.
A choice rule is syntactic sugar for the pair of rules $p \leftarrow \mathit{not}\ np.$ $np \leftarrow \mathit{not}\ p.$ where $np$ is an
auxiliary predicate introduced for the negation of $p$.</p>
        <p>All unknown properties of the problem can be declared by assigning them a non-deterministic choice.
If, from the hints, it can be immediately inferred that $p$ holds, this can be forced simply by a fact
p.</p>
        <p>If, instead, it can be immediately inferred that $p$ cannot hold, this can be imposed by adding a denial
:- p.
that, intuitively, states: it is impossible that the program admits a stable model in which the body of
the denial is true. As said before, it is a syntactic extension; there are several well-known ways of
implementing it with general rules of type (1).</p>
        <p>Disjunctive Hints. A sentence such as “at least one of the following $k$ literals $\ell_1, \dots, \ell_k$ is true” (in other words,
their disjunction is true) can be expressed as:
h :- ell_1.
h :- ell_2.
...
h :- ell_k.
:- not h.</p>
        <p>The last line (a denial) excludes the possibility of a stable model in which not h is true (i.e., h is false).
Assume $\ell_1, \dots, \ell_h$ are positive literals (atoms) while $\ell_{h+1} = \mathit{not}\ A_1, \dots, \ell_k = \mathit{not}\ A_{k-h}$ are naf-literals.
For the atom h to be in a stable model, at least one among $\ell_1, \dots, \ell_h$ must hold, supported by other
atoms and rules, or at least one among $A_1, \dots, A_{k-h}$ must not be in the stable model.</p>
        <p>In the remaining part of the section we will use h for each example, for simplicity. In the encodings,
a different predicate name must be assigned to each hint (e.g., $h_1, h_2, h_3, \dots$).</p>
        <p>We can generalize this encoding to sentences such as “at least $m$ and at most $n$ of the following $k$
atoms/literals are true”, as follows:
h :- m{ ell_1; ell_2; ... ;ell_k }n.
:- not h.</p>
        <p>The construct in the body of the first rule is the so-called cardinality constraint, another
extremely useful syntactic extension of ASP. Its semantics is exactly the one just sketched: it is true
when at least $m$ and at most $n$ of its $k$ literals are true. Their truth, however, must be guaranteed by
other rules.</p>
        <p>Cardinality constraints can be rewritten, e.g., by using denials excluding that more than $n$ literals
hold and that more than $k - m$ literals do not hold.</p>
        <p>Conjunctive Hints. If the hint states that a conjunction of (positive or negative) literals must hold,
we can write
h :- ell_1, ell_2, ... ,ell_k.
:- not h.</p>
        <p>Boolean combinations. Any other Boolean combination can be implemented using the above
ideas. For instance, an implication given as a hint can be encoded using its disjunctive
semantics: $(A \rightarrow B)$ is equivalent to $\neg A \lor B$, i.e., it is impossible that $A$ holds and $B$ does not hold.
With a and b the atoms encoding $A$ and $B$:
h :- not a.
h :- b.
:- not h.</p>
        <p>Quantifiers. If $p$ is a property (an atom) depending on a parameter $X$, a hint such as $\exists X\, p(X)$ can be
forced as
h :- p(X).
:- not h.</p>
        <p>Similarly, a hint such as $\forall X\, p(X)$ can be expressed as:
nh :- dom(X), not p(X).
:- nh.</p>
        <p>In this case we say that it is impossible that we have nh (think of it as not h). This means that there cannot
exist an $X$ such that $p(X)$ is false, namely that for all $X$ we have $p(X)$. Here dom is a predicate enumerating
the values over which $X$ ranges: solvers such as clingo require every variable of a rule to occur in a positive
body literal (rule safety).</p>
        <p>3.2. Logical Consequence with Stable Model Semantics</p>
        <p>ASP solvers can be required to generate all stable models (in the case of clingo this can be done using
the call option 0). If an atom $A$ holds in all computed stable models of a program $P$, we say that it is
a logical consequence of the program under the stable model semantics. This is briefly denoted as
$P \models A$.</p>
        <sec id="sec-3-1-1">
          <title>Example 1. Let us analyze the test n. 6 of the Mentor, namely</title>
          <p>There are two brothers, Romolo and Remo. One is always sincere and, conversely, the other is
always a liar.</p>
          <p>If Romolo says that their mother is named Silvia and Remo says that she is blonde, it can be
deduced with certainty that:
A. if their mum is blonde, her name is not Silvia
B. if the name of their mum is Silvia, then her hair is blonde
C. the name of their mother is Silvia and she is not blonde
D. if the name of the mother is not Silvia, then she is not blonde
E. their mother is not called Silvia</p>
          <p>It is interesting to see that ChatGPT answers C.<sup>2</sup> Of course the correct answer is A. As a matter of fact, there
are two possible scenarios:</p>
        </sec>
        <sec id="sec-3-1-2">
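          <title>Example 1: case analysis and ASP encoding</title>
          <p>Scenario 1: Romolo is sincere (the mother is named Silvia) and Remo is a liar (she is not blonde).
Scenario 2: Romolo is a liar (the mother is not named Silvia) and Remo is sincere (she is blonde).</p>
          <p>By case analysis it is immediate to see that only answer A is applicable. The example can be encoded in ASP as follows:</p>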
          <p>%%% The two brothers
bro(romolo). bro(remo).
%%% Property: being sincere.
{sincere(X)} :- bro(X).
%%% Exactly one is sincere
h1 :- 1 { sincere(romolo); sincere(remo) } 1.
:- not h1.</p>
          <p>After the first part we have to encode the fact that Romolo is sincere if and only if their mother is named Silvia
(similarly for Remo and blonde).
%%% romolo is sincere iff the mother is Silvia
silvia :- sincere(romolo).
:- silvia, not sincere(romolo).
%%% similarly for remo
blonde :- sincere(remo).
:- blonde, not sincere(remo).</p>
          <p>Then we have to encode the possible answers, assigning them a parameter.
%% A. The implication (blonde -&gt; not silvia)
%%%% is viewed as not blonde or not silvia
opt(a) :- not blonde.
opt(a) :- not silvia.
%% B. (silvia -&gt; blonde)
opt(b) :- not silvia.
opt(b) :- blonde.
%% C. silvia and not blonde
opt(c) :- silvia, not blonde.
%%% D. (not silvia -&gt; not blonde)
opt(d) :- silvia.
opt(d) :- not blonde.
%%% E. not silvia
opt(e) :- not silvia.</p>
          <p>Running the code with clingo one realizes that opt(a) is the unique option holding in all the (two) stable
models (thus, it is a logical consequence under the stable model semantics).</p>
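          <p>The same case analysis can be cross-checked outside ASP. The following Python sketch is not part of the tool described in this paper, and all names in it are illustrative. It enumerates the scenarios in which exactly one brother is sincere, assumes that a statement is true exactly when its speaker is sincere, and reports the options that hold in every scenario.</p>
          <preformat><![CDATA[
```python
from itertools import product

# Each option is a Boolean function of the two facts (silvia, blonde),
# read in the same way as the opt/1 rules of the ASP encoding.
OPTIONS = {
    "A": lambda silvia, blonde: (not blonde) or (not silvia),
    "B": lambda silvia, blonde: (not silvia) or blonde,
    "C": lambda silvia, blonde: silvia and not blonde,
    "D": lambda silvia, blonde: silvia or not blonde,
    "E": lambda silvia, blonde: not silvia,
}


def scenarios():
    """Worlds in which exactly one brother is sincere.

    Romolo's statement (silvia) is true iff Romolo is sincere,
    Remo's statement (blonde) is true iff Remo is sincere.
    """
    for romolo_sincere, remo_sincere in product([True, False], repeat=2):
        if romolo_sincere != remo_sincere:
            yield {"silvia": romolo_sincere, "blonde": remo_sincere}


def certain_options():
    """Options that hold in every admissible scenario."""
    worlds = list(scenarios())
    return [name for name, holds in OPTIONS.items()
            if all(holds(**world) for world in worlds)]


print(certain_options())
```
]]></preformat>
          <p>Under these assumptions the only option returned is A, in agreement with the clingo run described above.</p>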
          <p>Example 2. Let us analyze the test n. 49 of the Mentor, namely:</p>
          <p>Please select the correct negation of the sentence “Umberto has at least one blonde son”:
A. At least one son of Umberto is not blonde
B. Umberto has no sons or he has only non-blonde sons
C. All Umberto’s sons are brown haired
D. Not all Umberto’s sons are blonde
E. All Umberto’s sons are red haired</p>
          <p>First of all we introduce the sentence about Umberto as a hint and we require that its negation is true:
i1 :- father(umberto,X), blonde(X).
%%%% It is impossible that the hint is true
%%%% (thus its complement must hold)
:- i1.</p>
          <p>In order to generate possible scenarios we introduce some possible sons, each of them with their hair
information.
{father(umberto,X)} :- peanut(X).
%%% Here they are
peanut(sally). peanut(schroeder).
peanut(lucy). peanut(piperita).
peanut(charlie). peanut(peggy_jean).
%%% And their hair color
blonde(sally). blonde(schroeder).
black(lucy). brown(piperita).
bald(charlie). red(peggy_jean).</p>
          <p>Then we encode the various options:
%%%% (exists a non blonde son)
opt(a) :- father(umberto,X), not blonde(X).
%%%% A disjunction
parent :- father(umberto,X).
opt(b) :- not parent.
%%% In the case he has sons, they cannot be blonde
blondesons :- father(umberto,X), blonde(X).
opt(b) :- not blondesons.
%%%%% All the sons are brown - there is no son that is not brown
opt(c) :- not nonbrownsons.
nonbrownsons :- father(umberto,X), not brown(X).
%%% there is a son that is not blonde
opt(d) :- father(umberto,X), not blonde(X).
%%% Like opt(c) with a different color
opt(e) :- not nonredsons.
nonredsons :- father(umberto,X), not red(X).</p>
          <p>The answer is option B, which is true in all 16 stable models. ChatGPT answers this quiz correctly.</p>
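          <p>The count of 16 stable models can be reproduced with a short brute-force check. The Python sketch below is only an illustration, with names chosen for this example. It assumes the six candidate sons and hair colors listed in the encoding, enumerates every set of sons that makes the original sentence false, and evaluates the five options with the same reading used in the opt/1 rules.</p>
          <preformat><![CDATA[
```python
from itertools import combinations

# Candidate sons and their hair, as in the ASP facts above.
HAIR = {"sally": "blonde", "schroeder": "blonde", "lucy": "black",
        "piperita": "brown", "charlie": "bald", "peggy_jean": "red"}


def candidate_families():
    """Every set of sons for which 'Umberto has at least one blonde son' is false."""
    kids = sorted(HAIR)
    for size in range(len(kids) + 1):
        for sons in combinations(kids, size):
            if not any(HAIR[s] == "blonde" for s in sons):
                yield sons


OPTIONS = {
    "A": lambda sons: any(HAIR[s] != "blonde" for s in sons),
    "B": lambda sons: len(sons) == 0 or all(HAIR[s] != "blonde" for s in sons),
    "C": lambda sons: all(HAIR[s] == "brown" for s in sons),
    "D": lambda sons: any(HAIR[s] != "blonde" for s in sons),
    "E": lambda sons: all(HAIR[s] == "red" for s in sons),
}

FAMILIES = list(candidate_families())
ALWAYS_TRUE = [name for name, holds in OPTIONS.items()
               if all(holds(family) for family in FAMILIES)]
print(len(FAMILIES), ALWAYS_TRUE)
```
]]></preformat>
          <p>The two blonde candidates can never be sons, while each of the remaining four may or may not be, which gives the 16 scenarios; option B is the only one that holds in all of them.</p>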
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. TOLC-ASP</title>
      <p>
        The application developed and presented in this section is based on a set of 12 quizzes. Eleven are drawn from
the Mentor [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], namely those numbered 1, 6, 7, 20, 21, 23, 32, 41, 42, 48, and 49. A further quiz (we call
it 0) is obtained from a sample freely downloadable from the CISIA website [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Tests 0, 1, 6, 7, and 42 are
without quantifiers; the others use quantifiers. Test 23 requires a “dual” answer.
      </p>
      <p>The application was developed following the steps outlined below.</p>
      <sec id="sec-4-1">
        <title>4.1. ASP Encoding</title>
        <p>The twelve selected quizzes have been encoded in ASP along the lines sketched in the previous section.
They have been solved with clingo, in particular using the option 0 to check whether a quiz is
under-constrained and, in any case, to verify whether an answer is true in all possible stable models.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. New Quizzes</title>
        <p>By modifying the quizzes encoded above, new ones have been invented (and their encodings checked).
Particular care has been taken in introducing parameters that allow them to be modified with new names,
actions, attributes and so on. A set of placeholders is used, and datasets for replacing the placeholders
(e.g., with different human names) have been populated. The following example shows how the test n. 6
of the Mentor has been transformed into a template and how it has been subsequently encoded.</p>
        <p>Example 3. There are two people, *name1* and *name2*. One is always sincere and,
conversely, the other is always a liar. If *name1* says that *sentence1* and *name2* says that
*sentence2*, it can be deduced with certainty that:
A. if *sentence2*, then it’s not true that *sentence1*
B. if *sentence1*, then it’s true that *sentence2*
C. it’s true that *sentence1* and it’s not true that *sentence2*
D. if it’s not true that *sentence1*, then it’s not true that *sentence2*
E. it’s not true that *sentence1*</p>
        <p>While it is clear what *name1* and *name2* stand for, the placeholders *sentence1* and *sentence2* represent
simple propositions of the form subject-verb-complement (e.g. Adam eats an apple or Eva tricks the snake).
The ASP encoding goes as follows:</p>
        <p>This way, starting from a limited number of predefined models (12 for now) and some data files
listing names, actions and so on, it is possible to generate an exponential number of different quizzes
by instantiating the placeholders using a random generator.</p>
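        <p>The instantiation step can be pictured with the following minimal Python sketch. It is not the code of the tool: the function names, the regular expression and the sample data are assumptions made for illustration, and the template is an abbreviated version of the one in Example 3. Distinct placeholders of the same kind receive distinct values, and the same seed reproduces the same quiz.</p>
        <preformat><![CDATA[
```python
import math
import random
import re

# Placeholders look like *name1*, *sentence2*, ...: a kind followed by an index.
PLACEHOLDER = re.compile(r"\*([a-z]+)(\d+)\*")

TEMPLATE = ("There are two people, *name1* and *name2*. "
            "If *name1* says that *sentence1* and *name2* says that *sentence2*, "
            "it can be deduced with certainty that:")

# Hypothetical replacement data (one list per placeholder kind).
DATA = {
    "name": ["Adam", "Eva", "Romolo", "Remo"],
    "sentence": ["the door is open", "the cat is asleep", "it is raining"],
}


def instantiate(template, data, rng):
    """Replace every placeholder, drawing distinct values for distinct placeholders."""
    slots = sorted(set(PLACEHOLDER.findall(template)))
    values = {}
    for kind in sorted({k for k, _ in slots}):
        indices = [i for k, i in slots if k == kind]
        picks = rng.sample(data[kind], len(indices))
        for index, value in zip(indices, picks):
            values[f"*{kind}{index}*"] = value
    return PLACEHOLDER.sub(lambda m: values[m.group(0)], template)


def count_instances(template, data):
    """Number of different quizzes obtainable from one template."""
    slots = set(PLACEHOLDER.findall(template))
    total = 1
    for kind in {k for k, _ in slots}:
        n_slots = sum(1 for k, _ in slots if k == kind)
        total *= math.perm(len(data[kind]), n_slots)
    return total


print(instantiate(TEMPLATE, DATA, random.Random(2025)))
print(count_instances(TEMPLATE, DATA))
```
]]></preformat>
        <p>Even with these tiny lists a single template yields 72 different quizzes (12 ordered pairs of names times 6 ordered pairs of sentences), and the count grows quickly with the size of the data files and the number of placeholders.</p>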
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Natural Language Generation</title>
        <p>The readability of the generated quizzes was checked. Texts are in Italian, and the use of
the subjunctive mood requires a further step to make the sentences clear and fluent. This could be done, in
principle, with the help of an LLM.</p>
        <p>
          Initially, an attempt was made to instantiate the quiz templates with real data using the Minerva
LLM, developed by Sapienza NLP in collaboration with Future Artificial Intelligence Research (FAIR)
and CINECA [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. However, when asked to replace a template’s placeholders with fabricated data, the
model was unable to produce the desired outcome, returning a quiz identical to the original template.
Since Minerva is rapidly evolving, another attempt will be made as soon as possible.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Error Correction</title>
        <p>In every test (in a parametric way) we added some sentences with specific feedback for the users’
incorrect answers. Similarly to what is done in the Mentor, we wanted to create a system that not only
indicates the correct solution, but also explains why an answer is wrong. This approach
aims to promote active learning, helping users to understand their own errors and to refine their logical
reasoning skills. Coding this feedback directly in ASP offered the advantage of being able to
verify its logical correctness directly and automatically: by adding the ASP rules for the feedback of the wrong
answers to the code and (automatically) uncommenting them one at a time, the output of clingo
explicitly displays the counterexamples to the corresponding answer option, thus showing
why it is wrong.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Automatization</title>
        <p>A Python program implementing the test generation (using a random generator) and the test execution
was subsequently developed. This program allows users to view the quizzes one at a time, to select
an answer and to receive feedback. Users have an unlimited number of attempts at their disposal, so
that they can retry the same puzzle as many times as they need to find the right answer.</p>
        <p>Specifically, the program accepts a command-line argument --feedback that allows the user to specify
whether they wish to receive a hint after each wrong answer. Each question template, stored
in JSON files, contains placeholders that are replaced with values defined in a secondary dictionary
containing sets of names, actions, attributes and so on. As an example, we report the entry corresponding
to the test n. 6 of the Mentor previously discussed.</p>
        <p>{"id": 3,
 "source": 6,
 "text": "There are two people, *name1* and *name2*.\nOne is always sincere and, conversely, the other is always a liar.\nIf *name1* says that *sentence1* and *name2* says that *sentence2*, it can be deduced with certainty that:",
 "options": {
   "A": "if *sentence2*, then it's not true that *sentence1*",
   "B": "if *sentence1*, then it's true that *sentence2*",
   "C": "it's true that *sentence1* and it's not true that *sentence2*",
   "D": "if it's not true that *sentence1*, then it's not true that *sentence2*",
   "E": "it's not true that *sentence1*"
 },
 "exact": "A",
 "feedback": {
   "A": "Correct!",
   "B": "Wrong! *name1* and *name2* cannot be sincere at the same time",
   "C": "Wrong! There is a case where *name2* tells the truth and *name1* tells a lie",
   "D": "Wrong! *name1* and *name2* cannot be liar at the same time",
   "E": "Wrong! There is a case where *name1* is sincere"
 }}</p>
        <p>
          This process is handled by a function that instantiates a different version of each question, shuffles
the answer choices and finally adds the feedback message for each option. The presentation of
questions to the user is handled by another function that uses the inquirer package [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] to show the
quizzes sequentially. In particular, inquirer allows the users to select answers via the keyboard arrows.
After each question, users receive an immediate message about the correctness of their answer and the
possibility to retry if it was wrong, accompanied by a hint if feedback was enabled. The program keeps
asking the same question until the user selects the right answer. The number of attempts, together with
timestamps and question identifiers, is tracked in a file that is collected
at the end of the session.
        </p>
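        <p>A possible way to shuffle the answer choices while keeping the correct answer and the feedback aligned is sketched below. This is an assumption about how such a step could be written, not the code of the tool: the function name is hypothetical, the sample entry is an abbreviated version of the JSON entry above, and whether the tool relabels the shuffled options with fresh letters is not stated in the text.</p>
        <preformat><![CDATA[
```python
import random
import string

# Abbreviated, illustrative entry with the same keys as the JSON entry above.
ENTRY = {
    "options": {"A": "first statement", "B": "second statement", "C": "third statement"},
    "exact": "A",
    "feedback": {"A": "Correct!", "B": "Wrong! (hint for B)", "C": "Wrong! (hint for C)"},
}


def shuffle_options(entry, rng):
    """Return a copy of the entry with the options permuted and relabelled A, B, C, ..."""
    old_keys = list(entry["options"])
    rng.shuffle(old_keys)
    new_keys = string.ascii_uppercase[:len(old_keys)]
    relabel = dict(zip(old_keys, new_keys))  # old letter -> new letter
    return {
        **entry,
        "options": {relabel[k]: entry["options"][k] for k in old_keys},
        "feedback": {relabel[k]: entry["feedback"][k] for k in old_keys},
        "exact": relabel[entry["exact"]],
    }


SHUFFLED = shuffle_options(ENTRY, random.Random(7))
print(SHUFFLED["exact"], SHUFFLED["options"])
```
]]></preformat>
        <p>Relabelling through a single old-to-new mapping keeps the three fields consistent, so the option marked as exact still carries the original correct text and its feedback.</p>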
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Results</title>
      <p>Here we report on an experiment carried out testing the above program with first-year Computer
Science students at the University of Udine who (more or less) volunteered to participate during their
Programming Lab classes at the end of March 2025. The software was installed on several lab computers
and each volunteer completed the test anonymously, being allowed to take as much time as needed, in
order to simulate a training session rather than a formal assessment.</p>
      <p>All participants received the same set of 12 question templates, each instantiated with variable
data and appearing in random order. Moreover, the participants were divided into two groups: one
group was given the quizzes with extended feedback, while the other received only minimal feedback
(right/wrong); see also Figure 3. At the end of the test, students filled out an anonymous questionnaire,
where they could express their opinions on the difficulty of the questions and the usefulness of the
feedback. The aim of the experiment was, in fact, to evaluate the effectiveness of the feedback provided
for incorrect answers and to gather some direct input from users.</p>
      <p>A total of 42 students participated in the experiment, each taking between 10 and 20 minutes on average to
complete the test: 22 out of 42 participants were given the “extended feedback” version, while 20 were
given the “minimal feedback” version. This uneven distribution is due to the fact that students had to
take turns using a computer and those who received minimal feedback took more time to complete the
test than the others.</p>
      <p>Although the small size of the sample does not allow for statistically meaningful conclusions, some
observations can still be made and may serve as a basis for future improvements. On average, the
performance between the two groups was relatively similar, with a slight advantage in favor of the
group with “extended feedback”: students belonging to the group with “minimal feedback” generally
needed more attempts to get the right answer. In detail, the statistics for the number of correct answers
on the first attempt are as follows
• extended feedback: mean = 5.72, standard deviation = 2.49, maximum = 11;
• minimal feedback: mean = 6.1, standard deviation = 2.51, maximum = 10.</p>
      <p>For the total number of failed attempts, the results show
• extended feedback: mean = 13.14, standard deviation = 6.39, maximum = 24;
• minimal feedback: mean = 14.25, standard deviation = 6.98, maximum = 26.</p>
      <p>The feedback appears to have had a moderate impact. The questionnaires completed at the end of
the test indicate that most students found the test to be moderately difficult and thought the feedback
was of limited usefulness.</p>
      <p>Given these outcomes and the lack of a clear distinction between the two groups, one could argue
that the role of feedback is marginal. However, we remain convinced of its importance in a learning
tool of this kind: these results simply point out the need to refine the way feedback is designed and
delivered, in order to improve its effectiveness and perceived value.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Side Work</title>
      <p>
        During the analysis of the TOLC problems, we investigated some well-known logical puzzles and asked
some LLMs to solve them, in order to test the models’ proficiency on this kind of logical reasoning
task. In particular, we analyzed the famous Zebra Puzzle [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and we considered three of the most
widely used LLMs today: ChatGPT, Gemini and DeepSeek. Since this puzzle is already well known in
the literature, the models had an easier time solving it. To challenge the LLMs, we therefore invented a
variant of the same puzzle, which reads:
1. In the wonderful lagoon of Grado there are 4 houses of different colors, listed from west
to east.<sup>3</sup>
2. In each house lives a family that speaks a different dialect.
3. Each family drinks one and only one beverage.
4. The Gradese does not live in the pink house.
5. The Istrian lives next to the Carnic.
6. The Gradese drinks spritz.
7. The Carnic does not drink water. Never.
8. Near the Gradese, people drink either water or beer.
9. Whoever drinks Ribolla does not live in a red or green house.
10. The white house is, from left to right, between the green and the red one.
11. The person in the white house lives further west than the one who doesn’t drink alcohol.
12. The Gradese and the Bisiaco are not neighbors.
      </p>
      <p>QUESTION: Who lives in the white house?</p>
      <p>There is only one possible correct answer: the Carnic lives in the white house. This
answer was obtained by encoding the puzzle in ASP and running clingo, whose output is:
lives(1, Gradese, green, spritz) lives(2, Carnic, white, beer)
lives(3, Istrian, red, water) lives(4, Bisiaco, pink, ribolla)
where 1 is the number assigned to the first house we find to the west and 4 is assigned to the last house
to the east.</p>
      <p><sup>3</sup> Actually, the correct formulation was “casone”, namely a small building made of wood and hay where fishermen lived.</p>
      <p>The answers given by the models mentioned above are the following:
• Gemini<sup>4</sup> answers that the Gradese lives in the white house, and its argument violates several
conditions of the riddle;
• ChatGPT<sup>5</sup> offers a more detailed argument, but its answer (the Istrian lives in the white
house) is still wrong; in particular, it seems to misinterpret constraint 10, which is commonly
understood to mean that the white house must be between the green and red houses, in
left-to-right order and consecutively;
• DeepSeek<sup>6</sup> provides the correct answer, but its solution shows that constraint 10 is not respected;
this is due to the model’s difficulty in interpreting the phrase "from left to right" as "from west to
east".</p>
      <p>In conclusion, none of these models is completely reliable, but some have better logical reasoning
abilities than others. DeepSeek R1 is the model that gets closest to the correct and complete solution,
but its reliability depends on how clearly the problem is stated.</p>
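      <p>As an independent cross-check of the uniqueness claim for the variant puzzle, the constraints can also be enumerated by brute force. The following is a minimal Python sketch, not the ASP encoding used to obtain the clingo output. It assumes that the four beverages are pairwise distinct, that "the one who doesn’t drink alcohol" is the water drinker, and that constraint 10 requires the green, white and red houses to be consecutive from west to east. The function names solve and adjacent are illustrative.</p>
      <preformat>
```python
from itertools import permutations

DIALECTS = ("Gradese", "Istrian", "Carnic", "Bisiaco")
COLORS = ("green", "white", "red", "pink")
DRINKS = ("spritz", "beer", "water", "ribolla")
HOUSES = (1, 2, 3, 4)  # 1 = westernmost house, 4 = easternmost house


def adjacent(a, b):
    """True when houses a and b are next to each other."""
    return abs(a - b) == 1


def solve():
    """Return every assignment satisfying constraints 4-12.

    Constraints 1-3 are built in: each attribute is a permutation of the
    houses (distinct beverages are an assumption of this sketch).
    """
    solutions = []
    for d in permutations(HOUSES):
        who = dict(zip(DIALECTS, d))  # dialect to house number
        if not adjacent(who["Istrian"], who["Carnic"]):  # constraint 5
            continue
        if adjacent(who["Gradese"], who["Bisiaco"]):  # constraint 12
            continue
        for c in permutations(HOUSES):
            col = dict(zip(COLORS, c))  # color to house number
            if who["Gradese"] == col["pink"]:  # constraint 4
                continue
            # constraint 10 (assumed reading): green, white, red are consecutive
            if (col["green"] + 1, col["white"] + 1) != (col["white"], col["red"]):
                continue
            for b in permutations(HOUSES):
                drink = dict(zip(DRINKS, b))  # beverage to house number
                if drink["spritz"] != who["Gradese"]:  # constraint 6
                    continue
                if drink["water"] == who["Carnic"]:  # constraint 7
                    continue
                # constraint 8: every neighbour of the Gradese drinks water or beer
                near = [h for h in HOUSES if adjacent(h, who["Gradese"])]
                if any(h not in (drink["water"], drink["beer"]) for h in near):
                    continue
                if drink["ribolla"] in (col["red"], col["green"]):  # constraint 9
                    continue
                # constraint 11: the water drinker lives strictly east of the white house
                if drink["water"] not in range(col["white"] + 1, 5):
                    continue
                # collect (house, dialect, color, beverage) for each house
                by_house = {h: [] for h in HOUSES}
                for table in (who, col, drink):
                    for name, h in table.items():
                        by_house[h].append(name)
                solutions.append(tuple((h, *by_house[h]) for h in HOUSES))
    return solutions


if __name__ == "__main__":
    for solution in solve():
        print(solution)
```
</preformat>
      <p>Under these assumptions the enumeration yields a single assignment, with the Carnic in the white house, in agreement with the clingo answer set reported above.</p>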
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>With the growing popularity of large language models (LLMs), it is natural for students to consider
chatbots like ChatGPT, Gemini or DeepSeek R1 as potential tools to support their preparation for tests
and exams. However, at present, these models do not guarantee a sufficient level of reliability, as their
responses are not always consistent or correct. In this scenario, a structured system such as the one
described above represents a more robust alternative.</p>
      <p>Developing a digital version of the Mentor involves several challenges, including:
• selecting an appropriate collection of questions,
• ensuring sufficient variety in the questions,
• guaranteeing that the answers are consistent and that there is always a single correct answer,
• generating clear and effective feedback for users.</p>
      <p>All of these aspects have been addressed in the previous sections. Moreover, thanks to the direct
involvement of students in the experimental phase, valuable suggestions emerged for further improvements of
the tool. Among the most relevant proposals are: the introduction of a more intuitive and accessible
graphical interface; the refinement of the feedback mechanisms, both for incorrect and correct attempts,
to better clarify the underlying logical reasoning; the integration of features designed for students
with special educational needs (BES). Other improvements could include expanding the repository of
templates by adding new quizzes, including combinatorial puzzles.</p>
      <p>These insights open up promising avenues for the future evolution of the software, with the goal of
meeting the real needs of the users.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We would like to express our gratitude to Giovanna D’Agostino for sharing her expertise with us. We
thank Dario Della Monica and Claudio Mirolo for allowing us to test the tool during their lab classes,
and all the (anonymous) first-year students of the Laurea L31 of the University of Udine who
agreed to participate in the experiment.
The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Giorgio</given-names>
            <surname>Filippi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Vincenzo</given-names>
            <surname>Falco</surname>
          </string-name>
          ,
          <source>I risultati TOLC 2023</source>
          , https://www.cisiaonline.it/sites/default/files/Divulgazione/2024/Volume-Risultati_TOLC_2023-CISIA.pdf,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <source>Area riservata test CISIA</source>
          , https://testcisia.it/studenti_tolc/login_sso.php,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Caire</surname>
          </string-name>
          ,
          <string-name>Paola Suria Arnaldi</string-name>
          ,
          <source>Mentor di Logica</source>
          , vol.
          <volume>1</volume>
          , CISIA
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V. W.</given-names>
            <surname>Marek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Truszczynski</surname>
          </string-name>
          ,
          <article-title>Stable models and an alternative logic programming paradigm</article-title>
          , in:
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Apt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. W.</given-names>
            <surname>Marek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Truszczynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Warren</surname>
          </string-name>
          (Eds.),
          <source>The Logic Programming Paradigm - A 25-Year Perspective, Artificial Intelligence</source>
          , Springer,
          <year>1999</year>
          , pp.
          <fpage>375</fpage>
          -
          <lpage>398</lpage>
          . URL: https://doi.org/10.1007/978-3-642-60085-2_17. doi:10.1007/978-3-642-60085-2_17.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dovier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Formisano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pontelli</surname>
          </string-name>
          ,
          <article-title>An empirical study of constraint logic programming and answer set programming solutions of combinatorial problems</article-title>
          ,
          <source>J. Exp. Theor. Artif. Intell</source>
          .
          <volume>21</volume>
          (
          <year>2009</year>
          )
          <fpage>79</fpage>
          -
          <lpage>121</lpage>
          . URL: https://doi.org/10.1080/09528130701538174. doi:10.1080/09528130701538174.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Cian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dreossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dovier</surname>
          </string-name>
          ,
          <article-title>Modeling and solving the rush hour puzzle</article-title>
          , in:
          <string-name>
            <given-names>R.</given-names>
            <surname>Calegari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciatto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Omicini</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 37th Italian Conference on Computational Logic</source>
          , Bologna, Italy, June 29 - July 1, 2022, volume
          <volume>3204</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2022</year>
          , pp.
          <fpage>294</fpage>
          -
          <lpage>306</lpage>
          . URL: https://ceur-ws.org/Vol-3204/paper_29.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Rizzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dovier</surname>
          </string-name>
          ,
          <article-title>3cosoku and its declarative modeling</article-title>
          ,
          <source>J. Log. Comput</source>
          .
          <volume>32</volume>
          (
          <year>2022</year>
          )
          <fpage>307</fpage>
          -
          <lpage>330</lpage>
          . URL: https://doi.org/10.1093/logcom/exab086. doi:10.1093/LOGCOM/EXAB086.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Alviano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L. T.</given-names>
            <surname>Trieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Son</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balduccini</surname>
          </string-name>
          ,
          <article-title>The XAI system for answer set programming xasp2</article-title>
          ,
          <source>J. Log. Comput</source>
          .
          <volume>34</volume>
          (
          <year>2024</year>
          )
          <fpage>1500</fpage>
          -
          <lpage>1525</lpage>
          . URL: https://doi.org/10.1093/logcom/exae036. doi:10.1093/LOGCOM/EXAE036.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dovier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Benoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Brocato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dereani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tabacco</surname>
          </string-name>
          ,
          <article-title>Reasoning in High Schools: Do it with ASP!</article-title>
          , in:
          <string-name>
            <given-names>C.</given-names>
            <surname>Fiorentini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Momigliano</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 31st Italian Conference on Computational Logic</source>
          , Milano, Italy, June 20-22, 2016, volume
          <volume>1645</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2016</year>
          , pp.
          <fpage>205</fpage>
          -
          <lpage>213</lpage>
          . URL: https://ceur-ws.org/Vol-1645/paper_9.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Gelfond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lifschitz</surname>
          </string-name>
          ,
          <article-title>The stable model semantics for logic programming</article-title>
          , in:
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Kowalski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Bowen</surname>
          </string-name>
          (Eds.),
          <source>Logic Programming, Proceedings of the Fifth International Conference and Symposium</source>
          , Seattle, Washington, USA, August 15-19, 1988 (2 Volumes)
          , MIT Press,
          <year>1988</year>
          , pp.
          <fpage>1070</fpage>
          -
          <lpage>1080</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <source>Minerva</source>
          , https://nlp.uniroma1.it/minerva/,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <source>inquirer 3.4.0</source>
          , https://pypi.org/project/inquirer/,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <source>Zebra Puzzle - Wikipedia</source>
          , https://en.wikipedia.org/wiki/Zebra_Puzzle,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>