<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluation of Logic Proof Problem Difficulty through Student Performance Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Behrooz Mostafavi</string-name>
          <email>bzmostaf@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tiffany Barnes</string-name>
          <email>tmbarnes@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>North Carolina State University, Department of Computer Science</institution>
          ,
          <addr-line>Raleigh, NC 27695</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The interactions of concepts and problem-solving techniques needed to solve open-ended proof problems are varied, making it difficult to select problems that improve individual student performance. We have developed a system of data-driven ordered problem selection for Deep Thought, a logic proof tutor. The problem selection system presents problem sets of expert-determined higher or lower difficulty to students based on their measured proof-solving proficiency in the tutor. Initial results indicate the system improves student-tutor scores; however, we wish to evaluate problem set difficulty through analysis of student performance to validate the expert-authored problem sets.</p>
      </abstract>
      <kwd-group>
<kwd>Problem Difficulty</kwd>
        <kwd>Logic Proof</kwd>
        <kwd>Data-driven Problem Selection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Effective intelligent tutoring systems present problems to
students in their zone of proximal development through
scaffolding of major concepts [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In domains such as deductive
logic, where the problem space is open-ended and solutions require
multiple steps and knowledge of different rules, it is difficult
to choose problems for individual students that are
appropriate for their proof-solving ability. We have developed a
system that uses data-driven knowledge tracing (DKT)
of domain concepts in existing student-tutor performance
data to regularly evaluate current student proficiency in the
subject matter and select successive structured problem sets
of expert-determined higher or lower difficulty.
      </p>
      <p>We used an existing proof-solving tool called Deep Thought
to test the DKT problem selection system. The system was
integrated into Deep Thought and tested on a class of
undergraduate philosophy students who used the tutor as assigned
homework over a 15-week semester. Performance data from
this experiment were compared to data from previous use of
Deep Thought without the DKT problem selection system.
The results of the comparison indicate that the DKT
problem selection system is effective in improving student-tutor
performance. However, we wish to evaluate the difficulty of
presented problems using student performance data to
validate the difficulty of the expert-determined problem sets and
improve the system for future students.</p>
    </sec>
    <sec id="sec-2">
      <title>2. DEEP THOUGHT</title>
      <p>
        Fig. 1 shows the interface for Deep Thought, a web-based
proof construction tool created by Croy as a tool for proof
construction assignments [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Deep Thought displays logical
premises, buttons for logical rules, and a logical conclusion
to be derived. For example, the proof in Fig. 1 provides
premises A → (B ∧ C), A ∨ D, and ¬D ∧ E, from which the
user is asked to derive the conclusion B using the rules on the
right side of the display window.
      </p>
      <p>Deep Thought keeps track of student performance for the
purpose of proficiency evaluation and post-hoc analysis. As
a student works through a problem, each step is logged in
a database that records: the current problem; the current
state of progress in the proof; any rule applied to selected
premises; any premises deleted; errors made (such as illegal
rule applications); completion of the problem; time taken
per step; elapsed problem time; knowledge tracing scores
for each logic rule in the tutor.</p>
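      <p>As a rough sketch, one such per-step record might be modeled as follows; the field names are our own illustration, since the actual database schema is not shown here:</p>

```python
# Illustrative model (field names are ours) of one logged Deep Thought step.
from dataclasses import dataclass, field

@dataclass
class StepLog:
    problem_id: str          # the current problem
    proof_state: str         # current state of progress in the proof
    rule_applied: str        # rule applied to selected premises, if any
    premises_deleted: list   # premises removed at this step
    error: str               # e.g. an illegal rule application, else ""
    completed: bool          # whether this step completed the problem
    step_seconds: float      # time taken on this step
    elapsed_seconds: float   # elapsed time on the problem so far
    rule_kt_scores: dict = field(default_factory=dict)  # knowledge tracing score per logic rule
```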
    </sec>
    <sec id="sec-3">
      <title>2.1 Problem Selection</title>
      <p>The problem selection system in Deep Thought presents
ordered problem sets to ensure consistent, directed practice
using increasingly related and difficult concepts. The system
presents sets of problems at different degrees of difficulty,
determined through evaluation of current student performance
in the tutor.</p>
      <p>Evaluation of student performance is performed at the
beginning of each level of problems. Level 1 of Deep Thought
contains three problems common to all students who use the
tutor, and provides initial performance data to the problem
selection model. Levels 2–6 of Deep Thought are each split
into two distinct sets of problems, labeled higher and lower
proficiency. The problems in the different proficiency sets
are conceptually identical to each other, prioritizing rules
important for solving the problems in that level. To
prevent students from getting stuck on a specific proof problem,
Deep Thought allows students to temporarily skip problems
within a level. A unique case occurs if a student skips a
problem more than once in a higher proficiency problem set: the
student is dropped to the lower proficiency problem set
in the same level, under the assumption that the student
was improperly assigned the higher proficiency set (see Fig.
2).</p>
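      <p>The skip-and-drop rule above can be sketched as follows; this is a minimal illustration with our own function and variable names, not Deep Thought's actual code:</p>

```python
# Sketch (our own names, not the tutor's code) of the skip/drop rule:
# skipping the same problem more than once in a higher-proficiency set
# drops the student to the lower-proficiency set of the same level.

def reassign_after_skip(level, current_set, skip_counts, problem_id):
    """Record a skip and return the (level, proficiency set) assignment.

    skip_counts maps problem_id to the number of times it has been skipped.
    current_set is "high" or "low".
    """
    skip_counts[problem_id] = skip_counts.get(problem_id, 0) + 1
    if current_set == "high" and skip_counts[problem_id] > 1:
        # repeated skipping suggests the student was misassigned
        return (level, "low")
    return (level, current_set)
```

      <p>Here a second skip of the same problem triggers the drop; the tutor's actual bookkeeping may of course differ in detail.</p>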
    </sec>
    <sec id="sec-4">
      <title>2.2 Logic Proof Problems</title>
      <p>The degree of problem-solving difficulty differs between proficiency
sets, as determined by domain experts. The
problems in the low proficiency set require fewer steps
for completion, lower complexity of logical expressions, and
a lower degree of rule application than problems in the high
proficiency set (see Table 1). The top proof of Table 1 reads:
(A → B) ∧ (¬D → F)
A ∨ ¬D
¬A → (D ∨ G)
¬A
B ∨ F
¬D
D ∨ G
G
(B ∨ F) ∧ G
For the purpose of problem difficulty evaluation, progress
through the tutor can be expressed as a directed graph for
each individual student, with each node in the graph
corresponding to a single problem. The node set for the graph
represents the problem space for the tutor, and is the same
for every student. Each problem node has the following
properties:
1. Tutor Level (1–6)
2. Proficiency (High or Low)
3. Problem Number (1–3)
4. Problem Complete (True or False)
5. Expert-Authored (a) Required Rules, (b) Minimal Solution
6. Corresponding Step Logs (See Section 2)</p>
      <p>Directed edges between nodes correspond to movement
between problems by the individual student, and are assigned
numerical values ordered by increasing time stamp. The
nodes and directed edges together give a map of the
student's progression through the tutor. Connected nodes with
a false Problem Complete status represent skipped problems,
and the node adjacent to the highest-numbered edge
represents the student's terminus point in the tutor. Isolated nodes
represent non-visited problems, and are therefore unusable
for problem difficulty evaluation.</p>
      <p>Logic proofs can also be represented as directed graphs, with
each node containing a proof premise, and each directed
edge indicating a parent-child relationship between nodes, along with
an applied logic rule. For example, the top proof shown in
Table 1 can be represented as a graph with the premise on
each line as a node, with the directed edges into that node
corresponding to the derivation of that premise from its parent
nodes. A proof premise can be a variable (e.g. A),
a negated variable or expression (e.g. ¬A, or ¬(A ∧ B)), or
an operational expression of the form (variable/nested
expression)-operator-(variable/nested expression) (e.g. A ∨ B, or
(A ∧ B) ∨ (A → B)). Nested expressions can be represented
in high-level form. Therefore, node premises can be
categorized by their main operator (conjunction, disjunction, negation,
implication, equivalence), the complexity of the expression
(single variable, simple expression, complex [nested]
expression), and the rule used for derivation.</p>
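      <p>The premise categorization described above can be sketched in a few lines of code; this is our own illustration, assuming premises are plain strings over the connectives →, ∧, ∨, ↔, and ¬:</p>

```python
# Sketch (our own illustration, not the tutor's code) of categorizing a
# proof premise by its main operator and its expression complexity.

def strip_outer_parens(s):
    """Remove parentheses that wrap the entire expression, if any."""
    s = s.strip()
    while s.startswith("(") and s.endswith(")"):
        depth = 0
        for i, ch in enumerate(s):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0 and i != len(s) - 1:
                    return s  # outer parens do not wrap the whole string
        s = s[1:-1].strip()
    return s

def main_operator(expr):
    """Top-level connective of a premise, or None for a single variable."""
    expr = strip_outer_parens(expr)
    names = {"→": "implication", "∧": "conjunction",
             "∨": "disjunction", "↔": "equivalence"}
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and ch in names:
            return names[ch]
    if expr.startswith("¬"):
        return "negation"
    return None  # single variable

def complexity(expr):
    """Classify as single variable, simple expression, or complex (nested)."""
    body = strip_outer_parens(expr).lstrip("¬").strip()
    if len(body) == 1:
        return "single variable"
    # nested if a parenthesized sub-expression remains after unwrapping
    return "complex expression" if "(" in strip_outer_parens(body) else "simple expression"
```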
    </sec>
    <sec id="sec-5">
      <title>4. PROBLEM DIFFICULTY EVALUATION</title>
      <p>The question at hand is how best to use the recorded data
to determine proof problem difficulty through student
performance. We wish to find both a classification of
problem difficulty between proficiency sets in the same level, and a
classification of the difficulty of all problems in the tutor, compared to
expert-determined classifications.</p>
      <p>
        Because students follow different problem-solving paths, no
student can solve all available problems in the tutor, nor
are students likely to solve problems in both proficiency sets
within the same level. This makes student performance
comparison over multiple problems difficult. We plan to use a
combination of proof-problem properties weighted by student
performance metrics to evaluate problem difficulty; however,
we have not yet determined which combination of methods to
use. We are currently looking into weighted cluster-based
classification methods to apply to the problems. The
hypothesis is that problems of similar difficulty would be placed
into the same clusters. Student performance metrics for each
problem could be used to determine distance, since it is
assumed that students react most similarly to problems
of similar difficulty. Eagle et al. applied network community
mining to this type of student log data in order to form interaction
networks [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]; a modified version could be applied here on a
student-per-problem level in order to determine prominent
similar behaviors that correlate with problem
performance.
      </p>
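      <p>As one possible sketch of this cluster-based idea (a hypothetical method choice on our part, not a committed one), each problem could be summarized by averaged student performance metrics and grouped with a plain k-means:</p>

```python
# Sketch: cluster problems by averaged student-performance metrics, on the
# assumption that students behave most similarly on problems of similar
# difficulty. The metrics and problem names below are invented examples.
import math
import random

def kmeans(vectors, k, iters=50, seed=0):
    """Plain k-means over equal-length metric tuples; returns a label per vector."""
    random.seed(seed)
    centers = random.sample(vectors, k)
    for _ in range(iters):
        # assign each problem vector to its nearest center
        labels = [min(range(k), key=lambda j: math.dist(v, centers[j]))
                  for v in vectors]
        # recompute each center as the mean of its assigned vectors
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels

# e.g. (mean steps taken, error rate, mean solve minutes) per problem
problems = {"L2-low-1": (6, 0.10, 4.0), "L2-low-2": (7, 0.12, 4.5),
            "L2-high-1": (14, 0.30, 11.0), "L2-high-2": (15, 0.28, 12.0)}
labels = kmeans(list(problems.values()), k=2)
```

      <p>Problems landing in the same cluster would be hypothesized to be of similar difficulty; a weighted distance measure or the community-mining approach cited above could substitute for plain k-means.</p>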
      <p>This would determine which problems are of similar
difficulty, but not necessarily which problems (or groups of
problems) are more or less difficult. That determination could be
made by analyzing student rule scores across problems, or
even the difference in scores between the start and end of a
problem. In particular, analyzing the difference in rule scores
would both standardize the scores (to account for the scores
being calculated at different points in the tutor) and give a
measure of forward or backward progress (a student's rule
scores should not decrease after solving an easy problem).
Problem properties we feel are valuable to take into
consideration when evaluating problem difficulty per student
include:</p>
      <p>Classification of problems by operator/expressions
Deviation of student solutions from expert solutions
– Number of steps taken
– Number and frequency of rules used</p>
      <p>Student performance metrics that we feel are valuable to
take into consideration include:</p>
      <p>Path progression through the tutor, including
– Order of assigned proficiency sets
– Number and path location of skipped problems
– Terminus point in tutor
– Final tutor grade
Knowledge tracing scores for each rule, prioritized by
problem requirements
Type and number of errors committed</p>
      <p>We would appreciate any literature recommendations, as
well as suggestions for how to use the data from our
experiment to measure and compare problem difficulty through
student performance.</p>
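      <p>The rule-score difference mentioned above could be computed per problem as a simple delta between a student's knowledge tracing scores at problem start and end; a sketch with hypothetical names:</p>

```python
# Sketch: per-rule knowledge tracing score deltas for one student on one
# problem. Positive deltas indicate forward progress on a rule; scores
# should not decrease after solving an easy problem. (Names are ours.)

def rule_score_deltas(start_scores, end_scores):
    """Both arguments map a rule name to a knowledge tracing score in [0, 1]."""
    rules = set(start_scores) | set(end_scores)
    return {r: end_scores.get(r, 0.0) - start_scores.get(r, 0.0) for r in rules}
```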
    </sec>
    <sec id="sec-6">
      <title>5. ACKNOWLEDGEMENTS</title>
      <p>This material is based on work supported by the National
Science Foundation under Grant No. 0845997.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Croy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Barnes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Stamper</surname>
          </string-name>
          .
          <article-title>Towards an Intelligent Tutoring System for Propositional Proof Construction</article-title>
          .
          <source>In Current Issues in Computing and Philosophy</source>
          , pages
          <fpage>145</fpage>
          –
          <lpage>155</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eagle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Barnes</surname>
          </string-name>
          .
          <article-title>Interaction Networks: Generating High Level Hints Based on Network Community Clusterings</article-title>
          .
          <source>In Proceedings of the 5th International Conference on Educational Data Mining (EDM</source>
          <year>2012</year>
          ), pages
          <fpage>164</fpage>
          –
          <lpage>167</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Murray</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Arroyo</surname>
          </string-name>
          .
          <article-title>Toward Measuring and Maintaining the Zone of Proximal Development in Adaptive Instructional Systems</article-title>
          .
          <source>In Proceedings of the 10th International Conference on Intelligent Tutoring Systems (ITS</source>
          <year>2002</year>
          ), pages
          <fpage>289</fpage>
          –
          <lpage>294</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>