<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Solving Mathematical Exercises: Prediction of Students' Success</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sebastian Wankerl</string-name>
          <email>sebastian.wankerl@uni-wuerzburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gerhard Götz</string-name>
          <email>gerhard.goetz@mosbach.dhbw.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Hotho</string-name>
          <email>andreas.hotho@informatik.uni-wuerzburg.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair for Computer Science X</institution>
          ,
          <addr-line>Am Hubland, 97074 Würzburg</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>DHBW Mosbach</institution>
          ,
          <addr-line>Lohrtalweg 10, 74821 Mosbach</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In educational settings, recommender systems can help to choose the right exercises a student should be given for training. To make good decisions, the system should be able to estimate how successfully a student would answer a recommended exercise. In this work, we study the performance of convolutional neural networks and collaborative filtering for estimating students' success. We show that we can distinguish between correctly and incorrectly processed exercises with a precision of up to 64% while training on a small corpus of 712 user interactions.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender systems</kwd>
        <kwd>Technology enhanced learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Recommender systems have been in widespread use for a long time. While
the most prominent areas of application are still e-commerce and the
entertainment industry [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], several attempts have been made to apply recommendation algorithms
in education [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ], owing to the growing amount of online learning
material and massive open online courses (MOOCs).
      </p>
      <p>
        We want to build a recommender system that helps freshmen at university
to overcome their weaknesses in mathematics. Since no public data is available
for our aim, we built a rule-based system which exploits didactical ontologies[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
to nd suitable exercises for a student. Furthermore, this preliminary system
makes it possible to collect data we can later use for building a system that
better adapts to the individual student.
      </p>
      <p>In this work, we present a first step in this direction, namely a
machine-learning based approach to estimate whether a student will correctly solve an
exercise given by our rule-based system. In particular, we want to
detect, for all users, those exercises that they were given but unable to master.
Failing such exercises allows students to learn more and to broaden their skills instead
of simply repeating known content.</p>
    </sec>
    <sec id="sec-2">
      <title>Educational System and Dataset</title>
      <p>Our dataset contains 128 mathematical exercises. With our rule-based system,
we collected data from 39 students who contributed a total of 787 processed
exercises. However, for the following analysis we only used data of participants
who contributed at least 10 exercises, in order to have enough data for each user. This
leaves us with 24 users and 712 processed exercises.</p>
      <p>The data points, also called interactions, are represented as 4-tuples (u, i, r<sub>ui</sub>, t)
consisting of a user u, an exercise i, and a binary rating label r<sub>ui</sub> which indicates
whether user u solved exercise i correctly or not. In addition, we keep a
consecutive timestamp t which preserves the chronological order of the interactions.
The number of interactions contributed by user u is denoted as T<sub>u</sub>.</p>
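      <p>Such an interaction tuple can be modeled, for instance, as follows; the type and helper names are illustrative, not taken from the authors' code.</p>

```python
from collections import namedtuple

# One interaction (u, i, r_ui, t): user, exercise, binary correctness
# label, and a consecutive timestamp.
Interaction = namedtuple("Interaction", ["user", "exercise", "correct", "t"])

def interactions_of(data, u):
    """All interactions of user u in chronological order; len() gives T_u."""
    return sorted((a for a in data if a.user == u), key=lambda a: a.t)
```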
      <sec id="sec-2-1">
        <title>Training and Test Dataset</title>
        <p>In this section we describe how we build the test and training datasets for each
student, which we use for our experiments.</p>
        <p>To maximize the amount of data usable for training, we keep all data of the
user except the last 5 interactions, on which we want to predict the user's
performance. Hence, these last five interactions form our test set, denoted as
te<sub>u</sub>.</p>
        <p>We denote the basis training set for user u as tr<sub>u</sub>. For example, if we want to
predict the performance of user u<sub>1</sub>, we keep all interactions from users u<sub>2</sub>, …, u<sub>24</sub>
together with the first T<sub>1</sub> − 5 interactions by user u<sub>1</sub> as basis training set.</p>
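        <p>For one user, the resulting split can be sketched as follows, with interactions written as plain (user, exercise, correct, timestamp) tuples; the function and parameter names are our own.</p>

```python
def split_user(data, u, n_test=5):
    """Return the basis training set tr_u and the test set te_u for user u.

    te_u: the user's last n_test interactions (by timestamp).
    tr_u: all interactions of the other users plus the user's first
          T_u - n_test interactions.
    """
    own = sorted((a for a in data if a[0] == u), key=lambda a: a[3])
    tr_u = [a for a in data if a[0] != u] + own[:-n_test]
    te_u = own[-n_test:]
    return tr_u, te_u
```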
        <p>Since we expect the CNN to require more training data than the basis
training set provides, we decided to create a second training set t̃r<sub>u</sub> by augmenting
the basis training set. More precisely, it consists of replicas of the users,
obtained by creating approximately 20,000 windows w<sub>j</sub> of size 5 consisting
of random interactions a<sub>j1</sub>, …, a<sub>j5</sub> ∈ tr<sub>u</sub> such that the following conditions are
fulfilled: (1) all interactions a ∈ w<sub>j</sub> come from the same user u, and (2) the
interactions a ∈ w<sub>j</sub> are ordered ascending according to their timestamp t.</p>
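        <p>The windowing can be sketched as follows (interactions again as (user, exercise, correct, timestamp) tuples; whether the original scheme samples with or without replacement is not stated — here we sample without replacement):</p>

```python
import random

def augment(tr_u, n_windows=20000, size=5):
    """Build the augmented set t~r_u as windows w_j of `size` random
    interactions such that (1) all interactions of a window come from
    the same user and (2) they are ordered ascending by timestamp."""
    by_user = {}
    for a in tr_u:
        by_user.setdefault(a[0], []).append(a)
    # only users with at least `size` interactions can fill a window
    users = [u for u, rows in by_user.items() if len(rows) >= size]
    windows = []
    for _ in range(n_windows):
        u = random.choice(users)
        windows.append(sorted(random.sample(by_user[u], size),
                              key=lambda a: a[3]))
    return windows
```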
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experimental Setting</title>
      <p>To reach our goal of predicting the students' performance on given exercises, we
experimented with a Convolutional Neural Network (CNN) architecture as well
as with collaborative filtering (CF).</p>
      <sec id="sec-3-0">
        <title>CNN</title>
        <p>As stated above, we want to predict the performance of each user u on the 5
exercises contained in te<sub>u</sub>. Hence, our CNN architecture maps sequences of 5
exercise ids (i<sub>1</sub>, …, i<sub>5</sub>) to their binary performance labels (r<sub>i1</sub>, …, r<sub>i5</sub>). We train
the network using the augmented training sets t̃r<sub>u</sub>.</p>
        <p>Since the inputs to the network are discrete, i.e. high-dimensional and sparse
ids, they are first mapped to lower-dimensional dense vectors by an embedding
layer. The dimension of the embeddings is set to 20. We use one convolutional
layer, with the number of filters and the kernel size set to 10 and 3, respectively.
This configuration worked best among the explored configurations: we varied the
number of filters between 5 and 30, the number of convolutional layers between
1 and 2, and the kernel size between 2 and 5.</p>
        <p>We flatten the output and feed it into a feed-forward layer that applies the
sigmoid activation to each output. The overall architecture is depicted in figure 1
alongside a generic input and output sample.</p>
        <p>For training, we set a batch size of 32 and apply the ADAM optimizer.
Moreover, we use binary cross-entropy as the loss function since it transforms
each element of the sigmoid output layer into an independent probability. In our
setting, these are interpreted as the probabilities of the exercises being solvable.</p>
      </sec>
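      <p>The exact implementation is not given in the text; as an illustration, the layer shapes described above can be traced with a plain NumPy forward pass. All weights below are random and untrained, the helper names are our own, and the ReLU activation of the convolutional layer is an assumption the paper does not state.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXERCISES, EMB_DIM, SEQ = 128, 20, 5   # corpus size, embedding dim, window
N_FILTERS, KERNEL = 10, 3                # best configuration from the text

# Random, untrained weights; a real model would learn these.
emb = rng.normal(size=(N_EXERCISES, EMB_DIM))
conv_w = rng.normal(size=(N_FILTERS, KERNEL, EMB_DIM)) * 0.1
dense_w = rng.normal(size=((SEQ - KERNEL + 1) * N_FILTERS, SEQ)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(exercise_ids):
    """Map 5 exercise ids to 5 independent success probabilities."""
    x = emb[exercise_ids]                              # (5, 20) embeddings
    # 1D convolution with 'valid' padding: 3 positions x 10 filters.
    conv = np.array([[np.sum(x[p:p + KERNEL] * conv_w[f])
                      for f in range(N_FILTERS)]
                     for p in range(SEQ - KERNEL + 1)])
    conv = np.maximum(conv, 0.0)                       # ReLU (assumed)
    return sigmoid(conv.reshape(-1) @ dense_w)         # flatten, dense, sigmoid
```

      <p>A trained model would learn the embedding, convolution, and dense weights from the augmented training sets with the ADAM and cross-entropy setup described above; the sketch only shows how 5 exercise ids become 5 independent probabilities.</p>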
      <sec id="sec-3-1">
        <title>Collaborative Filtering</title>
        <p>We also apply collaborative filtering (CF) using the kNN algorithm with Pearson
correlation as the similarity measure, as this is a long-established approach in the
field of recommender systems. We chose to consider only the k = 2 nearest neighbors
as this yields the best results on our dataset. To predict the performance of
user u, we fit the algorithm on the original user interactions contained in tr<sub>u</sub>. For
testing, we let the algorithm predict te<sub>u</sub> of each user u, analogous to the evaluation
of our CNN approach.</p>
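        <p>A minimal user-based variant of such a predictor can be sketched as follows (ratings as a {user: {exercise: 0/1}} dictionary; the helper names and the mean-of-neighbors aggregation are our own simplifications):</p>

```python
import math

def pearson(a, b):
    """Pearson correlation over the exercises both users have processed."""
    common = set(a) & set(b)
    if len(common) < 2:
        return None
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
    return num / den if den else None

def predict(ratings, u, i, k=2):
    """Mean label of exercise i among the k neighbors most similar to u;
    None when no fitting neighbor has processed i."""
    sims = [(s, v) for v in ratings
            if v != u and i in ratings[v]
            and (s := pearson(ratings[u], ratings[v])) is not None]
    if not sims:
        return None
    top = sorted(sims, reverse=True)[:k]
    return sum(ratings[v][i] for _, v in top) / len(top)
```

        <p>Returning None when no fitting neighbor exists corresponds to the cases in which the CF cannot make a prediction.</p>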
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>As described in section 3, we evaluated a CNN architecture as well as
collaborative filtering in our recommendation setting. We use a majority vote per
exercise as a baseline: it assumes that user u will solve exercise i incorrectly
if the majority of other users did so.</p>
      <p>The classifiers' predictions are rounded half away from zero to obtain
dichotomous variables of predicted success. The metrics used for evaluation are
precision p, recall r, and f1-score. In addition, we evaluate the RMSE
between the unrounded predictions and r<sub>ui</sub>. The results, averaged over all
users, are shown in table 1.</p>
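      <p>These evaluation steps can be sketched in a few lines (our own helper; a library such as scikit-learn would serve equally well):</p>

```python
import math

def evaluate(y_true, y_score):
    """Round scores half away from zero, then compute precision, recall
    and f1 on the rounded labels, and RMSE on the raw scores."""
    y_pred = [1 if s >= 0.5 else 0 for s in y_score]
    tp = sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(y_pred, y_true) if p == 0 and t == 1)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    rmse = math.sqrt(sum((s - t) ** 2
                         for s, t in zip(y_score, y_true)) / len(y_true))
    return prec, rec, f1, rmse
```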
      <p>Both classifiers outperform the baseline with regard to all metrics, as can
be seen in table 1, leading to the conclusion that both classifiers are able to
recognize contextual effects in the students' handling of the exercises. Regardless
of the parameter k, the CF could not make predictions for 13 of the 120 exercises we
want to predict since no fitting neighbors were available.</p>
      <p>With regard to precision, both approaches perform comparably. However,
collaborative filtering yields a considerably higher recall than the CNN;
hence, the CF approach detects more of the positive items. Consequently, the f1-score
of the CF is higher than that of the CNN.</p>
      <p>With regard to the RMSE, the baseline again performs worse than both
machine-learning based approaches. The RMSE also gives further evidence that
the predictions of the CF are slightly closer to the true values than those
made by the CNN.</p>
    </sec>
    <sec id="sec-5">
      <title>Summary</title>
      <p>In this work we trained a CNN and CF to predict a student's success in solving
mathematical exercises presented to the student by our tutoring system. The
classifiers were given all interactions of the student except the last 5, along with
the interactions of all other students. We showed that both classifiers are able
to predict the students' success more accurately than a majority baseline.</p>
      <p>The results suggest that a classifier can help in selecting appropriate exercises
for a student. As a further step, we can incorporate it into our rule-based system
to help decide which exercise to present to a student.</p>
      <p>Nevertheless, one has to keep in mind that the results presented here are
drawn from a very limited set of students. This could also be a possible
explanation for why the neural network approach does not yield promising results. It
is therefore worthwhile to repeat the described experiments as soon as
more students have participated in the training.</p>
      <p>Moreover, it could be investigated whether the quality of the predictions can be
boosted if the system exploited more information than just the correctness
of an exercise, such as the student's average time spent solving the exercises or
their overall success.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Drachsler</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verbert</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>O.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manouselis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Panorama of recommender systems to support learning</article-title>
          .
          <source>In: Recommender systems handbook</source>
          , pp.
          <fpage>421</fpage>
          –
          <lpage>451</lpage>
          . Springer (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Henning</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Forstner</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heberle</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swertz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmolz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barberi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verdu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Regueras</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verdu</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          , Pablo de Castro, J., et al.:
          <article-title>Learning pathway recommendation based on a pedagogical ontology and its implementation in moodle</article-title>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Pinkernell</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dusi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vogel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Aspects of proficiency in elementary algebra</article-title>
          .
          <source>In: 10th Congress of European Research in Mathematics Education</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tay</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Deep learning based recommender system: A survey and new perspectives</article-title>
          .
          <source>ACM Computing Surveys (CSUR) 52(1)</source>
          ,
          <volume>5</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>