<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Counterfactual Explanations for Student Outcome Prediction with Moodle Footprints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anjana Wijekoon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nirmalie Wiratunga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ikechukwu Nkisi-Orji</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kyle Martin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chamath Palihawadana</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Corsar</string-name>
          <email>d.corsar1g@rgu.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computing, Robert Gordon University</institution>
          ,
          <addr-line>Aberdeen AB10 7GJ, Scotland</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Counterfactual explanations focus on "actionable knowledge" to help end-users understand how a machine learning outcome could be changed to one that is more desirable. For this purpose a counterfactual explainer needs to be able to reason with similarity knowledge in order to discover input dependencies that relate to outcome changes. Identifying the minimum subset of feature changes needed to action a change in the decision is an interesting challenge for counterfactual explainers. In this paper we show how feature relevance based explainers (such as LIME) can be combined with a counterfactual explainer to identify the minimum subset of "actionable features". We demonstrate our hybrid approach on a real-world use case of student outcome prediction using data from the CampusMoodle Virtual Learning Environment. Our preliminary results demonstrate that counterfactual feature weighting is a viable strategy for minimising the number of actionable changes.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable AI</kwd>
        <kwd>Counterfactual</kwd>
        <kwd>LIME</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Understanding a user's explanation need is central to a system's capability of
provisioning an explanation which satisfies that need [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Typically an
explanation generated by an AI system is considered to convey the internal state or
workings of the algorithm that resulted in the system's decision [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In machine
learning (ML) the decision tends to be a discrete label or class (or, in the case of
regression tasks, a numeric value). Although explanations focused on the internal
state or logic of the algorithm are helpful to ML researchers, they are arguably less
useful to an end-user, who may be more interested in what in their current
circumstances could be changed to receive a desired (better) outcome in the future.
This calls for explanations that focus on discovering relationships between the
input dependencies that led to the system's decision.
      </p>
      <p>
        A Nearest-Like Neighbours (NLN) based explainer could focus on input
dependencies by identifying similarity relationships between the current problem
and the retrieved nearest neighbour [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Research has shown that when similarity
computation is focused on feature selection and weighting, it can significantly
improve retrieval of NLN [
        <xref ref-type="bibr" rid="ref7 ref8">8, 7</xref>
        ]. Accordingly it would be reasonable to expect
that NLN-based explanation generation would also benefit from knowledge
of feature importance. Certainly, having to focus on only a few key features in
domains with large numbers of features is likely to reduce the cognitive burden
of understanding NLN-based explanations.
      </p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        Unlike NLN based explanations, a counterfactual explanation focuses on
identifying "actionable knowledge"; that is, knowledge about important causal
dependencies between the input and the ML decision. Such knowledge helps to
understand what could be changed in the input to achieve a preferred (desired)
decision outcome. Typically a Nearest-Unlike-Neighbour (NUN) is used to
identify the differences between the input and its neighbour that, when
changed, can lead to a change in the system's decision [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A key challenge that we
address in this paper is to identify the minimum subset of feature value changes
needed to achieve a change in the decision: the "actionable features". In this paper, we
discover actionable knowledge as the minimum number of feature changes using
feature relevance based explainer methods such as LIME [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>The rest of the paper is organised as follows. Section 2 presents the Moodle
use case, followed by Section 3, which presents the proposed methods to improve
the discovery of actionable features. Section 4 presents the Moodle dataset
details, evaluation methodology, performance metrics and results. Finally, we draw
conclusions and discuss future work in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Use Case: Student Outcome Prediction</title>
      <p>
        Student information systems are increasingly using ML to identify and predict
events in a student's journey, to help improve early intervention and support
at-risk students [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. They help both the student, to adopt positive behaviours,
and the educator, to pinpoint improvements to teaching style and content. We have used the
CampusMoodle Virtual Learning Environment (VLE) to construct a student
footprint dataset for a single module of study delivered within Robert Gordon
University in Semester 1 of 2020/2021. VLE interactions help to capture vital
touchpoints that can be used as proxy measures of student engagement. The dataset
contains a wealth of relevant data for each student, such as the number of times they
have accessed individual learning materials, gathered over a period of ten weeks
(full details of the dataset are described in Section 4.1).
      </p>
      <p>The use case dataset links the "CampusMoodle footprint" with the
"Assessment Outcomes" data. This creates a classification task to predict the likely final
grade for a given student, and an associated explanation task to explain this
outcome given the student's Moodle footprint. A student may want to know why
they received a lower grade, and in response the nearest neighbour instance could
be used to factually confirm the credibility of that lower grade, i.e. "other
students like you also received a similar lower grade". Although such precedence can
be used to justify the student's circumstances, it is not as useful as a
"counterfactual" explanation, which can provide indications as to how another student
with a similar profile ended up with a better grade simply because they had
completed a formative quiz exercise. Here we want the explainer to identify a
few important features to consider, instead of expecting the end-user to analyse
hundreds of features used to describe the Moodle footprint.
Moodle is a learning management system: https://moodle.org/</p>
    </sec>
    <sec id="sec-3">
      <title>Actionable Explanations</title>
      <p>
        Given a query instance, x = {x_1, x_2, ..., x_m}, its counterfactual, x̂ = {x̂_1, x̂_2, ..., x̂_m},
is identified as the nearest-unlike-neighbour (NUN) in the Euclidean feature
space [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Here m is the number of features used to represent instances (i.e. x_i
and x̂_i pairs).
      </p>
      <p>d(x, x̂) = √( Σ_{i=1}^{m} (x_i − x̂_i)² )    (1)</p>
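      <p>As a concrete illustration, the following is a minimal NumPy sketch of NUN retrieval under Equation 1; the function name and the toy data are ours, not part of the paper's implementation.</p>

```python
import numpy as np

def nearest_unlike_neighbour(x, y, X, Y):
    # NUN: among instances whose label differs from the query label y,
    # return the one closest to x in Euclidean distance (Equation 1).
    unlike = Y != y
    dists = np.linalg.norm(X[unlike] - x, axis=1)
    return X[unlike][np.argmin(dists)]

# Hypothetical 2-feature footprint data with Higher/Lower labels.
X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
Y = np.array(["Lower", "Higher", "Higher"])
nun = nearest_unlike_neighbour(np.array([0.0, 0.0]), "Lower", X, Y)  # -> [1., 1.]
```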
      <p>The goal of a counterfactual explanation is to guide the end-user to achieve a
class change (i.e. to be actionable), with a focus on minimising the number of changes
needed to flip the decision to a more desirable outcome for the end-user.
Therefore, in order for an instance to qualify as a NUN for a given query instance,
it needs to be the nearest neighbour with a different class to that of the query.
Discovering the minimal number of actionable features (from a maximum of m
potential feature changes) and minimising the amount of change required on
each actionable feature are the two main challenges for counterfactual explainers.</p>
      <sec id="sec-3-1">
        <title>Actionable Feature Discovery with LIME</title>
        <p>
          LIME [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is a model-agnostic explainer which creates an interpretable model
around a query, x. The resulting local surrogate model is interpretable and
locally faithful to the classifier, but not globally. LIME creates a set of
perturbations within the neighbourhood of x. The original classifier is used to obtain
the class labels for the perturbed data instances, and this new labelled dataset is
used to create a local interpretable model (often a linear model). This
interpretable model is used to predict the classification outcome of x that needs
to be explained. Accordingly, LIME outputs how each feature x_i contributed to
the final classification outcome.
        </p>
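        <p>To make this mechanism concrete, here is an illustrative NumPy-only sketch of the local-surrogate idea; it is not the actual lime library, and the Gaussian perturbation scheme and toy classifier are simplifying assumptions of ours.</p>

```python
import numpy as np

def local_surrogate_weights(x, predict, n_samples=500, scale=0.5, seed=0):
    # Perturb the query, label the perturbations with the original
    # classifier, then fit a local linear surrogate; its coefficients
    # serve as per-feature relevance weights (the core LIME idea).
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    labels = np.array([predict(z) for z in Z], dtype=float)
    A = np.hstack([Z, np.ones((n_samples, 1))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, labels, rcond=None)
    return coef[:-1]  # one weight per feature

# Hypothetical classifier: class 1 whenever feature 0 exceeds 2.0.
predict = lambda v: float(v[0] > 2.0)
w = local_surrogate_weights(np.array([2.1, 0.0, 0.0]), predict)
# feature 0 dominates the local explanation; features 1 and 2 are near zero
```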
        <p>These local feature contribution weights form the main LIME output, which
is a list of tuples (w_i, i), where each feature, x_i, has its corresponding weight.</p>
        <p>LIME output = [(w_1, 1), (w_2, 2), ..., (w_m, m)]    (2)
A positive weight (w_i &gt;= 0) indicates that the corresponding feature contributes
positively, and a negative weight (w_i &lt; 0) negatively, towards the
predicted class. Figure 1 shows a LIME explanation for a Moodle data instance
which predicted a Lower grade (see Section 4.1 for more details on the dataset).
Weights are ordered such that the highest weights appear first, regardless of a
positive or negative contribution to the predicted class. Here we can see that the
feature "Study Area: CM4107" had the highest positive impact and the feature
"Assignment: DROPBOX WRITEUP: CW2" had the highest negative impact on
the Lower grade classification.</p>
        <p>LIME can be used to provide relevance weights for both the query and its
potential NUN. The number of feature changes, n, required to achieve a class
change can range from 1 to m (1 &lt;= n &lt;= m). We propose two methods to
find these actionable features, with the goal of minimising the number of feature
changes (n) needed for a succinct yet actionable explanation. The first method
is to replace the values of the most important features in the query with the
corresponding feature values from its NUN; the second alternative is to identify
the most relevant features of the NUN and reuse those feature values in the
modified query instance.</p>
        <p>We illustrate these two methods in Figure 2. Here, we depict an example
with 5 features where we replace features on the query until a class change is
observed. On the left, the query features are ordered by their LIME feature weights
and the most significant features are replaced by the respective NUN feature
values (Method 1). On the right, the NUN features are ordered by their LIME feature
weights and the most significant features are reused by the query (Method 2).
Methods 1 and 2 achieve class change with 3 and 2 feature replacements
respectively.</p>
        <p>The general functions for actionable feature discovery using counterfactuals
are shown in Algorithms 1 and 2. Here F(x) = y is the classifier prediction for the
query and ŷ = F(x̂) is the class prediction for the NUN. Given an instance, an
Actionable Feature Ordering function, AFOrderLIME(·), returns the feature
indices, I, ordered by the local importance of the given instance's features for
the resultant class prediction. Thereafter the Actionable Feature Discovery function,
AFDiscovery, uses I to create a perturbed version of the query, x′, where
important features are replaced one at a time with the corresponding feature
values from the NUN. Features with equal values are skipped. Feature values
are iteratively replaced until the actionable change condition is met, i.e. F(x′) =
y′ ∧ y ≠ y′. Clearly, the fewer replacement iterations needed, the better the
actionable features being discovered. Crucial to this minimisation is the ordering
method that influences I:
Method1: uses AFOrderLIME(x), returning indices ordered on the basis of the
feature importance that led to the class prediction of the query; and
Method2: uses AFOrderLIME(x̂), returning indices ordered on the basis of the
feature importance that led to the class prediction of the NUN.
Algorithm 1 AFDiscovery
Require: (x, y): query and label pair
Require: (x̂, ŷ): NUN and label pair
Require: F: classification model
Require: I: list of feature indices
1: Declare y′ = y; x′ = x; n = 0
2: while y′ = y do
3:   x′[I[n]] = x̂[I[n]]
4:   y′ = F(x′)
5:   increment n
6: end while
7: return n</p>
      </sec>
      <sec id="sec-3-2">
        <title>Algorithm 2 AFOrderLIME</title>
        <p>Require: q: data instance for which LIME weights are retrieved
1: [(w_1, 1), (w_2, 2), ..., (w_m, m)] = LIME(q)
2: return I: list of feature indices ordered by w</p>
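        <p>Given a vector of LIME weights, AFOrderLIME reduces to sorting feature indices by weight magnitude; a short sketch, with illustrative example weights of our own.</p>

```python
import numpy as np

def af_order(weights):
    # Algorithm 2 (AFOrderLIME): order feature indices by the magnitude
    # of their LIME weights, highest first.
    return list(np.argsort(-np.abs(np.asarray(weights))))

order = af_order([0.1, -0.9, 0.4])  # -> [1, 2, 0]
```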
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>The goal of this evaluation is to determine the effectiveness of the actionable
feature discovery algorithm with different ordering strategies. We compare the
query and NUN feature relevance based orderings against a random ordering of
features for actionable feature discovery. The following algorithms are compared
in this evaluation:
1. Random: instead of AFOrderLIME, a randomly ordered set of feature
indices is used in AFDiscovery;
2. Method1; and
3. Method2.</p>
      <sec id="sec-4-3">
        <title>Moodle Dataset</title>
        <p>The Moodle dataset consists of the Moodle footprints of 79 students who were enrolled on
a course between September and December 2020 at RGU. There are 95 features,
where each feature refers to the number of times a resource on the course
page was accessed by a student. For instance, the feature "Assignment: DROPBOX
WRITEUP: CW1" refers to the submission dropbox provided for submitting
coursework 1, and the feature value refers to the number of times the student accessed
the dropbox.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Predicting student outcome using Moodle footprint</title>
        <p>The ML task for
this dataset is to predict whether a student receives a higher or a lower grade based on
their Moodle footprint. We consider grades A and B as Higher grades and C,
D, E and F as Lower grades. Grades were consolidated into Higher and Lower
to mitigate the comparatively low number of data instances and class imbalance.
Five outlier student cases were also filtered out using the Density-Based Spatial
Clustering method offered in the sklearn Python library. This formed a dataset of
74 instances.</p>
        <p>A RandomForest classifier was used to predict the grade based on the Moodle
footprint. It was built with 500 trees using the Python sklearn library, a configuration
selected after an exhaustive search for the best ML algorithm using an AutoML
method. The classifier achieved 83% accuracy over three stratified folds. Note
that in our results, when explaining an outcome, we assume that the classifier
has correctly predicted the grade.</p>
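        <p>The training setup above can be sketched as follows; the data here is synthetic (with shapes mirroring the 74-instance, 95-feature dataset), not the actual CampusMoodle footprint, and the random seeds are arbitrary.</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the binary Higher/Lower grade task.
X, y = make_classification(n_samples=74, n_features=95, random_state=0)

clf = RandomForestClassifier(n_estimators=500, random_state=0)
scores = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=3))
mean_acc = scores.mean()  # accuracy averaged over three stratified folds
```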
        <p>Explaining predicted student outcome There are two explanation intents
explored in this paper: firstly, a counterfactual type question, Why did student A
not receive grade X?; and secondly, the alternative question, Why did student A receive
grade Y? The latter can be explained using the LIME explainer, presenting the
contribution of the most important features for the predicted grade; the
former Why not type question is explained through a counterfactual explanation
formed using our methods for actionable feature discovery.</p>
      </sec>
      <sec id="sec-4-5">
        <title>Evaluation Methodology</title>
        <p>We consider two versions of the Moodle dataset in this evaluation: first, with all
74 instances each being considered as the query; and second, with the 50 instances
from the class Lower grade as the query. For each query, the number of feature
changes required to observe a class change is calculated using each of the three
methods described above. The final value is computed as the mean number of actionable
features. The first dataset provides a generalised comparison between the
performances of the three methods. The second dataset represents the situation that
is most likely to require a counterfactual explanation, i.e. students who ask how
to get a Higher grade or why they did not receive a Higher grade. Accordingly,
the second version further verifies the comparative performances of the three
methods in a domain-specific context.</p>
      </sec>
      <sec id="sec-4-6">
        <title>Results</title>
        <p>Figures 3a and 3b present the comparison of the three methods for
discovering actionable features using the two datasets. With the full dataset, Random,
Method 1 and Method 2 observe a class change with 21.62, 8.14 and 8.61
feature changes on average. Similar results are observed in the domain-specific case,
where Random, Method1 and Method2 observe a class change with 23.57, 9.34 and
9.13 feature changes on average. Accordingly, the two methods proposed in this
paper have significantly outperformed the random feature ordering approach to
achieving class change. LIME has highlighted the features that are actionable,
thus minimising the number of changes required.</p>
        <p>Furthermore, with both datasets, the performances of Method1 and Method2 are
comparable. Accordingly, reusing the features of the NUN, ordered by the significance
of how each contributed to predicting class label ŷ, is found to be as effective
as considering query feature significance. Given that a student's goal is to find
actionable resources on Moodle towards achieving a different grade (commonly
a Higher grade), we find these results align with our domain knowledge.</p>
        <p>Fig. 3: Comparative Results. (a) All Data; (b) Data labelled Lower Grade</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, we presented a novel approach to finding actionable knowledge
when constructing an explanation using a counterfactual. We used the feature
relevance explainer LIME to order the features that are most significant to a predicted
class, and then used that knowledge to discover the minimal set of actionable features
needed to achieve class change. We demonstrated our approach using a dataset from the
CampusMoodle VLE, where a course outcome (i.e. grade) is predicted using the
student's Moodle footprint. Our empirical results showed that our approach is
significantly more effective than the random approach to discovering
actionable features. In future, we plan to extend our approach to use other
feature relevance algorithms and further minimise the number of features, to create
succinct actionable explanations.</p>
      <p>Acknowledgements This research is funded by the iSee project (https://isee4xai.com/),
which received funding from EPSRC under grant number EP/V061755/1.
iSee is part of the CHIST-ERA pathfinder programme for European coordinated
research on future and emerging information and communication technologies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Conijn</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snijders</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kleingeld</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matzat</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Predicting student performance from LMS data: A comparison of 17 blended courses using Moodle LMS</article-title>
          .
          <source>IEEE Transactions on Learning Technologies</source>
          <volume>10</volume>
          (
          <issue>1</issue>
          ),
          <fpage>17</fpage>
          -
          <lpage>29</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Keane</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Good counterfactuals and where to find them: A case-based technique for generating counterfactuals for explainable AI (XAI)</article-title>
          .
          <source>In: International Conference on Case-Based Reasoning</source>
          . pp.
          <fpage>163</fpage>
          -
          <lpage>178</lpage>
          . Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kenny</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keane</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          :
          <article-title>Twin-systems to explain artificial neural networks using case-based reasoning: Comparative tests of feature-weighting methods in ANN-CBR twins for XAI</article-title>
          .
          <source>In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19</source>
          . pp.
          <fpage>2708</fpage>
          -
          <lpage>2715</lpage>
          .
          <source>International Joint Conferences on Artificial Intelligence Organization</source>
          (7
          <year>2019</year>
          ). https://doi.org/10.24963/ijcai.2019/376
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Mohseni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zarei</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ragan</surname>
            ,
            <given-names>E.D.</given-names>
          </string-name>
          :
          <article-title>A survey of evaluation methods and measures for interpretable machine learning</article-title>
          .
          <source>arXiv preprint arXiv:1811.11839</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ribeiro</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guestrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>"Why should I trust you?": Explaining the predictions of any classifier</article-title>
          .
          <source>In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wachter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mittelstadt</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Counterfactual explanations without opening the black box: Automated decisions and the GDPR</article-title>
          .
          <source>Harv. JL &amp; Tech.</source>
          <volume>31</volume>
          ,
          <fpage>841</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Wettschereck</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aha</surname>
            ,
            <given-names>D.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohri</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms</article-title>
          .
          <source>Artificial Intelligence Review</source>
          <volume>11</volume>
          (
          <issue>1</issue>
          ),
          <fpage>273</fpage>
          -
          <lpage>314</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Wiratunga</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koychev</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Massie</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Feature selection and generalisation for retrieval of textual cases</article-title>
          .
          <source>In: European Conference on Case-Based Reasoning</source>
          . pp.
          <fpage>806</fpage>
          -
          <lpage>820</lpage>
          . Springer (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>