<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshops, March</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Explore the Explanation and Consistency of Explainable AI in the LBLS Data Set</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tiffany T.Y Hsu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Owen H.T. Lu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>International College of Innovation, National Chengchi University</institution>
          ,
          <country country="TW">Taiwan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>8</fpage>
      <lpage>19</lpage>
      <abstract>
<p>Learning Analytics (LA) is a field focused on analyzing educational data using machine learning. One of its most discussed topics is at-risk student prediction. However, the application of these methods for predicting students' academic behaviors has faced criticism due to concerns about context insensitivity, potentially leading to prejudice and discrimination against students. While some explainable AI (xAI) methods have been proposed to address these issues, uncertainty remains regarding the consistency of their results. In response, we incorporate two popular xAI methods, SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), to interpret the prediction models. These methods attribute the output of the models to individual features, providing a clearer understanding of how each feature contributes to the overall prediction. This approach is exemplified on the LBLS467 dataset, which includes data on 467 students' academic performance and learning behaviors in computer programming courses, encompassing metrics ranging from programming behavior to self-regulated learning and language learning strategies. Concerning the consistency of the interpretations derived from SHAP and LIME, analysis via Kendall's tau coefficients reveals a moderate alignment in their feature weight rankings. This alignment is substantiated by a highly significant confidence level, affirming that the observed alignment is not a mere coincidence.</p>
      </abstract>
      <kwd-group>
<kwd>Learning Analytics</kwd>
        <kwd>Explainable AI</kwd>
        <kwd>SHAP</kwd>
<kwd>LIME</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Learning Analytics (LA) is a research field centered on measuring, collecting,
analyzing, and reporting data about learners and their contexts [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Within this field, predicting student
academic achievement is a foundational and significant topic [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. At-risk student prediction involves
identifying students at risk of academic failure using data-driven insights, and it has been used to
enhance web-based learning environments [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This process is not about labelling or categorizing
students; rather, it aims to foresee students’ performance in classes in advance. This foresight
enables educators to offer timely assistance and intervention tailored to each student’s needs,
thereby enhancing their academic outcomes and experiences.
      </p>
      <p>
        Machine learning is often criticized for being overly generalized and for overlooking the context
of the individual. Reflecting on the limitations of generalizations in understanding human
behavior, anthropologist Clifford Geertz suggests that theories and generalizations inevitably
lack deep and contextual understanding of human thought. ‘Theoretical disquisitions stand far
from the immediacies of social life,’ he notes. ‘Any generalization or theory constructed in the
absence of deep understanding, not grounded in the concrete and particular, is vacuous.’ [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The
approach of at-risk student prediction has faced similar criticism of over-generalization. The fact
that machine learning models do not provide causal relationships between features and predictions
means that the individuality of students is overlooked. In machine learning predictions, we are
confronted solely with dichotomous outcomes: students are classified as either ’at risk’ or ’not at risk.’
While the purpose of such predictions is not to categorize students, the absence of interpretability
in these outcomes can inadvertently result in a failure to recognize individuality and in risks of
discrimination and stereotyping [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Explainable Artificial Intelligence (xAI) appears to be a
solution to address these concerns, helping educators understand the differences among
individual students. xAI refers to methods to explain and interpret predictions made by machine
learning models [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Recently, artificial intelligence has been integrated into many areas of society.
At the same time, debates surrounding AI, particularly in the context of ethics remain active. One
of the most popular topics is transparency. In discussions about transparency, besides disclosing
training data and sources, another prevalent approach is the application of xAI to render the
decision-making processes transparent. When a model's decision process is transparent, it
becomes simpler to monitor and assess its accuracy, thereby enhancing the model's
accountability. Moreover, the comprehensible predictions offered by interpretable models play a
vital role in fostering people's acceptance and trust in the decisions made by the model [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In this
study, we answer two research questions:
      </p>
      <p>RQ1: What are the successful factors in the LBLS dataset explored by SHAP and LIME?
RQ2: How consistent are SHAP and LIME in interpreting a student’s learning performance?</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>
        The global community has developed an extensive variety of xAI approaches, which have been
applied across various domains to interpret a wide range of machine learning models, including
several complex models that were previously considered too intricate to interpret [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. These
advancements in xAI have enabled a deeper understanding of machine learning outputs,
enhancing transparency and trust, especially in critical sectors. In line with these developments,
a systematic review of xAI applications reveals a concentrated focus in specific sectors, notably
healthcare, industry, and transportation [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. As for the field of education, despite a relatively
lower number of scholarly articles compared to other domains, applications of xAI are noted in the
review. Notably, 27% of the xAI applications in these articles are used for decision support, the
highest proportion among application purposes. Employing xAI as a decision-support tool for
predicting whether students are at risk is therefore justified.
      </p>
      <p>
        The application of xAI in education manifests primarily in two aspects: data usage and
stakeholder engagement [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In terms of data usage, explanatory models can
improve prediction models after identifying the characteristics of student success in the
classroom. In terms of stakeholder engagement, explanations allow teachers to adjust their
teaching methods based on the results they provide.
      </p>
      <p>
        Reflecting on previous studies, there was research focused on the automatic generation of
explanations in virtual learning environments. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a tool was developed to generate
multimodal explanations regarding predictions of whether a student will pass or fail. The study
compared the accuracy of various classifiers. Under the conditions of most models demonstrated
high accuracy, it opts for simpler models including J48, Rep-Tree, and RandomTree over complex
ones like SVM to achieve a balance between accuracy and interpretability. [12] also indicates that
when models achieve high predictive accuracy, simpler models may yield higher quality
explanations. Therefore, this study follows this direction by comparing the predictive accuracy of
multiple models and selecting a simpler model for explanation under the premise of high
accuracy.
      </p>
      <p>
        According to [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the most prominent repositories on GitHub in 2022 for xAI, as measured by
the number of stars, were slundberg/shap (Shapley Additive exPlanations) and marcotcr/lime
(Local Interpretable Model-agnostic Explanations). SHAP operates on game theory principles,
attributing a machine learning model’s output to the contributions of individual features [13].
Conversely, LIME elucidates the predictions of classifiers or regressors faithfully by locally
approximating them with an interpretable model [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Both methods are adept at explaining
machine learning models, regardless of their complexity. Given the active community
engagement on GitHub, the high level of attention these methods have garnered, and their
open-source status, this study incorporates both SHAP and LIME. Utilizing these approaches, we aim
to pinpoint the key features that determine the classification of individual students as at-risk. We
then compare the outcomes from each method and assess the consistency between the two.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <sec id="sec-3-1">
        <title>3.1. LBLS Dataset</title>
        <p>LBLS467 is a dataset that collects data on 467 students’ academic performance and learning
behaviors in computer programming courses. It encompasses students’ programming editing
behaviors, questionnaire survey results on Self-regulated Learning (SRL) and the Strategy
Inventory for Language Learning (SILL). This dataset includes a total of 208 features, covering a
wide range of learning behaviors and performance indicators. The dataset is utilized to propose
a series of challenging suggestions for the LBLS dataset and was used in a data challenge
workshop organized by the Society for Learning Analytics Research (SoLAR) [14] [15]. 'At-risk'
has diverse definitions; in this study, we define at-risk students as those who fail or are on the
verge of failing the course. Specifically, at-risk students are those whose performance is worse
than that of at least 75% of the students in their class.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Feature Extraction and Classification</title>
        <p>In this study, we employ Principal Component Analysis (PCA) as our primary tool for feature extraction.
PCA, a common preprocessing step for machine learning algorithms [16], is followed by the application
of three different models, ranging from the most explainable to the least: Decision Tree, Logistic
Regression, and Support Vector Machine (SVM). We create a graph to demonstrate how model
accuracy relates to the number of PCA components, aiming to find the most accurate model for a
given component count. The most accurate model is then further analyzed using SHAP and LIME.
("# % "&amp;)</p>
        <p>Accuracy = ("# % (# % "&amp; %(&amp; )
•
•
•
•</p>
        <p>TP (True Positives): The number of correct predictions that an instance is positive.
TN (True Negatives): The number of correct predictions that an instance is negative.
FP (False Positives): The number of incorrect predictions that an instance is positive (actually
negative).</p>
        <p>FN (False Negatives): The number of incorrect predictions that an instance is negative
(actually positive).</p>
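The sweep described above (PCA followed by classifiers of varying interpretability, with accuracy tracked per component count) can be sketched as follows; the data here is a synthetic stand-in for the LBLS467 matrix, not the actual dataset:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the 467-student x 208-feature LBLS467 matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(467, 208))
y = rng.integers(0, 2, size=467)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

accuracies = {}
for k in (4, 8, 16, 32):
    pca = PCA(n_components=k).fit(X_tr)
    clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_tr), y_tr)
    pred = clf.predict(pca.transform(X_te))
    # Accuracy = (TP + TN) / (TP + TN + FP + FN), i.e. equation (1).
    accuracies[k] = accuracy_score(y_te, pred)
```

Plotting `accuracies` against the component count would yield a curve like the one described; the same loop can be repeated with a Decision Tree and an SVM for the three-model comparison.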
      </sec>
      <sec id="sec-3-3">
        <title>3.3. SHAP and LIME</title>
        <p>SHAP is an xAI method grounded in game theory, designed to interpret the predictions of complex
machine learning models. It employs the Shapley value to calculate the contribution of each feature
to the model’s output. This approach facilitates a comprehensive understanding of how different
features influence the model’s predictions. The weights of the features are derived from the following
[13]:</p>
<p>φ_i = Σ_{S ⊆ F\{i}} [ |S|! (|F| − |S| − 1)! / |F|! ] [ f_{S∪{i}}(x_{S∪{i}}) − f_S(x_S) ] (2)</p>
        <p>F : All features.
|F| : The total number of features.
φ_i : The SHAP value of feature i.
S : A subset of the full feature set F excluding feature i.
|S| : The size of subset S.
f_{S∪{i}}(x_{S∪{i}}) : The prediction of model f when the feature set S includes feature i.
f_S(x_S) : The prediction of model f with only the feature set S.</p>
        <p>
          SHAP provides a method to quantify the contribution to the change in prediction when feature
i is added to the model for every possible feature set S. The idea of LIME, on the other hand, is to
approximate the behavior of a complex model near the prediction of a specific instance using a
simpler model. The formula of LIME is as follows [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]:
ξ(x) = argmin_{g ∈ G} L(f, g, π_x) + Ω(g) (3)
        </p>
        <p>g : A simple model used to approximate the behavior of the complex model f near the instance x.
G : The set of all possible simple models.
π_x : A weighting function that assigns higher weights to points closer to the instance x.
L(f, g, π_x) : The loss measuring how unfaithfully g approximates f in the locality defined by π_x.
Ω(g) : A complexity measure that penalizes the model g.</p>
        <p>LIME begins by selecting a specific instance x (in this case, an individual student), already predicted
by a complex model. It generates a series of perturbed samples around this instance to explore the
model’s behavior locally. To approximate the behavior of the complex model in this localized region,
a simpler model, such as linear regression, is employed. The key objective is to assess the alignment
between the outputs of this simpler model, denoted as g, and the original complex model, denoted
as f , within the local context. This process is represented in the formulation by minimizing the loss
function between g and f, complemented by the minimization of g’s complexity measure.</p>
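The Shapley formulation of equation (2) can be illustrated with a toy exact computation. The three-feature additive model below is a hypothetical stand-in for the study's classifier, chosen so the result is easy to verify by hand:

```python
from itertools import combinations
from math import factorial

# Hypothetical instance: each feature contributes its own value when present,
# and absent features contribute 0 (a simple additive model f).
values = {"A": 1.0, "B": 2.0, "C": 4.0}
features = list(values)
n = len(features)

def f(subset):
    # Model evaluated on a feature subset; absent features contribute 0.
    return sum(values[j] for j in subset)

def shapley(i):
    # Exact evaluation of equation (2): weighted average of the marginal
    # contribution of feature i over every subset S of the other features.
    others = [j for j in features if j != i]
    total = 0.0
    for k in range(n):
        for S in combinations(others, k):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (f(S + (i,)) - f(S))
    return total

# For an additive model, each feature's Shapley value equals its own
# contribution, e.g. shapley("C") recovers 4.0.
```

In practice the number of subsets grows exponentially, which is why the SHAP library approximates these values rather than enumerating them as this sketch does.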
        <p>In this study, the Logistic Regression model was trained using data transformed through
PCA. Consequently, to maintain consistency with the training data, the data samples generated by
LIME need to be transformed into the same dimensional space. A wrapper function is implemented
to facilitate this process, transforming the LIME-generated data via PCA to ensure that the data is in
the appropriate form for the trained model to process effectively. The same wrapper function is
also applied for SHAP.</p>
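A minimal sketch of such a wrapper, assuming synthetic stand-in data and a 16-component PCA (the names `predict_proba_raw` and `X_raw` are illustrative, not taken from the study's code):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Hypothetical raw-space data standing in for the LBLS467 features.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(467, 208))
y = rng.integers(0, 2, size=467)

# The model is trained on PCA-transformed data, as in the study.
pca = PCA(n_components=16).fit(X_raw)
model = LogisticRegression(max_iter=1000).fit(pca.transform(X_raw), y)

def predict_proba_raw(samples):
    # Samples generated by LIME (or passed to SHAP) live in the raw feature
    # space, so they must go through the same PCA before prediction.
    return model.predict_proba(pca.transform(np.asarray(samples)))
```

An explainer such as LIME's `LimeTabularExplainer` or `shap.Explainer` would then be handed `predict_proba_raw` in place of the model's own `predict_proba`, so both methods see the model through the same raw-feature interface.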
      </sec>
      <sec id="sec-3-4">
        <title>3.4 Consistency Evaluation</title>
        <p>SHAP and LIME operate on distinct principles to determine the contribution of each feature to the
outcome. The critical question lies in the extent of the differences in the explanations derived from
these two methods. To evaluate their consistency, our approach involved identifying features that
show statistical correlation with the predicted results, as assessed by Spearman’s correlation with a
significance threshold set at α = 0.05. This threshold was chosen to discern features significantly
correlated with the outcomes. The next step is to compare the ranks of contributions as provided by
SHAP and LIME, utilizing the Kendall’s tau for this comparative analysis.</p>
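The selection step can be sketched as follows, assuming synthetic stand-in data in which only features 0 and 1 actually drive the outcome:

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical data: 10 stand-in features, of which only the first two
# influence the binary outcome y.
rng = np.random.default_rng(0)
X = rng.normal(size=(467, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=467) > 0).astype(int)

# Keep features whose Spearman correlation with the outcome is
# significant at the α = 0.05 threshold described above.
selected = []
for j in range(X.shape[1]):
    rho, p = spearmanr(X[:, j], y)
    if p < 0.05:
        selected.append(j)
```

With real data, `selected` would be the 93 features whose SHAP and LIME contribution ranks are then compared.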
        <p>The Kendall correlation coefficient measures the degree of similarity between two sets of
rankings assigned to the same group of objects [18]. Firstly, we rank the selected features based
on their influence on the prediction outcome. This process results in two sets of ranking data,
each ordering the same set of features. For each pair of features, we examine their respective
positions in both ranking sets and calculate their relative positions. Consequently, if a feature
ranks higher than another in both sets, the pair is deemed 'concordant'; the opposite scenario
indicates discordance. After considering all pairs of features, we calculate the difference between
the number of concordant pairs and the number of discordant pairs, divided by the total number
of pairs. The formula of the Kendall correlation coefficient is as follows:
τ = (n_c − n_d) / (n(n−1)/2) (4)</p>
        <p>n_c : The number of concordant pairs.
n_d : The number of discordant pairs.
n : The sample size.</p>
        <p>The value of this coefficient ranges from -1 and 1. A value approaching 1 indicates a high level of
consistency in the rankings, while a value approaching -1 signifies a substantial degree of inconsistency
[19].</p>
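A minimal sketch of the comparison with scipy, using short hypothetical rank lists in place of the 93-feature rankings:

```python
from scipy.stats import kendalltau

# Hypothetical rank positions of the same features under SHAP and LIME
# for one student's prediction.
shap_ranks = [1, 2, 3, 4, 5, 6, 7, 8]
lime_ranks = [1, 3, 2, 4, 6, 5, 7, 8]

tau, p_value = kendalltau(shap_ranks, lime_ranks)
# Two discordant pairs out of 28 give tau = (26 - 2) / 28, about 0.857.
```

With no tied ranks, `kendalltau` reduces to equation (4); the p-value tests whether the observed concordance could arise under independent rankings.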
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <sec id="sec-4-1">
        <title>4.1 Reply RQ1</title>
        <p>As illustrated in Figure 1, the accuracy assessments demonstrate that all models achieved accuracy
rates around 80%. Notably, both Logistic Regression and Decision Tree models showed remarkable
performance. Logistic Regression achieved an 84.6% accuracy rate with 16 components, while the
Decision Tree reached the same level of accuracy with 58 components.</p>
        <p>The final decision to focus on Logistic Regression for in-depth analysis stems from a crucial
observation. Under the premise of using PCA as a method for feature extraction, the Decision Tree
model becomes less interpretable. Initially, the Decision Tree was a preferred choice due to its
well-known ease of interpretability. However, it was crucial to assess whether its performance was
sufficiently superior to warrant detailed explanation. Upon further analysis, it was found that its
accuracy was comparable to that of the Logistic Regression model. Therefore, we decided to apply
SHAP and LIME to the Logistic Regression model.</p>
        <p>In the results, we present SHAP’s explanation of the prediction for an individual instance in the form
of a waterfall plot. This mode of presentation is very similar to the way data is represented in LIME
results, which aids our comparison of each instance.</p>
        <p>This SHAP waterfall plot illustrates how feature contributions (red and blue bars) move the model
prediction from a baseline value E[f(x)] (the average output of the model) to the final prediction f(x).
Blue bars represent features that decrease the prediction probability, while red bars indicate those
that increase it. The gray text in front of each feature name is that feature’s value.</p>
        <p>In Figure 2, the model predicts Student A as at-risk with a probability value of 0.665, surpassing
the threshold for risk. Key features like ‘ADD_MEMO’, ‘srl_s_28’, and ‘srl_s_29’ positively influence
this outcome. In contrast, ’srl_m_18’ and 192 other features collectively decrease the prediction
probability by about 0.16. Figure 2 also shows Student B as not at-risk, with a predictive value of 0.448,
influenced by features like ‘SEARCH’, ’SEARCH_JUMP’, and ‘srl_m_30’, which lower the risk probability.</p>
        <p>LIME’s plot in Figure 3 indicates Student A as at-risk with a 0.66 predictive probability, consistent
with the value displayed in the SHAP analysis. Influential features include ‘LINK_CLICK’,
‘SEARCH_JUMP’, and ‘srl_s_3’. Conversely, features like ‘CLEAR_HW_MEMO’ and
‘OPEN_RECOMMENDATION’ contribute to a lower risk prediction. Figure 3 also predicts Student B as not
at-risk at 0.55 probability, significantly influenced by ‘RuntimeError’ and ‘HTTPError’.</p>
        <p>To identify the key features contributing to success in the LBLS dataset, we assess the contribution
of features to the prediction. For the SHAP analysis, we employed the global explanation to find the top
five features with the highest contribution values. Since LIME lacks a global explanation mechanism,
we aggregated the top five features with the most significant impact from each prediction. The five
most influential features in the global explanation of SHAP were ADD_RECOMMENDATION,
ADD_HW_MEMO, s_41, s_26, and TabError; whereas, for LIME, they were LINK_CLICK, HTTPError,
ZeroDivisionError, RecursionError, and s_32.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Reply RQ2</title>
        <p>We select 93 features that are statistically correlated with the result using Spearman’s correlation
coefficient. In the next step of the analysis, we will employ SHAP and LIME to evaluate the prediction
concerning student A. Our focus will be on capturing the rankings of all 93 features. Following this,
we use Kendall’s tau coefficient to assess the similarity between these rankings.</p>
        <p>Examples of the selected features and their Spearman correlations with the outcome: closed the
book, ρ = -0.11*; opened the book, ρ = -0.12**; jumped to a particular page, ρ = 0.17***; number of
times a student copies code, ρ = 0.33***; number of times a student executes code, ρ = 0.28***.
*p &lt; .05 **p &lt; .01 ***p &lt; .001</p>
        <sec id="sec-4-2-2">
          <p>Each blue dot on the graph represents a unique feature. When a dot lies on the diagonal, it
signifies that SHAP and LIME assign the same ranking to that feature’s weight. For the interpretative
analysis of Student A and Student B, the Kendall’s tau values are 0.66 and 0.64, respectively, suggesting a
moderate but noticeable positive correlation between the two rankings. This implies that an increase
in rank in one set is generally mirrored by an increase in the other, although the relationship is
not exceptionally strong. The analysis yields remarkably low p-values for the weight rankings, all
below 0.001, reinforcing the significance of this correlation.</p>
          <p>The graph reveals a tendency for the features’ weight rankings, as determined by both
interpretation methods, to cluster near the diagonal, particularly those with higher (towards the start)
and lower weights (towards the end). This pattern suggests a greater consistency in how both
methods evaluate these features. Conversely, the rankings of features in the central region of the
graph tend to be more dispersed.</p>
          <p>In the two prediction points, SHAP and LIME show a moderate level of consistency in assessing
feature importance, with a tendency for feature rankings to cluster near the diagonal line indicating
higher consistency in evaluating the most and least important features. The dispersion of feature
rankings in the central area of the graph suggests greater variability in interpreting features of medium
importance. The low P-values enhance the credibility of the results, suggesting that the observed
correlations are not random but reflect the underlying patterns in the data.</p>
          <p>Finally, we compared the feature weight rankings explained by SHAP and LIME for each prediction
point pairwise, calculating the average of Kendall's tau and p-value. We obtained an average Kendall's
tau of 0.623 and an average p-value of 0.000979. This suggests that there is also a moderate to strong
correlation in the feature importance rankings between the two methods for each prediction point.
In other words, the rankings of feature importance are relatively consistent between the two methods,
and the p-value being far below 0.05 shows that the correlation in rankings between SHAP and LIME
is statistically significant.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this study, we emphasize the importance of xAI in preventing the over-generalization of machine
learning algorithms, especially in the field of learning analytics. We used PCA for feature extraction,
compared the accuracies of multiple models, and selected one that is both simple and highly
accurate. We then combined several statistical methods to check whether the SHAP and LIME
explanations of feature weight rankings are consistent. The results show moderate consistency
between the SHAP and LIME rankings among the 93 selected features related to prediction outcomes,
with high confidence. In learning analytics, divergent results from xAI in predicting at-risk students
can complicate strategy formulation for stakeholders. Our study analyzed explanations for two
students predicted with different labels. Future research could explore which explanation is more
trustworthy when consistency is lacking, whether to sacrifice model accuracy for higher consistency,
or how to involve more human judgment in assessing the reasonableness of explanations. As for
identifying key features of student learning performance and formulating strategies for adaptive
development, this undoubtedly requires involvement from school teachers, educators, and
psychologists.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This study is supported in part by the National Science and Technology Council of Taiwan under
contract number NSTC 112-2410-H-004-063.</p>
      <p>[12] Liu, B., &amp; Udell, M. (2020). Impact of accuracy on model interpretations. arXiv preprint
arXiv:2011.09903.
[13] Lundberg, S. M., &amp; Lee, S.-I. (2017). A unified approach to interpreting model predictions.
Advances in Neural Information Processing Systems, 30.
[14] Lu, O. H., Huang, A. Y., Flanagan, B., Ogata, H., &amp; Yang, S. J. (2022). A quality data set for data
challenge: Featuring 160 students’ learning behaviors and learning strategies in a
programming course. In 2022 30th International Conference on Computers in Education. ICCE.
[15] Flanagan, B., &amp; Ogata, H. (2018). Learning analytics platform in higher education in Japan.
Knowledge Management &amp; E-Learning (KM&amp;EL), 10(4), 469-484.
[16] Wold, S., Esbensen, K., &amp; Geladi, P. (1987). Principal component analysis. Chemometrics and
Intelligent Laboratory Systems, 2(1-3), 37-52.
[17] Gunning, D., &amp; Aha, D. W. (2019). DARPA's explainable artificial intelligence (XAI) program.
AI Magazine, 40(2), 44-58.
[18] Kendall, M. G. (1938). A new measure of rank correlation. Biometrika, 30(1/2), 81-93.
[19] Abdi, H. (2007). The Kendall rank correlation coefficient. Encyclopedia of Measurement and
Statistics. Sage, Thousand Oaks, CA, 508-510.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Siemens</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R. S. d.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Learning analytics and educational data mining: towards communication and collaboration</article-title>
          .
          <source>In Proceedings of the 2nd international conference on learning analytics and knowledge</source>
          (pp.
          <fpage>252</fpage>
          -
          <lpage>254</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] Akçapınar, G.,
          <string-name>
            <surname>Altun</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp; Aşkar,
          <string-name>
            <surname>P.</surname>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Using learning analytics to develop earlywarning system for at-risk students</article-title>
          .
          <source>International Journal of Educational Technology in Higher Education</source>
          ,
          <volume>16</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Romero</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ventura</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Educational data mining: a review of the state of the art</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          , Part C (
          <article-title>applications</article-title>
          and reviews),
          <volume>40</volume>
          (
          <issue>6</issue>
          ),
          <fpage>601</fpage>
          -
          <lpage>618</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Birhane</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Algorithmic injustice: a relational ethics approach</article-title>
          .
          <source>Patterns</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Scholes</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>The ethics of using learning analytics to categorize students on risk</article-title>
          .
          <source>Educational Technology Research and Development</source>
          ,
          <volume>64</volume>
          (
          <issue>5</issue>
          ),
          <fpage>939</fpage>
          -
          <lpage>955</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Holzinger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saranti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molnar</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biecek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Samek</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Explainable AI methods - a brief overview</article-title>
          . In
          <source>International workshop on extending explainable AI beyond deep models and classifiers</source>
          (pp.
          <fpage>13</fpage>
          -
          <lpage>38</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Rudin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead</article-title>
          .
          <source>Nature Machine Intelligence</source>
          ,
          <volume>1</volume>
          (
          <issue>5</issue>
          ),
          <fpage>206</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Ribeiro</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Guestrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>"Why should I trust you?" Explaining the predictions of any classifier</article-title>
          . In
          <source>Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          (pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Islam</surname>
            ,
            <given-names>M. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahmed</surname>
            ,
            <given-names>M. U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barua</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Begum</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2022</year>
          ).
          <article-title>A systematic review of explainable artificial intelligence in terms of different application domains and tasks</article-title>
          .
          <source>Applied Sciences</source>
          ,
          <volume>12</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1353</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>De Laet</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Millecamp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Broos</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Croon</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verbert</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Duorado</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Explainable learning analytics: challenges and opportunities</article-title>
          . In
          <source>Companion proceedings of the 10th International Conference on Learning Analytics &amp; Knowledge (LAK20), Society for Learning Analytics Research (SoLAR)</source>
          (pp.
          <fpage>500</fpage>
          -
          <lpage>510</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Alonso</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Bugarín</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>ExpliClas: Automatic generation of explanations in natural language for Weka classifiers</article-title>
          . In
          <source>2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)</source>
          (pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>