A framework for user assistance on predictive models

Gabriel Ferrettini, Julien Aligon, Chantal Soulé-Dupuy
Université de Toulouse, UT1, IRIT (CNRS/UMR 5505), Toulouse, France — firstName.lastName@irit.fr

William Raynaut
Hubware, Toulouse, France — William@hubwa.re

ABSTRACT
Data analysis generally requires very specialized skills, especially when applying machine learning tasks. The ambition of this paper is to propose a framework assisting a domain expert user in analysing his data, in a context of predictive analysis. In particular, the framework includes a recommender system for the workflow of analysis tasks. Because a lack of explanation in recommendations can lead to a loss of confidence, a complementary system is proposed to better understand the recommended predictive models. This complementary system aims to help the user understand and exploit the results of the data analysis, by relying on his data expertise. The framework is validated through a pool of questions and a mock-up showing the interest of the approach.

KEYWORDS
Recommendation system, Machine Learning, Predictive model, Prediction explanation

"Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)."

1 INTRODUCTION
In many cases, data analysis requires very specialized skills in the implementation and use of models. For instance, designing something as common and popular as a predictive model requires expert knowledge in the machine learning field. These analysis tasks are thus especially difficult to perform for a domain expert user, i.e. a user with a deep knowledge of the data to analyze, but without any background in machine learning.
Several past works have been proposed to help these types of users, notably workflow recommender systems and model building assistance (as proposed in [20]). In general, these recommender systems offer very accurate analysis workflows and predictive models (drawing strength from workflows performed by past users). But the interaction with these systems is often limited to executing the proposed predictive model, without an easy way to validate and personalize it. This is a major drawback through which a user can lose confidence, due to a lack of explanations of the recommender system results. Indeed, neophyte users tend to struggle to give credence to a system they do not understand and are not familiar with. Given that important decisions can be made using such a system, giving the user an opportunity to gain confidence in the system is important. For example, the importance of transparency has long been recognized in expert systems, as in [4], and studied more widely in the recommendation context in [28].

In this direction, the ambition of this paper is to propose a framework assisting domain expert users in performing sensible data analysis by themselves (more precisely, our work focuses on the task of multi-label classification). This framework includes a recommender system of workflows for predictive models and a complementary system explaining their results. The aim of this explanation system is to make the model transparent and effective, while giving confidence to the user. His involvement in all the stages of this framework should increase his confidence, while relying as little as possible on any knowledge other than his domain of expertise. Thus, the framework should allow a user to better select, personalize and understand the recommended predictive models. In order to achieve this, we explore novel ways of exploiting single prediction explanation. These new uses aim to help a non-expert user appropriate complex data analysis processes.

The paper is organized as follows. Section 2 details the limits of the current systems helping a user analyze and classify his data; a review of existing solutions to better explain predictive models is also presented. Section 3 proposes a recommender system to be part of our framework, taking into account the drawbacks identified in the literature. Section 4 describes the explanation system for predictive models. Based on these two systems, Section 5 presents the organisation of the framework and how a user can easily select and fit the desired predictive model through the use of the explanation system. The framework is validated in Section 6 through a use case and illustrated by a mock-up. In particular, this validation shows that the framework is able to answer the following three questions:

• How can a non-expert user appropriate the results of the recommendation by himself?
• How can users be confident in the produced results?
• How can a model be personalised without requiring machine learning knowledge?

2 RELATED WORKS

2.1 From the need of model recommendation...
Recommender systems based on collaborative filtering are known to be effective in various applications. For example, [3] suggests queries based on previously issued queries, applied in the general context of databases, while [14] recommends items based on user data obtained from a social network, and [2] provides sequences of queries based on similarities between OLAP user sessions.

Traditional collaborative filtering approaches, however, base their recommendations on similarities between users, identifying intrinsic traits they have in common. Such systems rarely consider the context in which the user evolves. Context-aware recommender systems should then be preferred when the context of the user is complex and more prominent [1]. In this case, the information obtained from multiple contexts can be very useful to improve the relevance and effectiveness of the recommender systems. Such approaches have attracted particular attention over the last few years. For example, [16] shows that detecting user emotion (context) and factoring it into a collaborative filtering approach increases user satisfaction. [31] proposes a system suggesting collaborations between universities and industries based on the identification of similar contexts of researchers (defined on a multitude of aspects). [33] develops a similarity-based context-aware approach under the assumption that recommendations should be similar if the contextual situations of the users are similar, and demonstrates that integrating a similarity measure between multidimensional contexts can improve precision scores.
The problem of data analysis workflow recommendation has recently received increased interest, advertising several promising methods [19, 23, 27, 32]. Their purpose is to assist the user in solving a range of different data analysis problems through the recommendation of adequate workflows (defined as sequences of operators producing knowledge from data). However, none of these works arises from a perspective of context-aware recommendation, taking into account the particular context in which a data analyst evolves. They only take into consideration the information related to their purpose, such as the objective of the analysis for planification methods [19]. We plan to consider more of the relevant information constituting the user's context. Indeed, a dataset to analyse has multiple features and is defined by particular characteristics, while the actual needs of a data analyst can be complex and largely implicit. Experiments performed in the past can also be considered part of this context, as they carry information about what was and can be done. Yet, even with a finer approach to the recommendation, the user is still not able to fully understand and use what is being recommended.

As discussed in the introduction, the need for better explanations of, and confidence in, the recommendation results is essential for a user, all the more for a domain expert. To overcome this black-box problem in recommendations, several solutions exist in the literature, a review of which can be found in [28]. In particular, a number of goals characterizing the different types of explanations in recommender systems is proposed there, together with ways of evaluating them. The notions of Transparency (i.e. how the system works, as in [9]), Trust (i.e. perceived confidence in the recommendations, as in [7]), Persuasiveness (i.e. user acceptance of the recommendations, as in [9]) and Effectiveness (i.e. making better decisions, as in [24]) best fit the objectives of the recommendation explanations proposed in our framework.

2.2 ...to the need of model explanation
The existing systems and toolkits for machine learning (ML) and data analysis in general mostly focus on providing and explaining the methods and algorithms. This approach has proven particularly helpful for expert users, but still requires advanced knowledge of data analysis. Indeed, some of the most well-known data analysis platforms, such as Weka [13] or Knime [5], provide detailed descriptions of the methods and algorithms they include, often giving usage examples. Unfortunately, detailed descriptions are a poor substitute for actual training in data analysis.

Some data analysis platforms, such as RapidMiner [15] or Orange [11], have dedicated great attention to the problem of presenting and explaining the analysis results to the user. By providing well-designed visualization interfaces, these platforms assist the user in understanding the results produced, which is a first step toward actually using them and acting on them. However, they cannot explain how these specific results have been achieved, which remains a significant disincentive for users in areas where wrong decisions can have grave consequences. Helping those users grasp why a particular prediction is being made (in a way that would allow them to check this reasoning against their own expert knowledge) could greatly enhance their trust in a reasonable prediction or, on the contrary, give them a meaningful reason to discard a biased one. This is the original intuition behind the need for prediction explanations.

More recently, Google, with its "What-If" tool (https://pair-code.github.io/what-if-tool/), mainly based on [30], proposes many exploratory machine learning tools. These help a user understand and exploit machine learning models in an intuitive way, mainly by allowing the user to explore new data points with a trained model and by displaying different metrics in an easily interpretable way.

Our approach is to make machine learning more interpretable by relying on explanations of the predictions provided by predictive models. The possible applications of prediction explanations have been investigated by [22]. According to their paper, the interest of explaining a predictive model is threefold:

• First, it can be seen as a means to understand how a model works in general, by peering at how it behaves in diverse points of the instance space.
• Second, it can help a non-expert user judge the quality of a prediction and even pinpoint the cause of flaws in its classification. Correcting them would then lead the user to perform some intuitive feature engineering operations.
• Third, it can allow the user to decide which type of model is preferable to another, even if he has no knowledge of the principles underlying each of them.

A great number of works pertaining to prediction explanation led to [18], which theorized a category of explanation methods, named additive methods, and produced an interesting review of the different methods developed in this category. Some of these methods are described in detail in [10] and [26]. They are summarized in [18] as methods attributing, for a given prediction, a weight to each attribute of the dataset. This creates a very simple "predictive model" mimicking the original model's behavior locally. We thus obtain a simple, interpretable linear model which gives information on the original model's inner workings in a small vicinity of the predicted instance. The way these weights are attributed to each attribute varies between the different additive methods, but the end result is always this vector of weights.

Other lines of reasoning have been explored, such as [6], which investigated prediction explanation from the point of view of model performance: their metric shows which features improve the performance of the model, rather than which features the model considers important for its prediction. While this line of reasoning is interesting for the model explanation field, it does not correspond to our scope, as we aim to help users understand how a model works, not how to improve it. In this paper, we aim to facilitate the understanding of any machine learning model for users with no special knowledge of data analysis or machine learning. It is thus more relevant to focus on additive methods, as they generate a simple set of importance weights for each attribute. This set of weights is easy to interpret, even for someone without expertise in machine learning.
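To make the notion of an additive explanation concrete, the following sketch fits a local linear surrogate around a single instance, in the spirit of the methods reviewed in [18] (e.g. LIME [22]). It is an illustration under our own assumptions, not the exact algorithm of any cited paper: the neighbourhood sampling, the proximity kernel and the Ridge surrogate are all simplistic choices of ours.

```python
import numpy as np
from sklearn.linear_model import Ridge

def additive_explanation(model, x, n_samples=500, scale=0.1, seed=0):
    """Return one weight per attribute: a linear surrogate mimicking
    `model` in a small vicinity of the instance `x` (a 1-D array)."""
    rng = np.random.default_rng(seed)
    # Sample perturbed instances around x.
    X = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    y = model.predict_proba(X)[:, 1]  # confidence for the class of interest
    # Give more weight to samples close to x.
    dist = np.linalg.norm(X - x, axis=1)
    w = np.exp(-(dist ** 2) / (2 * scale ** 2))
    surrogate = Ridge(alpha=1.0).fit(X, y, sample_weight=w)
    return surrogate.coef_  # the "vector of weights" of an additive method
```

The sign and magnitude of each coefficient indicate how strongly the corresponding attribute pushes the prediction up or down around x, which is exactly the kind of output a domain expert can read without any machine learning background.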
3 WORKFLOW RECOMMENDATION SYSTEM
This section describes the basic principles of our workflow recommender system, as depicted in Figure 1. It builds on previous work, described in more detail in [21] and [20]. This recommender system has been shown to be an effective assistant for relevant workflow selection, through the experiments in [20]. It is the base of the framework presented in Section 5.

Figure 1: Global steps of the recommendation process

3.1 Process overview and definitions
3.1.1 Dataset. Datasets are defined as collections of instances described along attributes. Given $A = \{a_1, ..., a_n\}$ the attributes of a dataset, an instance $x$ is a vector of $n$ attribute values: the description of $x$ along the attribute set $A$.

3.1.2 Workflow. Workflows in their most general form usually consist in directed graphs of data analysis operators [23]. These can include the many possible steps of data analysis, such as various preprocessing operations (data cleaning, normalization, etc.), construction of models, search for patterns, or even parameter optimization for other operators. Note that, as the explanation methods (see Section 4) are applied to supervised classification, for now we only consider workflows arising from such models.

3.1.3 System overview. The recommendation system is based on a meta-database of past machine learning experiments. For each of them, one can access the base dataset of the experiment, the workflow used to create a machine learning model from the dataset, and its performance. Section 3.2 (step 1 of Figure 1) presents the method used to determine how similar datasets are. Section 3.3 (step 2 of Figure 1) presents how the performance of a workflow is modeled according to the current user's needs. These user preferences filter the set of recommendations to propose relevant workflows, as described in Section 3.4 (step 3 of Figure 1).

3.2 Dissimilarity between datasets
The measure of dissimilarity is based on the characteristics of the datasets to be compared. It is computed through two levels of meta-attributes: the difference between dataset meta-attributes and the difference between attribute meta-attributes.

3.2.1 Dataset meta-attributes. In order to dispose of a large selection of meta-attributes from diverse categories, we use the OpenML platform [29]. This platform contains more than a hundred meta-attributes, from different statistical, information-theoretic, and landmarking approaches (complete list available on http://www.openml.org/).

3.2.2 Attribute meta-attributes. Individual attributes of datasets can be characterized along a set of measures, mostly consisting of non-aggregated versions of the previously described dataset meta-attributes. To build our set of attribute meta-attributes, we use the 72 measures proposed in [21], able to characterize individual attributes. The key idea is to compare attributes of different datasets along their attribute meta-attributes. However, as the intuition is to make use of all available information, attributes are compared by most similar pairs: for two datasets $A$ and $B$, each attribute of $A$ is paired with an attribute of $B$ such that the total dissimilarity of the pairs is as low as possible.
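Finding the attribute pairing with minimal total dissimilarity is an instance of the assignment problem, which can be solved optimally with the Hungarian method. A minimal sketch follows; the Euclidean distance over meta-attribute vectors and the function names are our own illustrative assumptions, not necessarily the measure used in [21].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def pair_attributes(meta_a: np.ndarray, meta_b: np.ndarray):
    """Pair each attribute of dataset A with one of dataset B so that the
    total dissimilarity of the pairs is minimal.
    meta_a: (n_a, m) matrix of attribute meta-attributes for dataset A;
    meta_b: (n_b, m) matrix for dataset B (rectangular cases are allowed:
    only min(n_a, n_b) pairs are formed)."""
    # Pairwise dissimilarities between attribute descriptions.
    cost = np.linalg.norm(meta_a[:, None, :] - meta_b[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)  # optimal minimal-cost pairing
    return list(zip(rows.tolist(), cols.tolist())), cost[rows, cols].sum()
```

The summed cost of the optimal pairing can then contribute to the attribute-level part of the dissimilarity between the two datasets.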
3.3 Workflow filtering by user preference
In order to make the recommendations more accurate, the performance of a workflow is filtered according to the current user's needs. For instance, if a current user has a very high cost of false negatives (as in the early diagnosis of a dangerous disease), then we should consider relevant those workflows that exhibited good recall on similar datasets. Our approach is thus to consider criteria able to characterize different aspects of workflow performance, in order to model user preferences. Even considering only problems of supervised classification, many different criteria have been proposed to characterize different aspects of performance, like Cohen's Kappa [8], measuring agreement while accounting for the chance of random good guesses, or the more complex Information Score from [17], measuring the amount of non-trivial information produced by the model.

The preference model of a user is then represented as a set of performance criteria he is interested in, each of them associated with a weight qualifying its relative importance. For instance, a user who wants to avoid false negatives has the recall measure as his most important criterion. But this does not mean that precision has to be ignored: a higher weight associated with recall represents the user preference.
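The paper does not prescribe a formula for combining the weighted criteria; as a minimal sketch under our own assumptions, one plausible aggregation is a weighted average of the criteria scores:

```python
def preference_score(performance: dict, preferences: dict) -> float:
    """Aggregate a workflow's measured performance into one score under a
    user preference model mapping each criterion to a relative weight.
    Criteria absent from the measurements contribute zero."""
    total_weight = sum(preferences.values())
    return sum(weight * performance.get(criterion, 0.0)
               for criterion, weight in preferences.items()) / total_weight

# A user avoiding false negatives weights recall highest,
# but still keeps precision in the preference model:
score = preference_score({"recall": 0.91, "precision": 0.78},
                         {"recall": 3.0, "precision": 1.0})
print(score)  # 0.8775
```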
3.4 Workflow recommendation by Pareto front
Considering a current user, analysing a dataset and having defined his preferences, the system recommends workflows from past analyses. This implies access to a base of past data analysis experiments, where (hopefully expert) users upload the analyses they perform. One such past analysis then consists of a dataset, upon which a workflow was applied, yielding a result. The suggested workflow for the current analysis should then be determined according to two criteria:

(1) The past analysis must have been produced on a dataset similar to the one of the current user.
(2) Its results, evaluated according to the preferences expressed by the current user, should be satisfactory.

We thus face a problem of multi-criteria optimisation, where both dataset similarity and past performance matter. To solve it, a Pareto recommendation approach is proposed.

Figure 2: Pareto front of the best past analyses according to our two criteria

Consider the full Pareto front of past analyses as a set of recommendations. We can then consider our best possible candidates according to our two criteria (as shown in Figure 2), which increases the chances of finding one that suits the user, but requires an additional step to discriminate between candidates. Indeed, supplying the full set of recommendations would probably be useful to expert users, but is most likely to overwhelm a non-expert.
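As an illustration of this step, here is a minimal sketch of the non-dominated filtering over the two criteria (both to be maximized); the tuple representation of candidates is our own assumption:

```python
def pareto_front(candidates):
    """Return the past analyses not dominated on the two criteria:
    similarity of their dataset to the user's, and preference-weighted
    performance (both higher-is-better).
    candidates: list of (similarity, performance, workflow) tuples."""
    front = []
    for sim, perf, wf in candidates:
        dominated = any(
            s >= sim and p >= perf and (s > sim or p > perf)
            for s, p, _ in candidates
        )
        if not dominated:
            front.append((sim, perf, wf))
    return front

# The four recommendations shown to the user (Section 6.1) could then be,
# e.g., the front members with the highest combined scores.
```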
4 PREDICTIVE MODEL EXPLANATION SYSTEM
As introduced in Section 1 and developed in Section 2, the lack of understanding of prediction recommendations is a real problem encountered by most domain expert users. It leads to a lack of trust in the models and impairs their use. Moreover, even with guidance, a lack of experience can lead to mistakes when analysing a dataset and to relying on ill-adapted predictions. In order to address those pitfalls, we aim to help the user understand the recommendation results. For this, we propose an explanation system mainly based on the domain user's knowledge.

4.1 Explaining prediction results
Most methods in the literature are devoted to explaining a predictive model in a global way. These methods are not relevant when a domain expert user (for instance a biologist) has to study the behavior of particular dataset instances over a predictive model (for instance in the context of a cohort study). This is our main motivation for proposing an explanation system able to explain the behavior of individual predictions. The method is detailed in [12]; its main principles are described below.

Underlying principle. Our prediction explanation system relies on the principle of analysing the influence of each attribute on the model prediction. This way, we aim to emphasize the most important attributes according to the model. The explanation of the model is realized by comparing the impact of the absence and presence of the attributes to determine their influence. However, considering each attribute as independent of the others is a limitation. Therefore, in order to take this dependency between attributes into account, it is necessary to consider the influence of attribute groups on the prediction. These influences are then aggregated into a unique score using Shapley's notion of individual participation in group efforts, described in [25].

4.2 Explanation of a single attribute influence
Given a dataset of instances described along the attributes of $A$, the influence of the attribute $a_i$ on the classification of an instance $x$ by the classifier confidence function $f$ on the class $C$ can be represented as:

$$ \mathit{inff}_{C,a_i}(x) = f(x_{a_i}) - f(\emptyset) \qquad (1) $$

where $f(x_{a_i})$ represents the probability that the instance $x$ is included in the class $C$ with only the knowledge of the attribute $a_i$ (according to the predictive model). This formula can be used with groups of attributes, which leads us to an influence inspired by Shapley's work:

$$ I^C_{a_i}(x) = \sum_{A' \subseteq A \setminus \{a_i\}} p(A', A) \cdot \left( \mathit{inff}_{C, A' \cup \{a_i\}}(x) - \mathit{inff}_{C, A'}(x) \right) \qquad (2) $$

with $p(A', A)$ the Shapley value, a penalty function accounting for the size of the subset $A'$:

$$ p(A', A) = \frac{|A'|! \; (|A| - |A'| - 1)!}{|A|!} \qquad (3) $$

Due to the exponential complexity of this formula, an optimisation of the calculation of the influence of an attribute is proposed in [12]. It produces a satisfactory approximation with a relatively small loss in accuracy.
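Equations (1)–(3) translate directly into the following exhaustive computation. This is a sketch: how the restricted confidence $f(x_{A'})$ is obtained (e.g. by marginalizing the absent attributes over the dataset) is precisely the subject of [12] and is abstracted here behind a user-supplied function.

```python
from itertools import combinations
from math import factorial

def attribute_influence(f, x, a_i, attributes):
    """Influence I^C_{a_i}(x) of Eq. (2). `f(x, known)` must return the
    model's confidence for class C when only the attributes in `known`
    are considered; the f(empty set) term of Eq. (1) cancels out in the
    difference. Exponential in |attributes|: see [12] for approximations."""
    others = [a for a in attributes if a != a_i]
    n = len(attributes)
    influence = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            # Shapley penalty p(A', A) = |A'|!(|A|-|A'|-1)!/|A|!  (Eq. 3)
            p = factorial(size) * factorial(n - size - 1) / factorial(n)
            influence += p * (f(x, frozenset(subset) | {a_i})
                              - f(x, frozenset(subset)))
    return influence
```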
5 ORGANISATION OF THE FRAMEWORK
We now present the framework combining the two systems described in the previous sections. Our framework is separated into two use cases. Section 5.1 shows how a domain expert user can be guided through the complex process of selecting a predictive model among a set of possible ones, while Section 5.2 illustrates how explanations bring new insights during the feature refinement of a predictive model. These two processes are illustrated in Figure 3 and are based on the most common functionalities of the literature described in Section 2.

Figure 3: Building a predictive model

Remember that this framework is intended for users who have no prior knowledge of machine learning, but who have expertise in their own field (e.g. biologists, doctors, engineers...). These users produce data that they are required to analyze. It is therefore assumed that they have a solid knowledge of this data, but not of machine learning methods.

5.1 Model selection via prediction explanation
Workflow recommendation - First, a user produces data he wants to analyse. The data is given as input to the recommender system, along with his specifications for the analysis: the target feature and his preferences in terms of results. The system then suggests a set of possible workflows which are the most able to analyse the user's data.

Execution - Among this selection of possible workflows, the user can select all or a subset of them. He can access a description of each workflow and its inner workings if desired, allowing him to perform a first selection among the possible workflows. The selected workflows are then executed and produce a set of predictive models.

Model explanation - Using these models, the system can generate the classification of a given instance of the dataset and provide its explanation for each model. These explanations take the form of attribute influences. For instance, in Figure 4, a user is informed that a particular patient is predicted to have diabetes by both models A and B, but that A made this decision considering mostly the patient's diet, while B also considers his weight as important. In order to allow the user to explore the models in an intuitive way, a set of 10 instances is recommended for his review. This selection of instances aims to provide the user with a set of prediction explanations as diverse as possible, without overwhelming him with a space too large to be explored efficiently by humans.

Figure 4: Explaining a prediction

5.2 Feature selection via prediction explanation
Feature engineering - Thanks to the prediction explanations, a user can access the reasoning behind each model, allowing him to detect possible flaws in the proposed models. As an example, prediction explanations allowed the personnel of a hospital performing the medical study described in [22] to realise that some attributes should not have been included in their dataset. Moreover, based on his own domain of expertise, a user can assess the importance of each feature, compared to the importance given to them by the models. Thus, the user can select undesirable features and remove them from the dataset.

Model selection - Once the final desired features have been determined, the user exploits his domain knowledge to assess the reasoning behind each model. This assessment is based both on a global evaluation, such as Cohen's kappa or the area under the ROC curve, and on local information about the predictions. This allows the user to select the desired final model by choosing the best performing model, but also the one with the most relevant use of the dataset features.
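The framework leaves this combination of global metric and expert judgement to the user. Purely as an illustration of how the two signals could be weighed together, one could score each model by how much of its explanation mass falls on features the expert deems relevant; every name below is our own assumption:

```python
def rank_models(global_scores, influences, relevant_features, alpha=0.5):
    """Illustrative ranking only (not prescribed by the framework).
    global_scores: {model_name: kappa or AUC}
    influences: {model_name: {feature: mean absolute influence}}
    relevant_features: the features the domain expert trusts."""
    def relevance(model):
        infl = influences[model]
        total = sum(infl.values()) or 1.0
        return sum(v for feat, v in infl.items()
                   if feat in relevant_features) / total
    return sorted(global_scores,
                  key=lambda m: alpha * global_scores[m]
                                + (1 - alpha) * relevance(m),
                  reverse=True)
```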
6 VALIDATION OF THE FRAMEWORK
In order to validate the answers to our original questions stated in Section 1, we propose a mock-up of our framework. This mock-up illustrates a use case based on the well-known UCI Pima Indians diabetes dataset (available on many platforms, such as Kaggle), since familiarity with the dataset is beneficial to the understanding of this validation.

In our use case, a biologist aims to study the Pima Indians diabetes dataset and uses our recommendation system to obtain possible analysis workflows. First, as described in Section 5, the user enters the diabetes dataset as input to the recommender system and asks it to perform a recommendation.

6.1 Helping a non-expert user appropriate the results of the recommendation by himself
Instead of recommending a single workflow from the Pareto front (see Section 3.4), the user is presented with the four best recommendations from the front. A description of each workflow is made available to the user, allowing him to perform a first selection among the different options. Although these descriptions are necessarily technical, they are essential for a user to understand what is happening when each workflow is executed. The workflows and their descriptions are depicted in Figure 5. As an example, we can see in the figure that a workflow is not only the production of a predictive model, but also successive transformation operations applied to the dataset. These workflows are then executed and presented to the user through a set of selected instances. These instances are selected in a way that favours a large diversity in prediction explanations; the exact algorithm used here is the one presented in [22]. The user can thus explore each predictive model through this set of instances, by viewing a diverse set of key points illustrating the models. This allows him to infer how the whole model works, with minimal information.

Figure 5: Workflow recommendation

The instances and their attached prediction explanations are depicted in Figure 6. On the left, the user can select the instance he wants to study, and decide to eventually remove attributes from the dataset. On the right is presented the prediction explanation of the selected instance for each of the models. In our use case, we can see the scientist selects instance 49. Automatically, an explanation is proposed in which the random forest and bagging J48 (an optimized decision tree) models mainly base their prediction on the patient's blood pressure and age, whereas the naive Bayes model is mostly influenced by the mass of the instance. This allows immediate access to the inner workings of each presented workflow, based solely on the domain knowledge of the user. Thus, by presenting the results and how they were obtained, the user is informed of the conclusion of the prediction without having to rely blindly on the model.

Figure 6: Visualization of prediction results through prediction explanations

Therefore, through prediction explanation, the user gains access to a new type of information that does not rely on expertise in data analysis to be understood. He can understand and appropriate the results of the recommendation system thanks to his own domain-based knowledge, without understanding the inner workings of each model. He can visualize how each model uses the data to make its predictions.

6.2 Giving a user confidence in the produced results
Through this explanation method, the user can choose between models without having to rely solely on global measures of performance. He is able to use his own judgement rather than only the proposal of a fully automated process. This also makes it possible to spot potential defects in the models, which is not always feasible with conventional metrics alone. As an example, the global accuracy of a model or its kappa score does not warn a user of an inappropriate attribute that should be removed from the dataset. In our mock-up example, the user can decide that the age of a patient is not that important in determining whether he is likely to have diabetes. At the same time, if our user considers a patient's mass a valid indicator, this indicates to him that the naive Bayes model is more interesting in his case (supposing the instances he reviewed are consistent with this explanation). This understanding of a model, its strengths and its flaws, gives the user stronger confidence in what is being accomplished during the data analysis process. By pinpointing eventual problems in the predictive model, he also becomes able to know when the model is reliable.

6.3 Personalising a model without requiring data analysis knowledge
Once the user has studied his models, he can assess which workflows best fit his requirements. In particular, the user can identify which features are mainly used by the workflows and decide which are important for his study. In our mock-up example, the biologist might want to study the impact of less evident diabetes indicators, and decide to remove the insulin and plasma features from his dataset (as presented in Figure 7). This forces the workflows to use the other features, and may highlight new important indicators. We can see in Figure 7 that the J48 workflow has significantly changed its behavior, while the Adaboost model has simply adjusted the importance of each attribute.

Figure 7: New prediction explanations once the attributes plasma and insulin have been removed

By this process, the user accomplishes feature selection without having data analysis knowledge or expertise. His domain knowledge allows him to assess the interest of a feature and decide whether the workflows are using it well or not.
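Behind the interface, this personalisation step amounts to dropping the undesired columns and re-executing the selected workflows. A hedged sketch with scikit-learn stand-ins follows; the column names are those of the Kaggle version of the dataset (where "plasma" corresponds to the Glucose column) and the file name is an assumption:

```python
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("diabetes.csv")  # Pima Indians diabetes dataset
X = data.drop(columns=["Outcome", "Insulin", "Glucose"])  # user-removed features
y = data["Outcome"]

# Stand-ins for the recommended workflows (J48 ~ a decision tree, Adaboost):
for name, model in [("J48-like tree", DecisionTreeClassifier()),
                    ("Adaboost", AdaBoostClassifier())]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
# The prediction explanations are then recomputed on the retrained models
# to see how the influence of the remaining attributes has shifted.
```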
7 CONCLUSION AND PERSPECTIVES
We have presented a framework that proposes a new way to assist a user in analyzing his data, in two steps. In the first step, a recommendation system provides possible analysis workflows and predictive models, similar to what other users have done in the past for similar datasets. In the second step, a model explanation system allows a domain expert user to study the predictive models by himself. We have shown that this framework brings an answer to otherwise unresolved data analysis pitfalls, through a better understanding of data analysis models and by building the user's confidence.

However, this method still has to be tested in real-world situations. A prototype is therefore being developed, in interaction with biologists of the Institute of Cardiovascular and Metabolic Diseases (INSERM institute, http://www.i2mc.inserm.fr/index.php/en/). A medium-term perspective is to form a cohort of actual domain expert users to assess the efficiency of our framework and its capacity to assist them with real-world problems.

REFERENCES
[1] Gediminas Adomavicius and Alexander Tuzhilin. 2008. Context-aware Recommender Systems. In Proceedings of the 2008 ACM Conference on Recommender Systems (RecSys '08). ACM, New York, NY, USA, 335–336.
[2] Julien Aligon, Enrico Gallinucci, Matteo Golfarelli, Patrick Marcel, and Stefano Rizzi. 2015. A collaborative filtering approach for recommending OLAP sessions. Decision Support Systems 69 (2015), 20–30.
[3] Ricardo Baeza-Yates, Carlos Hurtado, and Marcelo Mendoza. 2004. Query Recommendation Using Query Logs in Search Engines. In Proceedings of the 2004 International Conference on Current Trends in Database Technology (EDBT '04). Springer-Verlag, Berlin, Heidelberg, 588–596.
[4] S.W. Bennett and A.C. Scott. 1985. The Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project, chap. 19 - Specialized Explanations for Dosage Selection. Addison-Wesley Publishing Company (1985), 363–370.
[5] Michael R. Berthold, Nicolas Cebron, Fabian Dill, Thomas R. Gabriel, Tobias Kötter, Thorsten Meinl, Peter Ohl, Christoph Sieb, Kilian Thiel, and Bernd Wiswedel. 2007. KNIME: The Konstanz Information Miner. In Studies in Classification, Data Analysis, and Knowledge Organization (GfKL 2007). Springer.
[6] G. Casalicchio, C. Molnar, and B. Bischl. 2018. Visualizing the Feature Importance for Black Box Models. arXiv e-prints (April 2018). arXiv:stat.ML/1804.06620
[7] Li Chen and Pearl Pu. 2005. Trust Building in Recommender Agents. In 1st International Workshop on Web Personalization, Recommender Systems and Intelligent User Interfaces (WPRSIUI '05). 135–145.
[8] Jacob Cohen. 1968. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin 70, 4 (1968), 213.
[9] Henriette S. M. Cramer, Vanessa Evers, Satyan Ramlal, Maarten van Someren, Lloyd Rutledge, Natalia Stash, Lora Aroyo, and Bob J. Wielinga. 2008. The effects of transparency on trust in and acceptance of a content-based art recommender. User Modeling and User-Adapted Interaction 18, 5 (2008), 455–496.
[10] A. Datta, S. Sen, and Y. Zick. 2016. Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems. In 2016 IEEE Symposium on Security and Privacy (SP). 598–617.
[11] Janez Demšar, Tomaž Curk, Aleš Erjavec, Črt Gorup, Tomaž Hočevar, Mitar Milutinovič, Martin Možina, Matija Polajnar, Marko Toplak, Anže Starič, Miha Štajdohar, Lan Umek, Lan Žagar, Jure Žbontar, Marinka Žitnik, and Blaž Zupan. 2013. Orange: Data Mining Toolbox in Python. Journal of Machine Learning Research 14 (2013), 2349–2353. http://jmlr.org/papers/v14/demsar13a.html
[12] Gabriel Ferrettini, Julien Aligon, and Chantal Soulé-Dupuy. 2020. Explaining Single Predictions: A Faster Method. In SOFSEM 2020: Theory and Practice of Computer Science. Springer International Publishing, Cham, 313–324.
[13] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter 11, 1 (2009), 10–18.
[14] Jianming He and Wesley W. Chu. 2010. A Social Network-Based Recommender System (SNRS). Springer US, Boston, MA, 47–74.
[15] Markus Hofmann and Ralf Klinkenberg. 2013. RapidMiner: Data Mining Use Cases and Business Analytics Applications. Chapman & Hall/CRC.
[16] U. A. Piumi Ishanka and Takashi Yukawa. 2017. The Prefiltering Techniques in Emotion Based Place Recommendation Derived by User Reviews. Applied Computational Intelligence and Soft Computing 2017 (2017), 10 pages.
[17] Igor Kononenko and Ivan Bratko. 1991. Information-Based Evaluation Criterion for Classifier's Performance. Machine Learning 6, 1 (Jan. 1991), 67–80.
[18] Scott M. Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 4765–4774. http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
[19] Phong Nguyen, Melanie Hilario, and Alexandros Kalousis. 2014. Using meta-mining to support data mining workflow planning and optimization. Journal of Artificial Intelligence Research (2014), 605–644.
[20] William Raynaut. 2018. Meta-analysis perspectives toward assistance in prediction and simulation. Thesis. Université Paul Sabatier - Toulouse III. https://tel.archives-ouvertes.fr/tel-02023797
[21] William Raynaut, Chantal Soule-Dupuy, and Nathalie Valles-Parlangeau. 2016. Meta-Mining Evaluation Framework: A large scale proof of concept on Meta-Learning. In 29th Australasian Joint Conference on Artificial Intelligence. Springer, 215–228.
[22] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM, New York, NY, USA, 1135–1144.
[23] Floarea Serban, Joaquin Vanschoren, Jörg-Uwe Kietz, and Abraham Bernstein. 2013. A Survey of Intelligent Assistants for Data Analysis. ACM Computing Surveys 45, 3, Article 31 (July 2013), 35 pages. https://doi.org/10.1145/2480741.2480748
[24] Guy Shani, Lior Rokach, Bracha Shapira, Sarit Hadash, and Moran Tangi. 2013. Investigating confidence displays for top-N recommendations. Journal of the American Society for Information Science and Technology 64, 12 (2013), 2548–2563.
[25] L. S. Shapley. 1953. A value for n-person games. Contributions to the Theory of Games 28 (1953), 307–317.
[26] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning Important Features Through Propagating Activation Differences. In Proceedings of the 34th International Conference on Machine Learning (ICML '17). 3145–3153.
[27] Quan Sun, Bernhard Pfahringer, and Michael Mayo. 2012. Full model selection in the space of data mining operators. In Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation. ACM, 1503–1504.
[28] Nava Tintarev and Judith Masthoff. 2015. Explaining Recommendations: Design and Evaluation. Springer US, Boston, MA, 353–382. https://doi.org/10.1007/978-1-4899-7637-6_10
[29] Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. 2013. OpenML: Networked Science in Machine Learning. SIGKDD Explorations 15, 2 (2013), 49–60.
[30] Sandra Wachter, Brent D. Mittelstadt, and Chris Russell. 2017. Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR. CoRR abs/1711.00399 (2017). http://arxiv.org/abs/1711.00399
[31] Qi Wang, Jian Ma, Xiuwu Liao, and Wei Du. 2017. A context-aware researcher recommendation system for university-industry collaboration on R&D projects. Decision Support Systems 103 (2017), 46–57.
[32] Monika Zakova, Petr Kremen, Filip Zelezny, and Nada Lavrac. 2011. Automating knowledge discovery workflow composition through ontology-based planning. IEEE Transactions on Automation Science and Engineering 8, 2 (2011), 253–264.
[33] Yong Zheng, Bamshad Mobasher, and Robin Burke. 2015. Similarity-Based Context-Aware Recommendation. Springer International Publishing, Cham, 431–447.