A framework for user assistance on predictive models

Gabriel Ferrettini, Julien Aligon, Chantal Soulé-Dupuy
Université de Toulouse, UT1, IRIT (CNRS/UMR 5505), Toulouse, France — firstName.lastName@irit.fr

William Raynaut
Hubware, Toulouse, France — William@hubwa.re

ABSTRACT
Data analysis generally requires very specialized skills, especially when applying machine learning tasks. The ambition of this paper is to propose a framework assisting a domain expert user in analysing his data, in a context of predictive analysis. In particular, the framework includes a recommender system for the workflow of analysis tasks. Because a lack of explanation in recommendations can lead to a loss of confidence, a complementary system is proposed to better understand the recommended predictive models. This complementary system aims to help the user understand and exploit the results of the data analysis, by relying on his data expertise. The framework is validated through a pool of questions and a mock-up showing the interest of the approach.

KEYWORDS
Recommendation system, Machine Learning, Predictive model, Prediction explanation

"Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)."

1 INTRODUCTION
In many cases, data analysis requires very specialized skills in the implementation and use of models. For instance, designing something as common and popular as a predictive model requires expert knowledge in the machine learning field. These analysis tasks are thus especially difficult to perform for a domain expert user, i.e. a user with a deep knowledge of the data to analyze, but without any background in machine learning.
Several past works have been proposed to help these types of users, notably workflow recommender systems and model building assistance (as proposed in [20]). In general, these recommender systems offer very accurate analysis workflows and predictive models (drawing strength from workflows performed by past users). But the interaction with these systems is often limited to executing the proposed predictive model, without an easy way to validate and personalize it. This is a major drawback through which a user can lose confidence, due to a lack of explanations of the recommender system results. Indeed, neophyte users tend to struggle to give credence to a system they do not understand and are not familiar with. Given that important decisions can be made using such a system, giving the user an opportunity to gain confidence in the system is important. For example, the importance of transparency has long been recognized in expert systems, as in [4], and studied more widely in the recommendation context in [28].

In this direction, the ambition of this paper is to propose a framework assisting domain expert users in performing sensible data analysis by themselves (more precisely, our work focuses on the task of multi-label classification). This framework includes a recommender system of workflows for predictive models and a complementary system explaining their results. The aim of this explanation system is to make the model transparent and effective, while giving confidence to the user. His involvement in all the stages of this framework should increase his confidence, while relying as little as possible on any knowledge other than his domain of expertise. Thus, the framework should allow a user to better select, personalize and understand the recommended predictive models. In order to achieve this, we explore novel ways of exploiting single prediction explanation. These new uses aim to help a non-expert user appropriate complex data analysis processes.

The paper is organized as follows. Section 2 details the limits of the current systems helping a user analyze and classify his data; a review of existing solutions to better explain predictive models is also presented. Section 3 proposes a recommender system to be part of our framework, taking into account the drawbacks identified in the literature. Section 4 describes the explanation system for predictive models. Based on these two systems, Section 5 presents the organisation of the framework and how a user can easily select and fit the desired predictive model through the use of the explanation system. The framework is validated in Section 6 through a use case and illustrated by a mock-up. In particular, this validation shows that the framework is able to answer the following three questions:

• How can a non-expert user appropriate the results of the recommendation by himself?
• How can users be confident in the produced results?
• How can a model be personalised without requiring machine learning knowledge?

2 RELATED WORKS

2.1 From the need of model recommendation...
Recommender systems based on collaborative filtering are known to be effective in various applications. For example, [3] suggests queries based on previously issued queries, applied in the general context of databases, while [14] recommends items based on user data obtained from a social network, and [2] provides sequences of queries based on similarities between OLAP user sessions.

Traditional collaborative filtering approaches, however, base their recommendations on similarities between users, identifying intrinsic traits they have in common. Such systems rarely consider the context in which the user evolves. Context-aware recommender systems should then be preferred when the context of the user is complex and more prominent [1]. In this case, the information obtained from multiple contexts can be very useful to improve the relevance and effectiveness of the recommender systems. Such approaches have attracted particular attention over the last few years. For example, [16] shows that detecting user emotion (context) and factoring it into a collaborative filtering approach increases user satisfaction. [31] proposes a system suggesting collaborations between universities and industries based on the identification of similar contexts of researchers (defined on a multitude of aspects). [33] develops a similarity-based context-aware approach under the assumption that recommendations should be similar if the contextual situations of the users are similar, and demonstrates that integrating a similarity measure between multidimensional contexts can improve precision scores.
The problem of data analysis workflow recommendation has recently received increased interest, advertising several promising methods [19, 23, 27, 32]. Their purpose is to assist the user in solving a range of different data analysis problems through the recommendation of adequate workflows (defined as sequences of operators producing knowledge from data). However, none of these works arises from a perspective of context-aware recommendation, taking into account the particular context in which a data analyst evolves. They only take into consideration the information related to their purpose, such as the objective of the analysis for planification methods [19]. We plan to consider more of the relevant information constituting the user's context. Indeed, a dataset to analyse has multiple features and is defined by particular characteristics, while the actual needs of a data analyst can be complex and largely implicit. Experiments performed in the past can also be considered part of this context, as they carry information about what was and can be done. Yet, even with a finer approach to the recommendation, the user is still not able to fully understand and use what is being recommended.

As discussed in the introduction, the need for better explanations of, and confidence in, the recommendation results is essential for a user, all the more for a domain expert. To overcome this black-box problem in recommendations, several solutions exist in the literature, a review of which can be found in [28]. In particular, a number of goals characterizing the different types of explanations in recommender systems is proposed there, together with ways of evaluating them. The notions of Transparency (i.e. how the system works, as in [9]), Trust (i.e. perceived confidence in the recommendations, as in [7]), Persuasiveness (i.e. user acceptance of the recommendations, as in [9]) and Effectiveness (i.e. making better decisions, as in [24]) best fit the objectives of the recommendation explanations proposed in our framework.

2.2 ...to the need of model explanation
The existing systems and toolkits for machine learning (ML) and data analysis in general mostly focus on providing and explaining the methods and algorithms. This approach has proven particularly helpful for expert users, but still requires advanced knowledge of data analysis. Indeed, some of the most well-known data analysis platforms, such as Weka [13] or Knime [5], provide detailed descriptions of the methods and algorithms they include, often giving usage examples. Unfortunately, detailed descriptions are a poor substitute for actual training in data analysis.

Some data analysis platforms, such as RapidMiner [15] or Orange [11], have dedicated great attention to the problem of presenting and explaining the analysis results to the user. By providing well-designed visualization interfaces, these platforms assist the user in understanding the results produced, which is a first step toward actually using them and acting on them. However, they cannot explain how these specific results have been achieved, which remains a significant disincentive for users in areas where wrong decisions can have grave consequences. Helping those users grasp why a particular prediction is being made (in a way that would allow them to check this reasoning against their own expert knowledge) could greatly enhance their trust in a reasonable prediction or, on the contrary, give them a meaningful reason to discard a biased one. This is the original intuition behind the need for prediction explanations.

More recently, Google, with its "What-If" tool (https://pair-code.github.io/what-if-tool/), mainly based on [30], proposes many exploratory machine learning tools. These help a user understand and exploit machine learning models in an intuitive way, mainly by allowing the user to explore new data points with a trained model and by displaying different metrics in an easily interpretable way.

Our approach is to make machine learning more interpretable by relying on explanations of the predictions provided by predictive models. The possible applications of prediction explanations have been investigated by [22]. According to their paper, the interest of explaining a predictive model is threefold:

• First, it can be seen as a means to understand how a model works in general, by peering at how it behaves in diverse points of the instance space.
• Second, it can help a non-expert user judge the quality of a prediction and even pinpoint the cause of flaws in its classification. Correcting them would then lead the user to perform some intuitive feature engineering operations.
• Third, it can allow the user to decide which type of model is preferable to another, even if he has no knowledge of the principles underlying each of them.

A great number of works pertaining to prediction explanation led to [18], which theorized a category of explanation methods, named additive methods, and produced an interesting review of the different methods developed in this category. Some of these methods are described in detail in [10] and [26]. They are summarized in [18] as methods attributing, for a given prediction, a weight to each attribute of the dataset. This creates a very simple "predictive model" mimicking the original model's behavior locally. We thus obtain a simple, interpretable linear model which gives information on the original model's inner workings in a small vicinity of the predicted instance. The way these weights are attributed to each attribute varies between the different additive methods, but the end result is always this vector of weights.

Other lines of reasoning have been explored, such as [6], which investigated prediction explanation from the point of view of model performance: their metric shows which features improve the performance of the model, rather than which features the model considers important for its prediction. While this line of reasoning is interesting for the model explanation field, it does not correspond to our scope, as we aim to help users understand how a model works, not how to improve it. In this paper, we aim to facilitate the understanding of any machine learning model for users with no special knowledge of data analysis or machine learning. It is thus more relevant to focus on additive methods, as they generate a simple set of importance weights for each attribute. This set of weights is easy to interpret, even for someone without expertise in machine learning.
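To make the notion of an additive explanation concrete, the following sketch fits a local linear surrogate around a single instance, in the spirit of the methods reviewed in [18] (e.g. LIME [22]). It is an illustration under our own assumptions, not the exact algorithm of any cited paper: the neighbourhood sampling, the proximity kernel and the Ridge surrogate are all simplistic choices of ours.

```python
import numpy as np
from sklearn.linear_model import Ridge

def additive_explanation(model, x, n_samples=500, scale=0.1, seed=0):
    """Return one weight per attribute: a linear surrogate mimicking
    `model` in a small vicinity of the instance `x` (a 1-D array)."""
    rng = np.random.default_rng(seed)
    # Sample perturbed instances around x.
    X = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    y = model.predict_proba(X)[:, 1]  # confidence for the class of interest
    # Give more weight to samples close to x.
    dist = np.linalg.norm(X - x, axis=1)
    w = np.exp(-(dist ** 2) / (2 * scale ** 2))
    surrogate = Ridge(alpha=1.0).fit(X, y, sample_weight=w)
    return surrogate.coef_  # the "vector of weights" of an additive method
```

The sign and magnitude of each coefficient indicate how strongly the corresponding attribute pushes the prediction up or down around x, which is exactly the kind of output a domain expert can read without any machine learning background.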
3 WORKFLOW RECOMMENDATION SYSTEM
This section describes the basic principles of our workflow recommender system, as depicted in Figure 1. It builds on previous work, described in more detail in [21] and [20]. This recommender system has been shown to be an effective assistant for relevant workflow selection, through the experiments in [20]. It is the base of the framework presented in Section 5.

Figure 1: Global steps of the recommendation process

3.1 Process overview and definitions
3.1.1 Dataset. Datasets are defined as collections of instances described along attributes. Given $A = \{a_1, ..., a_n\}$ the attributes of a dataset, an instance $x$ is a vector of $n$ attribute values: the description of $x$ along the attribute set $A$.

3.1.2 Workflow. Workflows in their most general form usually consist in directed graphs of data analysis operators [23]. These can include the many possible steps of data analysis, such as various preprocessing operations (data cleaning, normalization, etc.), construction of models, search for patterns, or even parameter optimization for other operators. Note that, as the explanation methods (see Section 4) are applied to supervised classification, for now we only consider workflows arising from such models.

3.1.3 System overview. The recommendation system is based on a meta-database of past machine learning experiments. For each of them, one can access the base dataset of the experiment, the workflow used to create a machine learning model from the dataset, and its performance. Section 3.2 (step 1 of Figure 1) presents the method used to determine how similar datasets are. Section 3.3 (step 2 of Figure 1) presents how the performance of a workflow is modeled according to the current user's needs. These user preferences filter the set of recommendations to propose relevant workflows, as described in Section 3.4 (step 3 of Figure 1).

3.2 Dissimilarity between datasets
The measure of dissimilarity is based on the characteristics of the datasets to be compared. It is computed through two levels of meta-attributes: the difference between dataset meta-attributes and the difference between attribute meta-attributes.

3.2.1 Dataset meta-attributes. In order to dispose of a large selection of meta-attributes from diverse categories, we use the OpenML platform [29]. This platform contains more than a hundred meta-attributes, from different statistical, information-theoretic, and landmarking approaches (complete list available on http://www.openml.org/).

3.2.2 Attribute meta-attributes. Individual attributes of datasets can be characterized along a set of measures, mostly consisting of non-aggregated versions of the previously described dataset meta-attributes. To build our set of attribute meta-attributes, we use the 72 measures proposed in [21], able to characterize individual attributes. The key idea is to compare attributes of different datasets along their attribute meta-attributes. However, as the intuition is to make use of all available information, attributes are compared by most similar pairs: for two datasets $A$ and $B$, each attribute of $A$ is paired with an attribute of $B$ such that the total dissimilarity of the pairs is as low as possible.
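Finding the attribute pairing with minimal total dissimilarity is an instance of the assignment problem, which can be solved optimally with the Hungarian method. A minimal sketch follows; the Euclidean distance over meta-attribute vectors and the function names are our own illustrative assumptions, not necessarily the measure used in [21].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def pair_attributes(meta_a: np.ndarray, meta_b: np.ndarray):
    """Pair each attribute of dataset A with one of dataset B so that the
    total dissimilarity of the pairs is minimal.
    meta_a: (n_a, m) matrix of attribute meta-attributes for dataset A;
    meta_b: (n_b, m) matrix for dataset B (rectangular cases are allowed:
    only min(n_a, n_b) pairs are formed)."""
    # Pairwise dissimilarities between attribute descriptions.
    cost = np.linalg.norm(meta_a[:, None, :] - meta_b[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)  # optimal minimal-cost pairing
    return list(zip(rows.tolist(), cols.tolist())), cost[rows, cols].sum()
```

The summed cost of the optimal pairing can then contribute to the attribute-level part of the dissimilarity between the two datasets.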
3.3 Workflow filtering by user preference
In order to make the recommendations more accurate, the performance of a workflow is filtered according to the current user's needs. For instance, if a current user has a very high cost of false negatives (as in the early diagnosis of a dangerous disease), then we should consider relevant those workflows that exhibited good recall on similar datasets. Our approach is thus to consider criteria able to characterize different aspects of workflow performance, in order to model user preferences. Even considering only problems of supervised classification, many different criteria have been proposed to characterize different aspects of performance, like Cohen's Kappa [8], measuring agreement while accounting for the chance of random good guesses, or the more complex Information Score from [17], measuring the amount of non-trivial information produced by the model.

The preference model of a user is then represented as a set of performance criteria he is interested in, each of them associated with a weight qualifying its relative importance. For instance, a user who wants to avoid false negatives has the recall measure as his most important criterion. But this does not mean that precision has to be ignored: a higher weight associated with recall represents the user preference.
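The paper does not prescribe a formula for combining the weighted criteria; as a minimal sketch under our own assumptions, one plausible aggregation is a weighted average of the criteria scores:

```python
def preference_score(performance: dict, preferences: dict) -> float:
    """Aggregate a workflow's measured performance into one score under a
    user preference model mapping each criterion to a relative weight.
    Criteria absent from the measurements contribute zero."""
    total_weight = sum(preferences.values())
    return sum(weight * performance.get(criterion, 0.0)
               for criterion, weight in preferences.items()) / total_weight

# A user avoiding false negatives weights recall highest,
# but still keeps precision in the preference model:
score = preference_score({"recall": 0.91, "precision": 0.78},
                         {"recall": 3.0, "precision": 1.0})
print(score)  # 0.8775
```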
3.4 Workflow recommendation by Pareto front
Considering a current user, analysing a dataset and having defined his preferences, the system recommends workflows from past analyses. This implies access to a base of past data analysis experiments, where (hopefully expert) users upload the analyses they perform. One such past analysis then consists of a dataset, upon which a workflow was applied, yielding a result. The suggested workflow for the current analysis should then be determined according to two criteria:

(1) The past analysis must have been produced on a dataset similar to the one of the current user.
(2) Its results, evaluated according to the preferences expressed by the current user, should be satisfactory.

We thus face a problem of multi-criteria optimisation, where both dataset similarity and past performance matter. To solve it, a Pareto recommendation approach is proposed.

Figure 2: Pareto front of the best past analyses according to our two criteria

Consider the full Pareto front of past analyses as a set of recommendations. We can then consider our best possible candidates according to our two criteria (as shown in Figure 2), which increases the chances of finding one that suits the user, but requires an additional step to discriminate between candidates. Indeed, supplying the full set of recommendations would probably be useful to expert users, but is most likely to overwhelm a non-expert.
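As an illustration of this step, here is a minimal sketch of the non-dominated filtering over the two criteria (both to be maximized); the tuple representation of candidates is our own assumption:

```python
def pareto_front(candidates):
    """Return the past analyses not dominated on the two criteria:
    similarity of their dataset to the user's, and preference-weighted
    performance (both higher-is-better).
    candidates: list of (similarity, performance, workflow) tuples."""
    front = []
    for sim, perf, wf in candidates:
        dominated = any(
            s >= sim and p >= perf and (s > sim or p > perf)
            for s, p, _ in candidates
        )
        if not dominated:
            front.append((sim, perf, wf))
    return front

# The four recommendations shown to the user (Section 6.1) could then be,
# e.g., the front members with the highest combined scores.
```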
4 PREDICTIVE MODEL EXPLANATION SYSTEM
As introduced in Section 1 and developed in Section 2, the lack of understanding of prediction recommendations is a real problem encountered by most domain expert users. It leads to a lack of trust in the models and impairs their use. Moreover, even with guidance, a lack of experience can lead to mistakes when analysing a dataset and to relying on ill-adapted predictions. In order to address those pitfalls, we aim to help the user understand the recommendation results. For this, we propose an explanation system mainly based on the domain user's knowledge.

4.1 Explaining prediction results
Most methods in the literature are devoted to explaining a predictive model in a global way. These methods are not relevant when a domain expert user (for instance a biologist) has to study the behavior of particular dataset instances over a predictive model (for instance in the context of a cohort study). This is our main motivation for proposing an explanation system able to explain the behavior of individual predictions. The method is detailed in [12]; its main principles are described below.

Underlying principle. Our prediction explanation system relies on the principle of analysing the influence of each attribute on the model prediction. This way, we aim to emphasize the most important attributes according to the model. The explanation of the model is realized by comparing the impact of the absence and presence of the attributes to determine their influence. However, considering each attribute as independent of the others is a limitation. Therefore, in order to take this dependency between attributes into account, it is necessary to consider the influence of attribute groups on the prediction. These influences are then aggregated into a unique score using Shapley's notion of individual participation in group efforts, described in [25].

4.2 Explanation of a single attribute influence
Given a dataset of instances described along the attributes of $A$, the influence of the attribute $a_i$ on the classification of an instance $x$ by the classifier confidence function $f$ on the class $C$ can be represented as:

$$ \mathit{inff}_{C,a_i}(x) = f(x_{a_i}) - f(\emptyset) \qquad (1) $$

where $f(x_{a_i})$ represents the probability that the instance $x$ is included in the class $C$ with only the knowledge of the attribute $a_i$ (according to the predictive model). This formula can be used with groups of attributes, which leads us to an influence inspired by Shapley's work:

$$ I^C_{a_i}(x) = \sum_{A' \subseteq A \setminus \{a_i\}} p(A', A) \cdot \left( \mathit{inff}_{C, A' \cup \{a_i\}}(x) - \mathit{inff}_{C, A'}(x) \right) \qquad (2) $$

with $p(A', A)$ the Shapley value, a penalty function accounting for the size of the subset $A'$:

$$ p(A', A) = \frac{|A'|! \; (|A| - |A'| - 1)!}{|A|!} \qquad (3) $$

Due to the exponential complexity of this formula, an optimisation of the calculation of the influence of an attribute is proposed in [12]. It produces a satisfactory approximation with a relatively small loss in accuracy.
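Equations (1)–(3) translate directly into the following exhaustive computation. This is a sketch: how the restricted confidence $f(x_{A'})$ is obtained (e.g. by marginalizing the absent attributes over the dataset) is precisely the subject of [12] and is abstracted here behind a user-supplied function.

```python
from itertools import combinations
from math import factorial

def attribute_influence(f, x, a_i, attributes):
    """Influence I^C_{a_i}(x) of Eq. (2). `f(x, known)` must return the
    model's confidence for class C when only the attributes in `known`
    are considered; the f(empty set) term of Eq. (1) cancels out in the
    difference. Exponential in |attributes|: see [12] for approximations."""
    others = [a for a in attributes if a != a_i]
    n = len(attributes)
    influence = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            # Shapley penalty p(A', A) = |A'|!(|A|-|A'|-1)!/|A|!  (Eq. 3)
            p = factorial(size) * factorial(n - size - 1) / factorial(n)
            influence += p * (f(x, frozenset(subset) | {a_i})
                              - f(x, frozenset(subset)))
    return influence
```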
5 ORGANISATION OF THE FRAMEWORK
We now present the framework combining the two systems described in the previous sections. Our framework is separated into two use cases. Section 5.1 shows how a domain expert user can be guided through the complex process of selecting a predictive model among a set of possible ones, while Section 5.2 illustrates how explanations bring new insights during the feature refinement of a predictive model. These two processes are illustrated in Figure 3 and are based on the most common functionalities of the literature described in Section 2.

Figure 3: Building a predictive model

Remember that this framework is intended for users who have no prior knowledge of machine learning, but who have expertise in their own field (e.g. biologists, doctors, engineers...). These users produce data that they are required to analyze. It is therefore assumed that they have a solid knowledge of this data, but not of machine learning methods.

5.1 Model selection via prediction explanation
Workflow recommendation - First, a user produces data he wants to analyse. The data is given as input to the recommender system, along with his specifications for the analysis: the target feature and his preferences in terms of results. The system then suggests a set of possible workflows which are the most able to analyse the user's data.

Execution - Among this selection of possible workflows, the user can select all or a subset of them. He can access a description of each workflow and its inner workings if desired, allowing him to perform a first selection among the possible workflows. The selected workflows are then executed and produce a set of predictive models.

Model explanation - Using these models, the system can generate the classification of a given instance of the dataset and provide its explanation for each model. These explanations take the form of attribute influences. For instance, in Figure 4, a user is informed that a particular patient is predicted to have diabetes by both models A and B, but that A made this decision considering mostly the patient's diet, while B also considers his weight as important. In order to allow the user to explore the models in an intuitive way, a set of 10 instances is recommended for his review. This selection of instances aims to provide the user with a set of prediction explanations as diverse as possible, without overwhelming him with a space too large to be explored efficiently by humans.

Figure 4: Explaining a prediction

5.2 Feature selection via prediction explanation
Feature engineering - Thanks to the prediction explanations, a user can access the reasoning behind each model, allowing him to detect possible flaws in the proposed models. As an example, prediction explanations allowed the personnel of a hospital performing the medical study described in [22] to realise that some attributes should not have been included in their dataset. Moreover, based on his own domain of expertise, a user can assess the importance of each feature, compared to the importance given to them by the models. Thus, the user can select undesirable features and remove them from the dataset.

Model selection - Once the final desired features have been determined, the user exploits his domain knowledge to assess the reasoning behind each model. This assessment is based both on a global evaluation, such as Cohen's kappa or the area under the ROC curve, and on local information about the predictions. This allows the user to select the desired final model by choosing the best performing model, but also the one with the most relevant use of the dataset features.
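The framework leaves this combination of global metric and expert judgement to the user. Purely as an illustration of how the two signals could be weighed together, one could score each model by how much of its explanation mass falls on features the expert deems relevant; every name below is our own assumption:

```python
def rank_models(global_scores, influences, relevant_features, alpha=0.5):
    """Illustrative ranking only (not prescribed by the framework).
    global_scores: {model_name: kappa or AUC}
    influences: {model_name: {feature: mean absolute influence}}
    relevant_features: the features the domain expert trusts."""
    def relevance(model):
        infl = influences[model]
        total = sum(infl.values()) or 1.0
        return sum(v for feat, v in infl.items()
                   if feat in relevant_features) / total
    return sorted(global_scores,
                  key=lambda m: alpha * global_scores[m]
                                + (1 - alpha) * relevance(m),
                  reverse=True)
```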
6 VALIDATION OF THE FRAMEWORK
In order to validate the answers to our original questions stated in Section 1, we propose a mock-up of our framework. This mock-up illustrates a use case based on the well-known UCI Pima Indians diabetes dataset (available on many platforms, such as Kaggle), since familiarity with the dataset is beneficial to the understanding of this validation.

In our use case, a biologist aims to study the Pima Indians diabetes dataset and uses our recommendation system to obtain possible analysis workflows. First, as described in Section 5, the user enters the diabetes dataset as input to the recommender system and asks it to perform a recommendation.

6.1 Helping a non-expert user appropriate the results of the recommendation by himself
Instead of recommending a single workflow from the Pareto front (see Section 3.4), the user is presented with the four best recommendations from the front. A description of each workflow is made available to the user, allowing him to perform a first selection among the different options. Although these descriptions are necessarily technical, they are essential for a user to understand what is happening when each workflow is executed. The workflows and their descriptions are depicted in Figure 5. As an example, we can see in the figure that a workflow is not only the production of a predictive model, but also successive transformation operations applied to the dataset. These workflows are then executed and presented to the user through a set of selected instances. These instances are selected in a way that favours a large diversity in prediction explanations; the exact algorithm used here is the one presented in [22]. The user can thus explore each predictive model through this set of instances, by viewing a diverse set of key points illustrating the models. This allows him to infer how the whole model works, with minimal information.

Figure 5: Workflow recommendation

The instances and their attached prediction explanations are depicted in Figure 6. On the left, the user can select the instance he wants to study, and decide to eventually remove attributes from the dataset. On the right is presented the prediction explanation of the selected instance for each of the models. In our use case, we can see the scientist selects instance 49. Automatically, an explanation is proposed in which the random forest and bagging J48 (an optimized decision tree) models mainly base their prediction on the patient's blood pressure and age, whereas the naive Bayes model is mostly influenced by the mass of the instance. This allows immediate access to the inner workings of each presented workflow, based solely on the domain knowledge of the user. Thus, by presenting the results and how they were obtained, the user is informed of the conclusion of the prediction without having to rely blindly on the model.

Figure 6: Visualization of prediction results through prediction explanations

Therefore, through prediction explanation, the user gains access to a new type of information that does not rely on expertise in data analysis to be understood. He can understand and appropriate the results of the recommendation system thanks to his own domain-based knowledge, without understanding the inner workings of each model. He can visualize how each model uses the data to make its predictions.

6.2 Giving a user confidence in the produced results
Through this explanation method, the user can choose between models without having to rely solely on global measures of performance. He is able to use his own judgement rather than only the proposal of a fully automated process. This also makes it possible to spot potential defects in the models, which is not always feasible with conventional metrics alone. As an example, the global accuracy of a model or its kappa score does not warn a user of an inappropriate attribute that should be removed from the dataset. In our mock-up example, the user can decide that the age of a patient is not that important in determining whether he is likely to have diabetes. At the same time, if our user considers a patient's mass a valid indicator, this indicates to him that the naive Bayes model is more interesting in his case (supposing the instances he reviewed are consistent with this explanation). This understanding of a model, its strengths and its flaws, gives the user stronger confidence in what is being accomplished during the data analysis process. By pinpointing eventual problems in the predictive model, he also becomes able to know when the model is reliable.

6.3 Personalising a model without requiring data analysis knowledge
Once the user has studied his models, he can assess which workflows best fit his requirements. In particular, the user can identify which features are mainly used by the workflows and decide which are important for his study. In our mock-up example, the biologist might want to study the impact of less evident diabetes indicators, and decide to remove the insulin and plasma features from his dataset (as presented in Figure 7). This forces the workflows to use the other features, and may highlight new important indicators. We can see in Figure 7 that the J48 workflow has significantly changed its behavior, while the Adaboost model has simply adjusted the importance of each attribute.

Figure 7: New prediction explanations once the attributes plasma and insulin have been removed

By this process, the user accomplishes feature selection without having data analysis knowledge or expertise. His domain knowledge allows him to assess the interest of a feature and decide whether the workflows are using it well or not.
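Behind the interface, this personalisation step amounts to dropping the undesired columns and re-executing the selected workflows. A hedged sketch with scikit-learn stand-ins follows; the column names are those of the Kaggle version of the dataset (where "plasma" corresponds to the Glucose column) and the file name is an assumption:

```python
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("diabetes.csv")  # Pima Indians diabetes dataset
X = data.drop(columns=["Outcome", "Insulin", "Glucose"])  # user-removed features
y = data["Outcome"]

# Stand-ins for the recommended workflows (J48 ~ a decision tree, Adaboost):
for name, model in [("J48-like tree", DecisionTreeClassifier()),
                    ("Adaboost", AdaBoostClassifier())]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
# The prediction explanations are then recomputed on the retrained models
# to see how the influence of the remaining attributes has shifted.
```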
7 CONCLUSION AND PERSPECTIVES
We have presented a framework that proposes a new way to assist a user in analyzing his data, in two steps. In the first step, a recommendation system provides possible analysis workflows and predictive models, similar to what other users have done in the past for similar datasets. In the second step, a model explanation system allows a domain expert user to study the predictive models by himself. We have shown that this framework brings an answer to otherwise unresolved data analysis pitfalls, through a better understanding of data analysis models and by building the user's confidence.

However, this method still has to be tested in real-world situations. A prototype is therefore being developed, in interaction with biologists of the Institute of Cardiovascular and Metabolic Diseases (INSERM institute, http://www.i2mc.inserm.fr/index.php/en/). A medium-term perspective is to form a cohort of actual domain expert users to assess the efficiency of our framework and its capacity to assist them with real-world problems.

REFERENCES
[1] Gediminas Adomavicius and Alexander Tuzhilin. 2008. Context-aware Recommender Systems. In Proceedings of the 2008 ACM Conference on Recommender Systems (RecSys '08). ACM, New York, NY, USA, 335–336.
[2] Julien Aligon, Enrico Gallinucci, Matteo Golfarelli, Patrick Marcel, and Stefano Rizzi. 2015. A collaborative filtering approach for recommending OLAP sessions. Decision Support Systems 69 (2015), 20–30.
[3] Ricardo Baeza-Yates, Carlos Hurtado, and Marcelo Mendoza. 2004. Query Recommendation Using Query Logs in Search Engines. In Proceedings of the 2004 International Conference on Current Trends in Database Technology (EDBT '04). Springer-Verlag, Berlin, Heidelberg, 588–596.
[4] S.W. Bennett and A.C. Scott. 1985. The Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project, chap. 19 - Specialized Explanations for Dosage Selection. Addison-Wesley Publishing Company (1985), 363–370.
[5] Michael R. Berthold, Nicolas Cebron, Fabian Dill, Thomas R. Gabriel, Tobias Kötter, Thorsten Meinl, Peter Ohl, Christoph Sieb, Kilian Thiel, and Bernd Wiswedel. 2007. KNIME: The Konstanz Information Miner. In Studies in Classification, Data Analysis, and Knowledge Organization (GfKL 2007). Springer.
[6] G. Casalicchio, C. Molnar, and B. Bischl. 2018. Visualizing the Feature Importance for Black Box Models. arXiv e-prints (April 2018). arXiv:stat.ML/1804.06620
[7] Li Chen and Pearl Pu. 2005. Trust Building in Recommender Agents. In 1st International Workshop on Web Personalization, Recommender Systems and Intelligent User Interfaces (WPRSIUI '05). 135–145.
[8] Jacob Cohen. 1968. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin 70, 4 (1968), 213.
[9] Henriette S. M. Cramer, Vanessa Evers, Satyan Ramlal, Maarten van Someren, Lloyd Rutledge, Natalia Stash, Lora Aroyo, and Bob J. Wielinga. 2008. The effects of transparency on trust in and acceptance of a content-based art recommender. User Modeling and User-Adapted Interaction 18, 5 (2008), 455–496.
[10] A. Datta, S. Sen, and Y. Zick. 2016. Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems. In 2016 IEEE Symposium on Security and Privacy (SP). 598–617.
[11] Janez Demšar, Tomaž Curk, Aleš Erjavec, Črt Gorup, Tomaž Hočevar, Mitar Milutinovič, Martin Možina, Matija Polajnar, Marko Toplak, Anže Starič, Miha Štajdohar, Lan Umek, Lan Žagar, Jure Žbontar, Marinka Žitnik, and Blaž Zupan. 2013. Orange: Data Mining Toolbox in Python. Journal of Machine Learning Research 14 (2013), 2349–2353. http://jmlr.org/papers/v14/demsar13a.html
[12] Gabriel Ferrettini, Julien Aligon, and Chantal Soulé-Dupuy. 2020. Explaining Single Predictions: A Faster Method. In SOFSEM 2020: Theory and Practice of Computer Science. Springer International Publishing, Cham, 313–324.
[13] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter 11, 1 (2009), 10–18.
[14] Jianming He and Wesley W. Chu. 2010. A Social Network-Based Recommender System (SNRS). Springer US, Boston, MA, 47–74.
[15] Markus Hofmann and Ralf Klinkenberg. 2013. RapidMiner: Data Mining Use Cases and Business Analytics Applications. Chapman & Hall/CRC.
[16] U. A. Piumi Ishanka and Takashi Yukawa. 2017. The Prefiltering Techniques in Emotion Based Place Recommendation Derived by User Reviews. Applied Computational Intelligence and Soft Computing 2017 (2017), 10 pages.
[17] Igor Kononenko and Ivan Bratko. 1991. Information-Based Evaluation Criterion for Classifier's Performance. Machine Learning 6, 1 (Jan. 1991), 67–80.
[18] Scott M. Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 4765–4774. http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
[19] Phong Nguyen, Melanie Hilario, and Alexandros Kalousis. 2014. Using meta-mining to support data mining workflow planning and optimization. Journal of Artificial Intelligence Research (2014), 605–644.
[20] William Raynaut. 2018. Meta-analysis perspectives toward assistance in prediction and simulation. Thesis. Université Paul Sabatier - Toulouse III. https://tel.archives-ouvertes.fr/tel-02023797
[21] William Raynaut, Chantal Soule-Dupuy, and Nathalie Valles-Parlangeau. 2016. Meta-Mining Evaluation Framework: A large scale proof of concept on Meta-Learning. In 29th Australasian Joint Conference on Artificial Intelligence. Springer, 215–228.
[22] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). ACM, New York, NY, USA, 1135–1144.
[23] Floarea Serban, Joaquin Vanschoren, Jörg-Uwe Kietz, and Abraham Bernstein. 2013. A Survey of Intelligent Assistants for Data Analysis. ACM Computing Surveys 45, 3, Article 31 (July 2013), 35 pages. https://doi.org/10.1145/2480741.2480748
[24] Guy Shani, Lior Rokach, Bracha Shapira, Sarit Hadash, and Moran Tangi. 2013. Investigating confidence displays for top-N recommendations. Journal of the American Society for Information Science and Technology 64, 12 (2013), 2548–2563.
[25] L. S. Shapley. 1953. A value for n-person games. Contributions to the Theory of Games 28 (1953), 307–317.
[26] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning Important Features Through Propagating Activation Differences. In Proceedings of the 34th International Conference on Machine Learning (ICML '17). 3145–3153.
[27] Quan Sun, Bernhard Pfahringer, and Michael Mayo. 2012. Full model selection in the space of data mining operators. In Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation. ACM, 1503–1504.
[28] Nava Tintarev and Judith Masthoff. 2015. Explaining Recommendations: Design and Evaluation. Springer US, Boston, MA, 353–382. https://doi.org/10.1007/978-1-4899-7637-6_10
[29] Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. 2013. OpenML: Networked Science in Machine Learning. SIGKDD Explorations 15, 2 (2013), 49–60.
[30] Sandra Wachter, Brent D. Mittelstadt, and Chris Russell. 2017. Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR. CoRR abs/1711.00399 (2017). http://arxiv.org/abs/1711.00399
[31] Qi Wang, Jian Ma, Xiuwu Liao, and Wei Du. 2017. A context-aware researcher recommendation system for university-industry collaboration on R&D projects. Decision Support Systems 103 (2017), 46–57.
[32] Monika Zakova, Petr Kremen, Filip Zelezny, and Nada Lavrac. 2011. Automating knowledge discovery workflow composition through ontology-based planning. IEEE Transactions on Automation Science and Engineering 8, 2 (2011), 253–264.
[33] Yong Zheng, Bamshad Mobasher, and Robin Burke. 2015. Similarity-Based Context-Aware Recommendation. Springer International Publishing, Cham, 431–447.