=Paper=
{{Paper
|id=Vol-3135/dataplat_short2
|storemode=property
|title=Towards Human-centric AutoML via Logic and Argumentation
|pdfUrl=https://ceur-ws.org/Vol-3135/dataplat_short2.pdf
|volume=Vol-3135
|authors=Joseph Giovanelli,Giuseppe Pisano
|dblpUrl=https://dblp.org/rec/conf/edbt/GiovanelliP22
}}
==Towards Human-centric AutoML via Logic and Argumentation==
Joseph Giovanelli, Giuseppe Pisano
ALMA MATER STUDIORUM — Università di Bologna
j.giovanelli@unibo.it (J. Giovanelli); g.pisano@unibo.it (G. Pisano)

Published in the Workshop Proceedings of the EDBT/ICDT 2022 Joint Conference (March 29–April 1, 2022), Edinburgh, UK. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

In the last decade, we have witnessed an exponential growth in both the complexity and the number of Machine Learning (ML) techniques. As a consequence, leveraging such methods to solve real-case problems has become difficult for a Data Scientist (DS). Automated Machine Learning (AutoML) tools were devised to alleviate that task, but they easily became as complex as the ML techniques themselves. The DS has started to rely on such tools without understanding their functioning, thus losing control over the process. In this vision paper, we propose HAMLET (Human-centric AutoML via Logic and Argumentation), a framework that helps the DS to redeem her centrality. HAMLET is inspired by the well-known standard process model CRISP-DM. Iteration after iteration, the knowledge is augmented by acquiring more constraints about the problem until a suitable solution is found. HAMLET leverages Logic and Argumentation to merge both constraints and solutions in a uniform human- and machine-readable medium. Not only does it allow an easy exploration of the new knowledge at each iteration, but it also enforces a continuous revision through the AutoML tool and the confrontation between the DS and Domain Experts.

Keywords: AutoML, Logic, Argumentation, CRISP-DM, Data Scientist

1. Introduction

In relation to data platforms, it is well known that Machine Learning (ML) plays a key role in the process of data analysis. As a matter of fact, it has been pervasively employed to cope with every type of real-case problem [1, 2, 3, 4]. The Data Scientist (DS) (i.e., a specialist in data analysis) starts by collecting raw data in an arbitrary format. Then she typically leverages a process model that helps her translate the knowledge about the problem into ML constraints, and deploy the solution. CRISP-DM [5] is the most acknowledged standard process model, and we take it as a reference throughout the paper. A solution consists of a ML pipeline: a series of Data Pre-processing transformations and a ML algorithm. The DS can instantiate both with a large set of techniques, each with its own tunable hyper-parameters. These choices highly affect the performance of a solution.

Automated Machine Learning (AutoML) tools have been devised with the aim of assisting the DS during the ML pipeline instantiation. They leverage state-of-the-art optimisation approaches to smartly explore huge search spaces of solutions. AutoML has been demonstrated to provide accurate performance, even within a limited time budget. When setting up the search space, it is highly important for the DS to leverage the knowledge about the problem, considering all the ML constraints. Otherwise, the AutoML tool might retrieve invalid solutions (i.e., solutions whose results cannot be deemed correct). Besides, AutoML tools have become so complex that it is difficult for the DS to understand their functioning, hence losing control over the process. Researchers are aware of these problems [6]. Some works have prescribed the use of a human-centric framework for AutoML [7, 8, 9], yet they suggest only design requirements. Alternatively, the authors in [10] have proposed a tool that visualises the best and the worst solutions retrieved by an AutoML tool.

We claim that the need for a human-centric framework for AutoML is real, and that it is crucial for the DS to augment her knowledge via the retrieved solutions. To this purpose we propose HAMLET (Human-centric AutoML via Logic and Argumentation), which leverages Logic and Argumentation to:

• structure the ML constraints and the AutoML solutions in a Logical Knowledge Base (LogicalKB);
• parse the structured LogicalKB into a human- and machine-readable medium called Problem Graph;
• leverage the Problem Graph to set up an AutoML search space;
• leverage the Problem Graph to allow both the DS and an AutoML tool to revise the current knowledge.

Figure 1 illustrates how CRISP-DM, AutoML, and HAMLET interact with each other. We remark that our framework never lets the DS lose control over the process, and hence her centrality. Besides, HAMLET makes it possible to visualise the knowledge in a human- and machine-readable format. As advocated in [11], the DS needs to understand the AutoML process in order to trust the proposed solutions.

The remainder of the paper is structured as follows. Section 2 and Section 3 introduce the main notions of AutoML and Argumentation, respectively. Section 4 illustrates our framework. Finally, Section 5 draws the conclusions.
2. AutoML

AutoML tools have been conceived with the aim of relieving the DS of the overwhelming practice of finding a suitable solution for the case at hand. We recall that, in the context of data platforms, a solution is a ML pipeline, defined as a series of Data Pre-processing transformations followed by a ML algorithm. In its early days, only the instantiation of the latter, the ML algorithm, was addressed. Auto-Weka [12] formalised the problem as Combined Algorithm Selection and Hyper-parameter Optimisation (CASH). In a nutshell, in order to find the most performing configuration, various ML algorithms, with their related hyper-parameters, have to be tested over a dataset. Such a problem was successfully tackled by leveraging Bayesian Optimisation (BO) [13], a sequential design strategy for global optimisation. The process involves several iterations, through which different configurations are explored. As the iterations advance, an increasingly accurate model is built on top of the previously explored configurations, with the aim of suggesting the most promising ones. Configurations keep being explored, and the model keeps being updated, until a budget in terms of either iterations or time is reached.

Recently, AutoML is no longer limited to optimising just the ML algorithm phase, but includes Data Pre-processing as well. Indeed, with the aid of a series of transformations, it is possible to achieve better performance, unattainable with even the most performing ML algorithm configuration [14]. In [15], the author formalised the problem as Data Pipeline Selection and Optimisation (DPSO). Each of the transformations can be instantiated with different techniques, which, analogously to the ML algorithms, have their own hyper-parameters. Auto-sklearn [16] has included Data Pre-processing since its first versions. Yet, it fixes the arrangement of the transformations a priori, without considering that the most performing arrangement changes according to the case and data at hand. Considering several arrangements translates into larger search spaces, which are not easy to explore.
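To make the CASH and DPSO formulations concrete, the following minimal sketch is purely illustrative and is not the interface of Auto-Weka, Auto-sklearn, or any other cited tool: it samples pipeline configurations, i.e., an optional pre-processing transformation, an algorithm, and their hyper-parameters, at random and keeps the best one. An actual AutoML tool would replace the random sampling with Bayesian Optimisation; the dataset and the candidate space are placeholders.

<pre>
# Illustrative CASH/DPSO-style search: random sampling of pipeline configurations.
import random
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # illustrative dataset

transformations = {
    "none": lambda: [],
    "discretisation": lambda: [("disc", KBinsDiscretizer(n_bins=random.choice([3, 5, 10]), encode="ordinal"))],
    "normalisation": lambda: [("norm", MinMaxScaler())],
}
algorithms = {
    "decision_tree": lambda: DecisionTreeClassifier(max_depth=random.choice([3, 5, None])),
    "knn": lambda: KNeighborsClassifier(n_neighbors=random.choice([3, 5, 11])),
}

best_score, best_config = -1.0, None
for _ in range(20):                                        # budget expressed in iterations
    t_name = random.choice(list(transformations))
    a_name = random.choice(list(algorithms))
    pipeline = Pipeline(transformations[t_name]() + [("algo", algorithms[a_name]())])
    score = cross_val_score(pipeline, X, y, cv=3).mean()   # evaluate the configuration
    if score > best_score:
        best_score, best_config = score, (t_name, a_name)

print(best_config, round(best_score, 3))
</pre>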
In order to cope with ever larger search spaces, various expedients have been employed. Meta-learning (i.e., learning on top of learning) has been used to warm-start the Bayesian Optimisation (i.e., to boost the convergence process) by suggesting promising configurations, namely configurations that worked well in previous, similar real-case problems [17]. Ensembling (i.e., the construction of a high-performing solution by combining several low-performing solutions; e.g., bagging, boosting, stacking) has been leveraged to enable AutoML tools to retrieve a solution that combines the best performing configurations, instead of retrieving just the best performing one [16]. Moreover, multi-fidelity methods (i.e., the use of several partial estimations to speed up the time-consuming evaluation process) have been exploited to let AutoML tools explore as many configurations as possible.
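As a toy illustration of the multi-fidelity idea only, and not of any specific tool, the following sketch applies successive halving: many configurations are evaluated on a small budget, the most promising half survives, and the budget doubles for the survivors. The partial_evaluation function is an assumed stand-in for a cheap, noisy estimate such as training on a data subsample.

<pre>
# Toy successive-halving loop illustrating multi-fidelity evaluation.
import random

def partial_evaluation(config, budget):
    # larger budgets give less noisy estimates of the (hidden) true quality
    return config["quality"] + random.gauss(0, 1.0 / budget)

configurations = [{"id": i, "quality": random.random()} for i in range(16)]

budget = 1
while len(configurations) > 1:
    scored = sorted(configurations, key=lambda c: partial_evaluation(c, budget), reverse=True)
    configurations = scored[: len(scored) // 2]   # keep the most promising half
    budget *= 2                                   # spend more only on the survivors

print("selected configuration:", configurations[0]["id"])
</pre>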
All in all, the improvements made over the last years have been so substantial that AutoML is nowadays able to handle the entire ML pipeline instantiation. Yet, the stacking of complex mechanisms on top of each other unavoidably led to a poorer understanding of the process by the DS. We believe that the DS has the duty to revise and supervise the suggested solutions. Unfortunately, state-of-the-art AutoML tools overlook her role, and do not make that possible.

3. Logic & Argumentation

Logic can be defined as the abstract study of statements, sentences and deductive arguments [18]. From its birth, it has been developed and improved widely and now includes a variety of formalisms and technologies. Among them, Argumentation has proved itself an important tool for handling conflicting information (e.g., opinions, empirical data). This has led to a great deal of research trying to establish a computational model of logical arguments.

In Abstract Argumentation [19], a scenario can be represented by a directed graph. Each node represents an argument, and each edge denotes an attack by one argument on another. Each argument is regarded as atomic: there is no internal structure to an argument. Also, there is no specification of what an argument or an attack is. A graph can then be analysed to determine which arguments are acceptable according to some general criteria (i.e., semantics) [20].
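As a minimal illustration of how such an analysis can be computed, with an invented four-argument graph rather than one taken from the paper, the following sketch iterates Dung's characteristic function and returns the grounded extension, i.e., the arguments that are defended against all of their attackers.

<pre>
# Grounded extension of a tiny abstract argumentation graph (illustrative only).
arguments = {"a", "b", "c", "d"}
attacks = {("a", "b"), ("b", "c"), ("c", "d")}   # a attacks b, b attacks c, c attacks d

def acceptable(arg, accepted):
    # arg is acceptable w.r.t. `accepted` if every attacker of arg is counter-attacked
    # by some argument already in `accepted`
    for attacker, target in attacks:
        if target == arg and not any((defender, attacker) in attacks for defender in accepted):
            return False
    return True

grounded, previous = set(), None
while grounded != previous:                      # iterate to the least fixed point
    previous = grounded
    grounded = {arg for arg in arguments if acceptable(arg, previous)}

print(sorted(grounded))                          # ['a', 'c']: b and d are refuted
</pre>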
A way to link Abstract Argumentation and logical formalisms has been advanced in the field of Structured Argumentation [21], where we assume a formal logical language for representing knowledge (i.e., a Logical Knowledge Base), and specify how arguments and conflicts (i.e., attacks) can be derived from that knowledge. In the structured approach, premises and claims of the argument are made explicit, and the relationship between them is formally defined through rules internal to the formalism. We can build the notion of attack as a binary relation over structured arguments that denotes when one argument is in conflict with another (e.g., contradictory claims or premises). One of the main frameworks for Structured Argumentation is ASPIC+ [22]. In this formalism arguments are built with two kinds of inference rules: strict rules, whose premises guarantee their conclusion, and defeasible rules, whose premises only create a presumption in favour of their conclusion. Conflicts between arguments can then arise from both inconsistencies in the Logical Knowledge Base and the defeasibility of the reasoning steps in an argument (i.e., a defeasible rule used in reaching a certain conclusion from a set of premises can also be attacked).

In our view, once the right logical language for encoding the DS and AutoML knowledge has been defined, a Structured Argumentation model (e.g., an ASPIC+ instance [23]) would provide us with the formal machinery to build an Argumentation framework upon the data, while Abstract Argumentation would provide the evaluation tools.

Figure 1: Integration of the HAMLET framework with the CRISP-DM standard process model and AutoML.

4. Towards a human-centric approach

Addressing ML problems encompasses the DS seeking a solution that satisfies all the constraints of the case. She usually leverages a process model such as CRISP-DM. The DS starts by collecting raw data in an arbitrary format. Then, in the first stage, Domain Understanding is conducted. The DS works in close cooperation with Domain Experts, and enlists domain-related constraints (i.e., intrinsic to the problem). Data Understanding follows, devoted to data analysis and aimed at extracting data-related constraints (i.e., defined by the data format). Domain and Data Understanding might be repeated many times, until the DS is satisfied with the acquired knowledge. Once she feels confident, she begins to investigate different solutions throughout the next stages: Data Pre-processing, Modelling, and Evaluation. Data Pre-processing and Modelling are conducted to effectively build the solution, while Evaluation offers a way to measure its performance. Finally, the process concludes with the Deployment stage (i.e., the actual implementation of the solution).

We recall that building a solution consists of instantiating a ML pipeline: a series of transformations, defined in the Data Pre-processing stage, and a ML algorithm, defined in the Modelling stage. Seeking the most correct and performing solution, the DS should consider the already known constraints, domain- and data-related, and the new ones she discovers in Data Pre-processing and Modelling, respectively: transformation- and algorithm-related constraints (i.e., due to the intrinsic semantics of the transformations and algorithms at hand).

Throughout the different stages, the DS acquires knowledge from different points of view (i.e., domain-, data-, transformation-, and algorithm-related). Besides, as illustrated in Figure 1, CRISP-DM might be iterated many times. The several iterations of the process aim at augmenting such knowledge about the problem. Finally, the process is ruled by interactions between the DS and Domain Experts, who discuss and argue about both constraints and solutions.

4.1. AutoML and CRISP-DM

As described in Section 2, AutoML helps in finding a suitable ML pipeline instantiation (i.e., the automatisation of the Data Pre-processing, Modelling, and Evaluation stages). However, such an automatisation unavoidably leads to a weaker overall understanding (i.e., the knowledge about the problem cannot be properly augmented throughout the process).

The definition of the search space has a huge impact on the correctness and performance of the solutions. The DS collects constraints to guarantee the correctness of the solution, anticipating the effect of each of them, and finally defines the search space.

EXAMPLE 1. Let us consider two transformations, namely Discretisation (𝒟) and Normalisation (𝒩), and a ML algorithm such as Decision Tree (𝒟𝒯). Based on the implementation, a possible algorithm-related constraint may be "require 𝒟 when applying 𝒟𝒯". Accordingly, we consider a transformation-related constraint "no 𝒩 in pipelines with 𝒟". This leads to discarding ML pipelines that contain 𝒟, 𝒩, and 𝒟𝒯:

· · · → 𝒩 → · · · → 𝒟 → · · · → 𝒟𝒯
· · · → 𝒟 → · · · → 𝒩 → · · · → 𝒟𝒯
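A plain-Python sketch of the pruning implied by Example 1, with illustrative names and not taken from any cited tool, is given below: it enumerates every ordering of the transformations followed by the algorithm and discards the pipelines that violate the two constraints.

<pre>
# Enumerate candidate pipelines and filter them with the constraints of Example 1.
from itertools import permutations

transformations = ["discretisation", "normalisation"]
algorithm = "decision_tree"

# every ordering of every subset of transformations, followed by the ML algorithm
candidates = [prefix + (algorithm,)
              for r in range(len(transformations) + 1)
              for prefix in permutations(transformations, r)]

def satisfies_constraints(pipeline):
    if algorithm in pipeline and "discretisation" not in pipeline:
        return False        # c1: require Discretisation when applying Decision Tree
    if "discretisation" in pipeline and "normalisation" in pipeline:
        return False        # c2: no Normalisation in pipelines with Discretisation
    return True

valid = [p for p in candidates if satisfies_constraints(p)]
print(candidates)   # the five candidate pipelines
print(valid)        # [('discretisation', 'decision_tree')]
</pre>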
In real-case problems, considering all the possible effects is overwhelming, and inconsistencies might occur. The problem is exacerbated when it comes to cross-cutting issues, such as those related to the ethical and legal fields. For instance, topics like racism and gender equality have to be treated with particular care, otherwise they could lead to social repercussions. As is well known, the authors of the Boston housing dataset [24] engineered a feature assuming that racial self-segregation had a positive impact on house prices. A way of addressing such an issue is to encode some kind of ethical constraint (e.g., dropping that particular feature from the data). Furthermore, the ML result is expected to be compliant with the laws of the involved countries. To the best of our knowledge there is no attempt to properly treat such ML constraints, and hence to ease the search space definition. Most of the tools are not customisable (i.e., weakly constrained search spaces; e.g., Auto-Weka [12], Auto-sklearn [16]), and others are far too permissive (i.e., no assistance at all; e.g., HyperOpt [25]). AutoML is not transparent enough to provide the DS with feedback that would help her augment her knowledge about the problem. We claim that a human-centric framework should provide the mechanisms to: i) help the DS structure her knowledge about the problem into an effective search space; ii) augment the knowledge initially possessed by the DS with the one produced by the AutoML optimisation process.

4.2. The role of logic

The two identified requirements share a common need: encoding both the DS knowledge about the problem and the outcome of the AutoML tool in a uniform format. As a result, it would be possible to use the DS knowledge as an input for the optimisation process, i.e., for the search space definition. Then, this initial knowledge can be augmented with the possible solutions provided by an AutoML tool. These possible solutions can be exploited to derive new constraints (i.e., the awareness about the problem increases). We see the augmented knowledge as an awareness determined by an increased expertise about the correct constraints. Finding such correct constraints leads to finding the correct solution, if it exists. In other words, at each CRISP-DM iteration, the knowledge is encoded into the AutoML tool, which provides feedback (i.e., augmented knowledge) in the same format.

Logic could be the key element in defining a common structure (i.e., a uniform human- and machine-readable medium) on which the knowledge of both the DS and the AutoML tool can be combined fruitfully. In a way, our approach follows in the footsteps of the well-known logic-based expert systems, of which it is possible to find a great number of successful examples [26]. In the literature, it is also possible to find two well-known issues [27]: lack of scalability and the difficulty of defining a sound knowledge base that encodes all the required pieces of information. Yet, we believe they do not affect our model. As to the former, the amount of knowledge (i.e., the problem constraints) acquired through CRISP-DM iterations is not enough to label such a problem as a big data problem, and hence scalability should not be an issue. As to the latter, we believe that the analysis process would only benefit from the clarity given by such a structured investigation.

Logic would also provide the tools to cope with one of the distinctive features of the knowledge we want to deal with: its possible inconsistency. Indeed, the ML process is the product of possible attempts, validated or refuted by a consequent evaluation. Hence, the mechanism used to encode the knowledge is required to manage this constant revision process. This is the role of Argumentation: one of the main approaches for dealing with inconsistent knowledge and defeasible reasoning.
4.3. HAMLET

In the last paragraphs we identified two main requirements for a human-centric framework (i.e., structuring the DS knowledge into a well-defined AutoML search space, and providing the solutions in accordance with the input knowledge). We also introduced Computational Logic, and Argumentation in particular, as the main tool in our investigation. Let us now delve into the details of how these pieces converge in our framework.

Figure 1 illustrates a scheme of HAMLET. The DS conducts the stages from Domain & Data Understanding to Data Pre-processing & Modelling, and thus gathers all the constraints that represent the knowledge discovered so far. The Logical Knowledge Base (LogicalKB) provides a vehicle to encode such constraints. In particular, the DS leverages an intuitive logical language, and enlists the constraints one by one. In Section 3 we introduced the notion of Structured Argumentation as a formal tool to convert elements of a logical language into an Argumentation graph. Implementing and exploiting such a Structured Argumentation tool, HAMLET proceeds to resolve conflicts in the LogicalKB: the logically encoded knowledge is transformed into a Problem Graph.

The benefit of the Problem Graph is twofold. First, it can be leveraged by both the DS and Domain Experts to understand and summarise the current knowledge. Second, thanks to its nature, it is straightforward to convert such a graph of constraints into a space of possible solutions (i.e., exploiting Argumentation semantics, it is easy to obtain all the sets of arguments, that is, constraints, which hold together). As a matter of fact, this feature would relieve the DS of the burden of manually considering all the effects of the possible constraints. It is important to notice that, despite the increased degree of automatisation, the Problem Graph allows the DS and Domain Experts to correct, revise, and supervise the process. Accordingly, possible inconsistencies, due to diverging constraints, can be verified by the DS using her knowledge.

Once the knowledge has been accurately revised, an AutoML tool is leveraged to automatise the ML pipeline instantiation. Throughout the exploration, different solutions are tested, which contribute to augmenting the global knowledge about the problem. Accordingly, some of the knowledge originally encoded by the DS and Domain Experts might be refuted or found inconsistent. HAMLET is designed to enable a transparent augmentation of the knowledge in the Problem Graph according to the newfound solutions. The updating procedure is the same as the one employed by the DS during the constraint encoding phase. Specifically, the AutoML solutions are automatically transposed into our logical language in the form of new constraints, and then added to the LogicalKB. Of course, a change in the LogicalKB translates into a change in the Problem Graph, allowing the DS and Domain Experts to visualise and argue about it. The revision of the Graph is the key element in the process of augmenting the knowledge: the DS and Domain Experts can consult each other and discuss how the new insights relate to their initial knowledge. Indeed, thanks to the nature of the Problem Graph, it would be extremely easy to identify new possible conflicts and supporting arguments. Consequently, new constraints can be derived.

Listing 1: Example of a LogicalKB using a logical formalism.
<pre>
t1 :=> transformation(discretisation).
t2 :=> transformation(normalisation).
a1 :=> algorithm(decision_tree).
c1 :=> mandatory_transformation_for_algorithm([discretisation], decision_tree).
c2 :=> invalid_transformation_set([normalisation, discretisation]).
</pre>

Figure 2: Example of a Problem Graph. Green nodes are valid arguments, red ones are refuted.

EXAMPLE 2. In Example 1 we introduced two possible ML constraints. We now provide their encoding in the LogicalKB, and the resulting Problem Graph. For the sake of clarity, we focus only on Discretisation (𝒟) and Normalisation (𝒩) as transformations, and on Decision Tree (𝒟𝒯) as the ML algorithm. Listing 1 contains the LogicalKB expressed in a logical language: t1 and t2 represent 𝒟 and 𝒩 respectively, and a1 represents 𝒟𝒯. We consider the algorithm-related constraint c1, namely "require 𝒟 when applying 𝒟𝒯", and the transformation-related constraint c2, that is "no 𝒩 in pipelines with 𝒟". This LogicalKB is used to generate the Problem Graph shown in Figure 2, where nodes represent arguments and edges represent attacks among them. There are five possible ML pipelines: 𝒟𝒯 (p1), 𝒟 → 𝒟𝒯 (p2), 𝒩 → 𝒟𝒯 (p3), 𝒟 → 𝒩 → 𝒟𝒯 (p4), 𝒩 → 𝒟 → 𝒟𝒯 (p5). With no constraints, we cannot discard any ML pipeline (i.e., there are no incompatibilities between the arguments). By introducing c1, attacks against p1 and p3 are generated (both pipelines contain 𝒟𝒯 but not 𝒟). By introducing c2, attacks against p4 and p5 are generated (both pipelines contain 𝒟 and 𝒩). We can leverage a standard argumentation semantics (e.g., Dung's grounded semantics [19]) to evaluate the graph. In our case, all the arguments with no incoming attacks are accepted. Among them, we retrieve the ones representing pipelines. p2 is the only valid pipeline, and it will be used to generate the AutoML search space.
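The following sketch mirrors Example 2 in plain Python; it is not the actual argumentation engine, which operates on the logical encoding of Listing 1. Each candidate pipeline is treated as an argument, each constraint attacks the pipelines that violate it, and the unattacked pipeline arguments are the ones kept for the search space.

<pre>
# Illustrative re-enactment of Example 2: constraints attack violating pipelines.
pipelines = {
    "p1": ("decision_tree",),
    "p2": ("discretisation", "decision_tree"),
    "p3": ("normalisation", "decision_tree"),
    "p4": ("discretisation", "normalisation", "decision_tree"),
    "p5": ("normalisation", "discretisation", "decision_tree"),
}

constraints = {
    # c1: Discretisation is mandatory when Decision Tree is applied
    "c1": lambda steps: "decision_tree" in steps and "discretisation" not in steps,
    # c2: Normalisation and Discretisation form an invalid transformation set
    "c2": lambda steps: {"normalisation", "discretisation"} <= set(steps),
}

attacks = {(c, p) for c, violates in constraints.items()
                  for p, steps in pipelines.items() if violates(steps)}
attacked = {p for _, p in attacks}
valid = [p for p in pipelines if p not in attacked]

print(sorted(attacks))   # c1 attacks p1 and p3; c2 attacks p4 and p5
print(valid)             # ['p2'] -> the only pipeline used to build the search space
</pre>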
Example 2 illustrates how HAMLET leverages Logic and Argumentation to handle the DS knowledge. The proposed logic formalism allows the different ML constraints to be easily encoded into a LogicalKB. We highlight that the Problem Graph generation is handled by an argumentation engine, which is available in the Supplementary Material (https://queueinc.github.io/HAMLET-DATAPLAT2022/). The Problem Graph allows pruning the ML pipelines considered for the AutoML search space. AutoML could update the Problem Graph by extracting constraints from the performed exploration, and transposing them into the LogicalKB. For instance, the DS may not have considered that the data at hand contain missing values. AutoML could help in identifying transformation-related constraints such as: "require Imputation (ℐ) in all the pipelines". The resulting constraints might be in conflict with the previous knowledge. In our vision, the DS is able to visualise such inconsistencies through the Problem Graph, and to resolve them.

We remark that our framework is compliant with the iterative nature of the CRISP-DM standard process model. This aspect is crucial when trying to solve real-case problems through the use of modern data platforms. Indeed, not only can the different CRISP-DM stages be executed several times, but the whole process can be iterated, bringing new information about the problem. We claim that our framework supports and eases the adoption of the described resolution process model, by providing a tool that is both human- and machine-readable. The knowledge can be automatically handled throughout the iterations, supporting the DS in the whole analysis, in a continuous revision of the problem constraints. At each iteration, a portion of the knowledge is already known and another is discovered. Its integration into a unified augmented knowledge graph allows to: i) derive new constraints from the discovered knowledge; ii) seamlessly visualise possible inconsistencies and conflicts. This naturally leads to a new iteration based on the new augmented knowledge. Besides, the entire process might be boosted with the aid of external knowledge. In our vision, the DS community could create a shared LogicalKB derived from the available literature and similar real-case problems.
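To summarise the envisioned cycle, the following schematic, in which every function name is a hypothetical placeholder since the paper only outlines the framework, shows one HAMLET iteration: the LogicalKB is turned into a Problem Graph, the accepted arguments define the search space, the AutoML run produces solutions, and the constraints derived from them re-enter the KB after revision by the DS and Domain Experts.

<pre>
# Schematic of one HAMLET iteration; all functions passed in are assumed placeholders.
def hamlet_iteration(logical_kb, build_problem_graph, graph_to_search_space,
                     run_automl, derive_constraints, ds_approves):
    problem_graph = build_problem_graph(logical_kb)       # Structured Argumentation step
    search_space = graph_to_search_space(problem_graph)   # accepted arguments -> pipelines
    solutions = run_automl(search_space)                  # optimisation (e.g., BO) step
    new_constraints = derive_constraints(solutions)       # augmented knowledge
    # the DS and Domain Experts revise the derived constraints before they enter the KB
    accepted = [c for c in new_constraints if ds_approves(c)]
    return logical_kb + accepted, problem_graph, solutions
</pre>

Iterating this step, with the DS and Domain Experts arguing over the updated Problem Graph between iterations, mirrors the CRISP-DM cycle of Figure 1.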
5. Conclusions and potential leveraging

The increasing complexity of state-of-the-art AutoML tools has led the DS to lose control over the resolution process. We believe that human awareness of all the constraints and possible solutions of a ML problem is a fundamental aspect to consider, and consequently should play a key role in the design of next-generation data platforms. Accordingly, in this vision paper we present HAMLET, a human-centric AutoML framework based on Logic and Structured Argumentation. Logic is exploited to give a structure to the knowledge that the DS has to consider while deploying a solution. The advantage of such a choice is twofold. First, the logical encoding of the knowledge allows an easy exploration and verification of all the constraints that may apply to the case at hand; it is overwhelming for the DS to correctly handle the vast amount of them. Second, it provides a medium that is both human- and machine-readable. The DS and Domain Experts can revise the knowledge, as can the AutoML tool, thus creating a constant feedback cycle. We further remark that our framework could address a wide range of AutoML-related challenges. We already highlighted a few of them: the embodiment of both ethical and legal constraints, and the construction of a shared knowledge base among the DS community.

The road for future expansions is straightforward: we plan to extend this work by providing a sound formalisation of HAMLET, and then a working implementation. It will then be possible to effectively quantify the benefits of our framework and test its efficacy on real-case problems.
References

[1] L. Zhou, S. Pan, J. Wang, A. V. Vasilakos, Machine learning on big data: Opportunities and challenges, Neurocomputing 237 (2017) 350–361. doi:10.1016/j.neucom.2017.01.026.
[2] P. Agrawal, R. Arya, A. Bindal, S. Bhatia, A. Gagneja, J. Godlewski, Y. Low, T. Muss, M. M. Paliwal, S. Raman, V. Shah, B. Shen, L. Sugden, K. Zhao, M.-C. Wu, Data platform for machine learning, in: Proceedings of the 2019 International Conference on Management of Data, SIGMOD '19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 1803–1816. doi:10.1145/3299869.3314050.
[3] M. Francia, E. Gallinucci, M. Golfarelli, A. G. Leoni, S. Rizzi, N. Santolini, Making data platforms smarter with MOSES, Future Gener. Comput. Syst. 125 (2021) 299–313. doi:10.1016/j.future.2021.06.031.
[4] C. Forresi, M. Francia, E. Gallinucci, M. Golfarelli, Optimizing execution plans in a multistore, in: L. Bellatreche, M. Dumas, P. Karras, R. Matulevičius (Eds.), Advances in Databases and Information Systems, Springer International Publishing, Cham, 2021, pp. 136–151.
[5] R. Wirth, J. Hipp, CRISP-DM: Towards a standard process model for data mining, in: Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, volume 1, Springer-Verlag, London, UK, 2000.
[6] D. Xin, E. Y. Wu, D. J. L. Lee, N. Salehi, A. G. Parameswaran, Whither AutoML? Understanding the role of automation in machine learning workflows, in: CHI '21: CHI Conference on Human Factors in Computing Systems, ACM, 2021, pp. 83:1–83:16. doi:10.1145/3411764.3445306.
[7] Y. Gil, J. Honaker, S. Gupta, Y. Ma, V. D'Orazio, D. Garijo, S. Gadewar, Q. Yang, N. Jahanshad, Towards human-guided machine learning, in: Proceedings of the 24th International Conference on Intelligent User Interfaces, 2019, pp. 614–624.
[8] D. J.-L. Lee, S. Macke, A human-in-the-loop perspective on AutoML: Milestones and the road ahead, IEEE Data Engineering Bulletin (2020).
[9] D. Wang, J. D. Weisz, M. Muller, P. Ram, W. Geyer, C. Dugan, Y. Tausczik, H. Samulowitz, A. Gray, Human-AI collaboration in data science: Exploring data scientists' perceptions of automated AI, Proceedings of the ACM on Human-Computer Interaction 3 (2019) 1–24.
[10] J. P. Ono, S. Castelo, R. Lopez, E. Bertini, J. Freire, C. T. Silva, PipelineProfiler: A visual analytics tool for the exploration of AutoML pipelines, IEEE Transactions on Visualization and Computer Graphics 27 (2021) 390–400.
[11] J. Drozdal, J. Weisz, D. Wang, G. Dass, B. Yao, C. Zhao, M. Muller, L. Ju, H. Su, Trust in AutoML: Exploring information needs for establishing trust in automated machine learning systems, in: Proceedings of the 25th International Conference on Intelligent User Interfaces, 2020, pp. 297–307.
[12] L. Kotthoff, C. Thornton, H. H. Hoos, F. Hutter, K. Leyton-Brown, Auto-WEKA: Automatic model selection and hyperparameter optimization in WEKA, in: Automated Machine Learning, Springer, Cham, 2019, pp. 81–95.
[13] P. I. Frazier, A tutorial on Bayesian optimization, CoRR abs/1807.02811 (2018). URL: http://arxiv.org/abs/1807.02811.
[14] J. Giovanelli, B. Bilalli, A. Abelló, Effective data pre-processing for AutoML, in: K. Stefanidis, P. Marcel (Eds.), Proceedings of the 23rd International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP), volume 2840 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 1–10.
[15] A. Quemy, Data pipeline selection and optimization, in: DOLAP, 2019.
[16] M. Feurer, A. Klein, K. Eggensperger, J. T. Springenberg, M. Blum, F. Hutter, Auto-sklearn: Efficient and robust automated machine learning, in: Automated Machine Learning, Springer, Cham, 2019, pp. 113–134.
[17] J. Giovanelli, B. Bilalli, A. Abelló, Data pre-processing pipeline generation for AutoETL, Information Systems (2021) 101957.
[18] L. C. Paulson, Computational logic: Its origins and applications, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 474 (2018). doi:10.1098/rspa.2017.0872.
[19] P. M. Dung, On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games, Artificial Intelligence 77 (1995) 321–358. doi:10.1016/0004-3702(94)00041-X.
[20] P. Baroni, M. Caminada, M. Giacomin, An introduction to argumentation semantics, Knowledge Engineering Review 26 (2011) 365–410. doi:10.1017/S0269888911000166.
[21] P. Besnard, A. J. García, A. Hunter, S. Modgil, H. Prakken, G. R. Simari, F. Toni, Introduction to structured argumentation, Argument & Computation 5 (2014) 1–4. doi:10.1080/19462166.2013.869764.
[22] S. Modgil, H. Prakken, The ASPIC+ framework for structured argumentation: A tutorial, Argument & Computation 5 (2014) 31–62. doi:10.1080/19462166.2013.869766.
[23] R. Calegari, G. Pisano, A. Omicini, G. Sartor, Arg2P: An argumentation framework for explainable intelligent systems, Journal of Logic and Computation (2021). doi:10.1093/logcom/exab089.
[24] D. Harrison, D. Rubinfeld, Hedonic housing prices and the demand for clean air, Journal of Environmental Economics and Management 5 (1978) 81–102. doi:10.1016/0095-0696(78)90006-2.
[25] J. Bergstra, B. Komer, C. Eliasmith, D. Yamins, D. D. Cox, Hyperopt: A Python library for model selection and hyperparameter optimization, Computational Science & Discovery 8 (2015) 014008.
[26] H. Tan, A brief history and technical review of the expert system research, IOP Conference Series: Materials Science and Engineering 242 (2017). doi:10.1088/1757-899X/242/1/012111.
[27] P. K. Coats, Why expert systems fail, Financial Management 17 (1988) 77–86. URL: http://www.jstor.org/stable/3666074.