<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Syst.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1016/j.future</article-id>
      <title-group>
        <article-title>Towards Human-centric AutoML via Logic and Argumentation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Joseph Giovanelli</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Pisano</string-name>
        </contrib>
        <aff>ALMA MATER STUDIORUM - Università di Bologna</aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>125</volume>
      <issue>2021</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>In the last decade, we have witnessed an exponential growth in both the complexity and the number of Machine Learning (ML) techniques. As a consequence, leveraging such methods to solve real-case problems has become difficult for a Data Scientist (DS). Automated Machine Learning (AutoML) tools were devised to alleviate that task, but they easily became as complex as the ML techniques themselves. The DS has started to rely on these tools without understanding their functioning, thus losing control over the process. In this vision paper, we propose HAMLET (Human-centric AutoML via Logic and Argumentation), a framework that would help the DS to redeem her centrality. HAMLET is inspired by the well-known standard process model CRISP-DM. Iteration after iteration, the knowledge is augmented by acquiring more constraints about the problem until a suitable solution is found. HAMLET leverages Logic and Argumentation to merge both constraints and solutions in a uniform human- and machine-readable medium. Not only does it allow an easy exploration of the new knowledge at each iteration, it also enforces a continuous revision via the AutoML tool and the discussion between the DS and Domain Experts.</p>
      </abstract>
      <kwd-group>
        <kwd>AutoML</kwd>
        <kwd>Logic</kwd>
        <kwd>Argumentation</kwd>
        <kwd>CRISP-DM</kwd>
        <kwd>Data Scientist</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In relation to data platforms, it is well known that Machine Learning (ML) plays a key role in the process of data analysis. As a matter of fact, it has been pervasively employed to cope with every type of real-case problem [<xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3, 4</xref>]. The Data Scientist (DS) (i.e., a specialist of data analysis) starts by collecting raw data in an arbitrary format. Then she typically leverages a process model that helps her translate the knowledge about the problem into ML constraints, and deploy the solution. CRISP-DM [5] is the most acknowledged standard process model and we take it as a reference throughout the paper. A solution consists of a ML pipeline: a series of Data Pre-processing transformations and a ML algorithm. The DS can instantiate both with a large set of techniques, which have their own tunable hyper-parameters. These choices highly affect the performance of a solution.</p>
      <p>Automated Machine Learning (AutoML) tools have been devised with the aim of assisting the DS during the ML pipeline instantiation. They leverage state-of-the-art optimisation approaches to smartly explore huge search spaces of solutions. AutoML has been demonstrated to provide accurate performance, even within a limited time budget. When setting up the search space, it is highly important for the DS to leverage the knowledge about the problem, considering all the ML constraints. Otherwise, the AutoML tool might retrieve invalid solutions (i.e., solutions whose results cannot be deemed correct). Besides, AutoML tools have become so complex that it is difficult for the DS to understand their functioning, hence losing control over the process. Researchers are aware of these problems [6]. Some works have prescribed the use of a human-centric framework for AutoML [7, 8, 9], yet they suggest only design requirements. Alternatively, the authors in [10] have proposed a tool that visualises the best and the worst solutions retrieved by an AutoML tool.</p>
      <p>We claim that the need for a human-centric framework for AutoML is real, and that it is crucial for the DS to augment her knowledge via the retrieved solutions. To this purpose we propose HAMLET (Human-centric AutoML via Logic and Argumentation), which leverages Logic and Argumentation to:
• structure the ML constraints and the AutoML solutions in a Logical Knowledge Base (LogicalKB);
• parse the structured LogicalKB into a human- and machine-readable medium called Problem Graph;
• leverage the Problem Graph to set up an AutoML search space;
• leverage the Problem Graph to allow both the DS and an AutoML tool to revise the current knowledge.</p>
      <p>Figure 1 illustrates how CRISP-DM, AutoML, and HAMLET interact with each other. We remark that our framework allows the DS to never lose control over the process, and hence her centrality. Besides, HAMLET allows the knowledge to be visualised in a human- and machine-readable format. As advocated in [11], the DS needs to understand the AutoML process in order to trust the proposed solutions.</p>
      <p>The remainder of the paper is structured as follows. Section 2 and Section 3 introduce the main notions of AutoML and Argumentation, respectively. Section 4 illustrates our framework. Finally, Section 5 draws the conclusions and outlines potential developments.</p>
    </sec>
    <sec id="sec-2">
      <title>2. AutoML</title>
      <p>AutoML tools have been conceived with the aim of relieving the DS of the overwhelming practice of finding a suitable solution for the case at hand. We recall that, in the context of data platforms, a solution is a ML pipeline, defined as a series of Data Pre-processing transformations followed by a ML algorithm. In its early days, AutoML addressed only the instantiation of the latter – the ML algorithm. Auto-Weka [12] formalised the problem as Combined Algorithm Selection and Hyper-parameter Optimisation (CASH). In a nutshell, in order to find the most performing configuration, various ML algorithms – and related hyper-parameters – have to be tested over a dataset. Such a problem was successfully coped with by leveraging Bayesian Optimisation (BO) [13], a sequential design strategy for global optimisation. The process involves several iterations, through which different configurations are explored. As the iterations advance, an increasingly accurate model is built on top of the previously explored configurations, with the aim of suggesting the most promising ones. The configurations keep being explored, and the model updated, until a budget in terms of either iterations or time is reached.</p>
      <p>Recently, AutoML is no longer limited to optimising just the ML algorithm phase, but includes Data Pre-processing as well. Indeed, with the aid of a series of transformations, it is possible to achieve performance unattainable with the most performing ML algorithm configuration alone [14]. In [15], the author formalised the problem as Data Pipeline Selection and Optimisation (DPSO). Each of the transformations can be instantiated with different techniques, which – analogously to the ML algorithms – have their own hyper-parameters. Auto-sklearn [16] includes Data Pre-processing already in its first versions. Yet, it fixes the arrangement of the transformations a priori, without considering that the most performing arrangement changes according to the case and data at hand. Considering several arrangements translates into larger search spaces, which are not easy to explore.</p>
      <p>In order to cope with ever larger search spaces, various expedients have been employed. Meta-learning (i.e., learning on top of learning) has been used to warm-start the Bayesian Optimisation (i.e., to boost the convergence process) by suggesting promising configurations (i.e., configurations that worked well in previous, similar real-case problems) [17]. Ensembling (i.e., the construction of a high-performing solution by combining several low-performing solutions; e.g., bagging, boosting, stacking) has been leveraged to enable AutoML tools to retrieve a solution that combines the best performing configurations, instead of retrieving just the best performing one [16]. Moreover, multi-fidelity methods (i.e., the use of several partial estimations to speed up the time-consuming evaluation process) have been exploited to let AutoML tools explore as many configurations as possible.</p>
      <p>All in all, the improvements made over the last years have been so substantial that AutoML is nowadays able to handle the entire ML pipeline instantiation. Yet, the stacking of complex mechanisms on top of each other unavoidably led to a lesser understanding of the process by the DS. We believe that the DS has the duty to revise and supervise the suggested solutions. Unfortunately, state-of-the-art AutoML tools overlook her role, and do not make that possible.</p>
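      <p>To make the CASH and DPSO formulations concrete, the following is a minimal, illustrative sketch of a tiny pipeline search space. It is not part of HAMLET: the scikit-learn components, the toy dataset, and the budget are our own assumptions, and plain random sampling stands in for Bayesian Optimisation.</p>
      <preformat>
# Illustrative only: a DPSO-style search space (pre-processing + algorithm),
# explored by random sampling rather than Bayesian Optimisation.
import random

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Search space: an optional transformation followed by a ML algorithm.
transformations = {
    "none": None,
    "discretisation": KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform"),
    "normalisation": MinMaxScaler(),
}
max_depths = [2, 4, 8, None]  # hyper-parameter of the algorithm

random.seed(0)
best_score, best_config = -1.0, None
for _ in range(10):  # budget: 10 configurations
    t_name = random.choice(list(transformations))
    depth = random.choice(max_depths)
    steps = []
    if transformations[t_name] is not None:
        steps.append(("transform", transformations[t_name]))
    steps.append(("algorithm", DecisionTreeClassifier(max_depth=depth)))
    score = cross_val_score(Pipeline(steps), X, y, cv=5).mean()
    if score &gt; best_score:
        best_score, best_config = score, (t_name, depth)

print("best configuration:", best_config, "accuracy:", round(best_score, 3))
</preformat>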
    </sec>
    <sec id="sec-3">
      <title>3. Logic &amp; Argumentation</title>
      <p>Logic can be defined as the abstract study of statements, sentences, and deductive arguments [18]. From its birth, it has been developed and improved widely, and it now includes a variety of formalisms and technologies. Among them, Argumentation has proved itself an important tool for handling conflicting information (e.g., opinions, empirical data). This has led to a great number of research efforts trying to establish a computational model of logical arguments.</p>
      <p>In Abstract Argumentation [19], a scenario can be represented by a directed graph. Each node represents an argument, and each edge denotes an attack by one argument on another. Each argument is regarded as atomic: there is no internal structure to an argument, nor any specification of what an argument or an attack is. A graph can then be analysed to determine which arguments are acceptable according to some general criteria (i.e., semantics) [20].</p>
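      <p>As an illustration of how such a graph can be evaluated, the sketch below (our own, not taken from the paper) computes a grounded-style labelling of a small attack graph: unattacked arguments are accepted, and arguments attacked by accepted ones are rejected, until a fixpoint is reached. The argument names are purely hypothetical.</p>
      <preformat>
# Illustrative only: grounded evaluation of an abstract argumentation graph.
# Arguments are atomic labels; attacks is a set of (attacker, attacked) pairs.
def grounded_labelling(arguments, attacks):
    accepted, rejected = set(), set()
    changed = True
    while changed:
        changed = False
        for a in arguments:
            if a in accepted or a in rejected:
                continue
            live_attackers = {x for (x, y) in attacks if y == a and x not in rejected}
            if not live_attackers:       # every attacker is already rejected
                accepted.add(a)
                changed = True
        for a in arguments:
            if a in rejected:
                continue
            if any(x in accepted for (x, y) in attacks if y == a):
                rejected.add(a)          # attacked by an accepted argument
                changed = True
    return accepted, rejected

args = {"a", "b", "c"}
atts = {("a", "b"), ("b", "c")}          # a attacks b, b attacks c
print(grounded_labelling(args, atts))    # a and c accepted, b rejected
</preformat>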
      <p>A way to link Abstract Argumentation and logical formalisms has been advanced in the field of Structured Argumentation [21], where we assume a formal logical language for representing knowledge (i.e., a Logical Knowledge Base), and we specify how arguments and conflicts (i.e., attacks) can be derived from that knowledge. In the structured approach, the premises and claims of an argument are made explicit, and the relationship between them is formally defined through rules internal to the formalism. We can then build the notion of attack as a binary relation over structured arguments that denotes when one argument is in conflict with another (e.g., contradictory claims or premises). One of the main frameworks for Structured Argumentation is ASPIC+ [22]. In this formalism arguments are built with two kinds of inference rules: strict rules, whose premises guarantee their conclusion, and defeasible rules, whose premises only create a presumption in favour of their conclusion. Conflicts between arguments can then arise both from inconsistencies in the Logical Knowledge Base and from the defeasibility of the reasoning steps in an argument (i.e., a defeasible rule used in reaching a certain conclusion from a set of premises can also be attacked).</p>
      <p>In our view, once the right logical language for encoding the DS and AutoML knowledge is defined, a Structured Argumentation model (e.g., an ASPIC+ instance [23]) would provide us with the formal machinery to build an Argumentation framework upon the data, while Abstract Argumentation would dispense the evaluation tools.</p>
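      <p>The following toy sketch hints at how structured arguments with explicit premises and claims can give rise to attacks: two defeasible rules reach contradictory claims, so the resulting arguments attack each other. It is our own simplification, not an ASPIC+ implementation, and the rule and fact names are hypothetical.</p>
      <preformat>
# Illustrative only: structured arguments as rules with explicit premises and claims.
# Defeasible rules create a presumption; contradictory claims yield mutual attacks.
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    premises: frozenset
    claim: str
    defeasible: bool

facts = {"discretisation_in_pipeline", "normalisation_in_pipeline"}
rules = [
    Rule(frozenset({"discretisation_in_pipeline"}), "pipeline_valid", True),
    Rule(frozenset({"normalisation_in_pipeline"}), "not pipeline_valid", True),
]

# Build arguments whose premises are satisfied by the facts.
arguments = [r for r in rules if r.premises.issubset(facts)]

def conflicting(c1, c2):
    return c1 == "not " + c2 or c2 == "not " + c1

# An argument attacks another defeasible argument with a conflicting claim.
attacks = [(a, b) for a in arguments for b in arguments
           if a is not b and b.defeasible and conflicting(a.claim, b.claim)]
print(len(arguments), "arguments,", len(attacks), "attacks")  # 2 arguments, 2 attacks
</preformat>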
    </sec>
    <sec id="sec-4">
      <title>4. Towards a human-centric approach</title>
      <p>Addressing ML problems encompasses the DS seeking a solution, considering all the constraints of the case. She usually leverages a process model such as CRISP-DM. The DS starts by collecting raw data in an arbitrary format. Then, in the first stage, Domain Understanding is conducted: the DS works in close cooperation with Domain Experts, and enlists domain-related constraints (i.e., intrinsic to the problem). Data Understanding follows, devoted to data analysis and aimed at extracting data-related constraints (i.e., defined by the data format). Domain and Data Understanding might be repeated many times, until the DS is satisfied with the acquired knowledge. Once she feels confident, she begins to investigate different solutions throughout the next stages: Data Pre-processing, Modelling, and Evaluation. Data Pre-processing and Modelling are conducted to effectively build the solution, while Evaluation offers a way to measure its performance. Finally, the process concludes with the Deployment stage (i.e., the actual implementation of the solution).</p>
      <p>We recall that building a solution consists of instantiating a ML pipeline: a series of transformations – defined in the Data Pre-processing stage – and a ML algorithm – defined in the Modelling stage. Seeking the most correct and performing solution, the DS should consider the already known constraints – domain- and data-related – and the new ones she discovers in Data Pre-processing and Modelling, respectively: transformation- and algorithm-related constraints (i.e., due to the intrinsic semantics of the transformations and algorithms at hand).</p>
      <p>Throughout the different stages, the DS acquires knowledge from different points of view (i.e., domain-, data-, transformation-, and algorithm-related). Besides, as illustrated in Figure 1, CRISP-DM might be iterated many times. The several iterations of the process aim at augmenting such knowledge about the problem. Finally, the process is ruled by interactions between the DS and Domain Experts, who discuss and argue on both constraints and solutions.</p>
      <sec id="sec-4-1">
        <title>4.1. AutoML and CRISP-DM</title>
      <sec id="sec-2-2">
        <title>As described in Section 2, AutoML helps in finding a</title>
        <p>suitable ML pipeline instantiation (i.e., automatisation of
Data Pre-processing, Modelling, and Evaluation stages). the outcome of the AutoML tool in a uniform format.
However, such an automatisation unavoidably leads to a As a result, it would be possible to use the DS
knowlless overall understanding (i.e., the knowledge about the edge as an input for the optimisation process—search
problem cannot be properly augmented throughout the space definition. Then, this initial knowledge can be
process). augmented with the possible solutions provided by an</p>
        <p>The definition of the search space has a huge impact AutoML tool. These possible solutions can be exploited
on the correctness and performance of the solutions. The to derive new constraints (i.e., the awareness about the
DS collects constraints to guarantee the correctness of problem increases). We see the augmented knowledge
the solution, anticipating the efect of each of them, and as an awareness determined by an increased expertise
ifnally defining the search space. on the correct constraints. The finding of such correct
EXAMPLE 1. Let us consider two transformations, ceoxnissttsr.aIinntostlheeardswtoortdhse, afintdeinagchofCtRhIeScPo-rDreMctisteorluattiioonn—,tihf e
namely Discretisation () and Normalisation ( ), knowledge is encoded into the AutoML tool, which
protahned iamMplLemalegnotraitthiomn,aas Dpoescsiisbi olen aTlgreoeri(thm-)r.elaBtaesdedcoonn- vfoidrmesaat.feedback (i.e., augmented knowledge) in the same
sctorradiinntglmy,aywebeco“nresiqdueirreatrawnhsefonr mapaptiloyinn-grelated”. cAonc-- strLuoctguicrec(oi.uel.d,abuentihfoerkmeeydehleummeannt-iannddemfiniancghianec-ormeamdaobnle
straint “no  in pipelines with ”. This leads to medium) on which the knowledge of both the DS and
discard ML pipelines that contain ,  , and  : the AutoML tool can be combined fruitfully. In a way,
our approach follows the steps of the well known logical
· · · →  → · · · →  → · · · →  based expert systems, of which it is possible to find a
· · · →  → · · · →  → · · · →  great number of successful examples [26]. In literature, it
is also possible to find two well-known issues [ 27]: lack</p>
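        <p>A direct, non-argumentative way to read Example 1 is as a filter over candidate pipelines. The sketch below is purely illustrative, with the two constraints hard-coded and the identifiers chosen by us; Section 4.3 shows how HAMLET derives the same pruning through the Problem Graph instead.</p>
        <preformat>
# Illustrative only: the constraints of Example 1 applied as a plain filter.
# Each pipeline is a tuple of steps ending with the ML algorithm.
from itertools import permutations

D, N, DT = "discretisation", "normalisation", "decision_tree"

# Candidates: the algorithm alone, plus every ordering of the transformations.
candidates = [(DT,)]
for k in (1, 2):
    for ts in permutations((D, N), k):
        candidates.append(ts + (DT,))

def satisfies_constraints(pipeline):
    has_d, has_n, has_dt = D in pipeline, N in pipeline, DT in pipeline
    c1 = (not has_dt) or has_d   # "require Discretisation when applying Decision Tree"
    c2 = not (has_d and has_n)   # "no Normalisation in pipelines with Discretisation"
    return c1 and c2

valid = [p for p in candidates if satisfies_constraints(p)]
print(valid)   # only ('discretisation', 'decision_tree') survives
</preformat>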
        <p>In real-case problems, considering all the possible effects is overwhelming, and inconsistencies might occur. The problem is exacerbated when it comes to cross-cutting issues, such as those related to the ethical and legal fields. For instance, topics like racism and gender equality have to be treated separately, otherwise they could lead to social repercussions. As is well known, the authors of the boston-house dataset [24] engineered a feature assuming that racial self-segregation had a positive impact on house prices. A way of addressing such an issue is to encode some kind of ethical constraint (e.g., dropping that particular feature from the data). Furthermore, the ML result is expected to be compliant with the laws of the involved countries. To the best of our knowledge there is no attempt to properly treat such ML constraints, and hence ease the search space definition. Most of the tools are not customisable (i.e., weakly-constrained search spaces; e.g., Auto-Weka [12], Auto-Sklearn [16]), and others are far too permissive (i.e., no assistance at all; e.g., HyperOpt [25]). AutoML is not clear enough to provide the DS with a feedback that would help to augment her knowledge about the problem. We claim that a human-centric framework should provide the mechanisms to: i) help the DS to structure her knowledge about the problem in an effective search space; ii) augment the knowledge initially possessed by the DS with the one produced by the AutoML optimisation process.</p>
      </sec>
      <sec id="sec-2-3">
        <title>In the last paragraphs we identified two main require</title>
        <p>ments for a human-centric framework (i.e., structure the
DS knowledge in a well-defined AutoML search space,
and provide the solutions in accordance with the input
knowledge). We also introduced Computational Logic
– Argumentation in particular – as the main tool in our
investigation. Let us now delve into details of how these
pieces converge in our framework.</p>
        <p>Figure 1 illustrates a scheme of HAMLET. The DS
conducts the stages from Domain &amp; Data Understanding to</p>
        <sec id="sec-2-3-1">
          <title>4.2. The role of logic</title>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>The two identified requisites share a common need: encoding both the DS knowledge about the problem and</title>
        <p>Listing 1: Example of a LogicalKB using a logical formalism.
t 1 : = &gt; t r a n s f o r m a t i o n ( d i s c r e t i s a t i o n ) .
t 2 : = &gt; t r a n s f o r m a t i o n ( n o r m a l i s a t i o n ) .
a1 : = &gt; a l g o r i t h m ( d e c i s i o n _ t r e e ) .
c1 : = &gt; m a n d a t o r y _ t r a n s f o r m a t i o n _ f o r _ a l g o r i t h m ( [ d i s c r e t i s a t i o n ] , d e c i s i o n _ t r e e ) .
c2 : = &gt; i n v a l i d _ t r a n s f o r m a t i o n _ s e t ( [ n o r m a l i s a t i o n , d i s c r e t i s a t i o n ] ) .
and Domain Experts to correct, revise, and supervise the
process. Accordingly, possible inconsistencies – due to
diverging constraints – can be verified by the DS using
her knowledge.</p>
        <p>Once the knowledge has been accurately revised, an
AutoML tool is leveraged to automatise the ML pipeline
instantiation. Throughout the exploration, diferent
solutions are tested, which contribute to augment the global
knowledge about the problem. Accordingly, some of the
originally encoded knowledge by the DS and Domain
Experts might be refuted or found inconsistent.
HAMLET is designed to enable a transparent augmentation
of the knowledge in the Problem Graph according to the
newfound solutions. The updating procedure is the same
as the one employed by the DS during the constraint
encoding phase. Specifically, the AutoML solutions are
Figure 2: Example of a Problem Graph. Green nodes are valid automatically transposed to our logical language in the
arguments, red ones are refuted. form of new constraints, and then added to the
LogicalKB. Of course, a change in the LogicalKB translates
in a change in the Problem Graph, allowing the DS and
Data Pre-processing &amp; Modelling, and thus gathers all Domain Experts to visualise and argue about it. The
rethe constraints that represent the knowledge discovered vision of the Graph is the key element in the process of
so far. The Logical Knowledge Base (LogicalKB) provides augmenting the knowledge: the DS and Domain Experts
a vehicle to encode such constraints. In particular, the can consult each other and discuss how the new insights
DS leverages an intuitive logical language, and enlists relate with their initial knowledge. Indeed, thanks to the
the constraints one-by-one. In Section 3 we introduced nature of the Problem Graph, it would be extremely easy
the notion of Structured Argumentation as a formal tool to identify new possible conflicts and supporting
arguto convert elements from a logical language into an Ar- ments. Consequently, new constraints can be derived.
gumentation graph. Implementing and exploiting such EXAMPLE 2. In Example 1 we introduce two
possia Structured Argumentation tool, HAMLET proceeds to ble ML constraints. We now provide their encoding in
resolve conflicts in the LogicalKB: the logical-encoded the LogicalKB, and the resulting Problem Graph. For
knowledge is transformed in a Problem Graph. the sake of clarity, we focus only on Discretisation ()</p>
        <p>The benefit of the Problem Graph is two-fold. First and Normalisation ( ) as transformations, and
Deciof all, it can be leveraged by both the DS and Domain sion Tree ( ) as the ML algorithm. Listing 1
conExperts to understand and summarise the current knowl- tains the LogicalKB expressed in a logic language: t1
edge. Second of all, thanks to its nature, it is straightfor- and t2 represent  and  respectively, a1 represents
wofaprdostsoibcloensvoelruttsiounchs (ai.eg.r,aepxhploofitcionngstArarginutmsiennttoataiospnascee- c1, .naWmeelyco“nresqidueirrethe walhgeonritahpmplsy-irneglated ”c,onanstdratihnet
mantics, it is easy to obtain all the sets of arguments – trnasformation-related constraint c2, that is “no  in
constraints – which hold together). As a matter of fact, pipelines with ”. This LogicalKB is used to
generthis feature would relieve the DS of the burden of manu- ate the Problem Graph shown in Figure 2, nodes
repally considering all the efects of the possible constraints. resent arguments and edges represent attacks among
It is important to notice that, although the increased de- them. There are five possible ML pipelines:  (p1),
gree of automatisation, the Problem Graph allows the DS
 →  (p2),  →  (p3),  →  → 
(p4),  →  →  (p5). With no constraints, available literature and similar real-case problems.
we cannot discard any ML pipeline (i.e., there are no
incompatibilities between the arguments). By
introducing c1, attacks against p1 and p3 are generated 5. Conclusions and potential
(both pipelines contain  but not ). By introduc- leveraging
ing c2, attacks against p4 and p5 are generated (both
pipelines contain  and  ). We can leverage a stan- The increasing complexity in the state-of-the-art AutoML
dard argumentation semantics (e.g., Dung’s grounded tools has led the DS to lose the control over the resolution
semantics [19]) to evaluate the graph. In our case, all process. We believe that human awareness about all the
the arguments with no attacks are admissible. Among constraints and possible solutions of a ML problem is a
them, we retrieve the ones representing pipelines. p2 fundamental aspect to consider, and consequently should
is the only valid pipeline, and it will be used to gener- play a key role in the design of next-generation data
ate the AutoML search space. platforms. Accordingly, in this vision paper we present</p>
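        <p>The sketch below mirrors Example 2: it derives the attacks of c1 and c2 against the five pipeline arguments and keeps the unattacked ones, as a grounded-style evaluation would. It is our own illustration; in HAMLET the Problem Graph is produced by the argumentation engine referenced in the Supplementary Material.</p>
        <preformat>
# Illustrative only: deriving the Problem Graph attacks of Example 2.
pipelines = {
    "p1": ("decision_tree",),
    "p2": ("discretisation", "decision_tree"),
    "p3": ("normalisation", "decision_tree"),
    "p4": ("discretisation", "normalisation", "decision_tree"),
    "p5": ("normalisation", "discretisation", "decision_tree"),
}

attacks = set()
for name, steps in pipelines.items():
    # c1: mandatory_transformation_for_algorithm([discretisation], decision_tree)
    if "decision_tree" in steps and "discretisation" not in steps:
        attacks.add(("c1", name))
    # c2: invalid_transformation_set([normalisation, discretisation])
    if "normalisation" in steps and "discretisation" in steps:
        attacks.add(("c2", name))

attacked = {target for (_, target) in attacks}
valid = [p for p in pipelines if p not in attacked]
print(sorted(attacks))   # c1 attacks p1 and p3; c2 attacks p4 and p5
print(valid)             # ['p2'] is the only valid pipeline
</preformat>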
        <p>Example 2 illustrates how HAMLET leverages Logic and Argumentation to handle the DS knowledge. The proposed logic formalism allows the different ML constraints to be easily encoded into a LogicalKB. We highlight that the Problem Graph generation is handled by an argumentation engine, which is available in the Supplementary Material (https://queueinc.github.io/HAMLET-DATAPLAT2022/). The use of the Problem Graph allows pruning the ML pipelines considered for the AutoML search space. AutoML could update the Problem Graph by extracting constraints from the performed exploration, and transposing them into the LogicalKB. For instance, the DS may not have considered that the data at hand contain missing values. AutoML could help in identifying transformation-related constraints such as: "require Imputation (ℐ) in all the pipelines". The resulting constraints might be in conflict with the previous knowledge. In our vision, the DS is able to visualise such inconsistencies through the Problem Graph, and resolve them.</p>
        <p>We remark that our framework is compliant with the iterative nature of the CRISP-DM standard process model. This aspect is crucial when trying to solve real-case problems through the use of modern data platforms. Indeed, not only can the different CRISP-DM stages be executed several times, but the whole process can be iterated, bringing new information about the problem. We claim that our framework supports and eases the adoption of the described resolution process model, by providing a tool that is both human- and machine-readable. The knowledge can be automatically handled throughout iterations, supporting the DS in the whole analysis, in a continuous revision of the problem constraints. At each iteration, a portion of the knowledge is known and another is discovered. Its integration into a unified augmented knowledge graph allows to: i) derive new constraints from the discovered knowledge; ii) seamlessly visualise possible inconsistencies and conflicts. This naturally leads to a new iteration based on the new augmented knowledge.</p>
        <p>Besides, the entire process might be boosted with the aid of external knowledge. In our vision, the DS community could create a shared LogicalKB derived from the available literature and similar real-case problems.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and potential leveraging</title>
      <p>The increasing complexity of state-of-the-art AutoML tools has led the DS to lose control over the resolution process. We believe that human awareness of all the constraints and possible solutions of a ML problem is a fundamental aspect to consider, and consequently should play a key role in the design of next-generation data platforms. Accordingly, in this vision paper we present HAMLET, a human-centric AutoML framework based on Logic and Structured Argumentation. Logic is exploited to give a structure to the knowledge that the DS has to consider while deploying a solution. The advantage of such a choice is twofold. First of all, the logical encoding of the knowledge allows an easy exploration and verification of all the constraints that may apply to the case at hand—it is overwhelming for the DS to correctly handle the vast amount of them. Second of all, it provides a medium that is both human- and machine-readable. The DS and Domain Experts can revise the knowledge, as well as the AutoML tool, thus creating a constant feedback cycle. We further remark that our framework could be able to address a wide range of AutoML-related challenges. We already highlighted a few of them: the embodiment of both ethical and legal constraints, and the construction of a shared knowledge among the DS community.</p>
      <p>The road for future expansions is straightforward: we plan to extend this work by providing a sound formalisation of HAMLET, and then a working implementation. It will then be possible to effectively quantify the benefits of our framework and test its efficacy on real-case problems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Vasilakos</surname>
          </string-name>
          ,
          <article-title>Machine learning on big data: Opportunities and challenges</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>237</volume>
          (
          <year>2017</year>
          )
          <fpage>350</fpage>
          -
          <lpage>361</lpage>
          . doi:10.1016/j.neucom.2017.01.026.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Arya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bindal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhatia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gagneja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Godlewski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Low</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Muss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Paliwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Raman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sugden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <article-title>Data platform for machine learning</article-title>
          ,
          <source>in: Proceedings of the 2019 International Conference on Management of Data, SIGMOD '19</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          , p.
          <fpage>1803</fpage>
          -
          <lpage>1816</lpage>
          . URL: https://doi.org/10.1145/3299869.3314050. doi:10.1145/3299869.3314050.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Francia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gallinucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Golfarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Leoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Santolini</surname>
          </string-name>
          ,
          <article-title>Making data platforms smarter with MOSES</article-title>
          ,
          <source>Future Gener. Comput. Syst.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>