=Paper= {{Paper |id=Vol-3087/paper_40 |storemode=property |title=Combining Data-Driven and Knowledge-Based AI Paradigms for Engineering AI-Based Safety-Critical Systems |pdfUrl=https://ceur-ws.org/Vol-3087/paper_40.pdf |volume=Vol-3087 |authors=Juliette Mattioli,Gabriel Pedroza,Souhaiel Khalfaoui,Bertrand Leroy |dblpUrl=https://dblp.org/rec/conf/aaai/MattioliPKL22 }} ==Combining Data-Driven and Knowledge-Based AI Paradigms for Engineering AI-Based Safety-Critical Systems== https://ceur-ws.org/Vol-3087/paper_40.pdf
                 Combining Data-Driven and Knowledge-Based AI Paradigms
                     for Engineering AI-Based Safety-Critical Systems
Juliette MATTIOLI 1, Gabriel PEDROZA 2, Souhaiel KHALFAOUI 3,5, Bertrand LEROY 4,5
1 Thales, France - 2 CEA List, U. Paris-Saclay, France - 3 Valeo, France - 4 Renault, France - 5 IRT SystemX, France
juliette.mattioli@thalesgroup.com - gabriel.pedroza@cea.fr - souhaiel.khalfaoui@valeo.com - bertrand.leroy@renault.com

   This work received French government aid under the Investments for the Future program (PIA) within the framework of
                                      SystemX Technological Research Institute

Abstract

The development of AI-based systems entails a manifold of doubled-hard challenges. They are mainly due, on one side, to the technical debt of the involved engineering disciplines (systems, safety, security), their inherent complexity, their yet-to-solve concerns, and, on the other side, to the emergent risks of AI autonomy, the trade-offs between AI heuristics vs. required determinism, and, overall, the difficulty to define, characterize, assess and prove that AI-based systems are sufficiently safe and trustworthy. Despite the vast amount of research contributions and the undeniable progress in many fields over the last decades, a gap still exists between experimental and certifiable AIs. The present paper aims at bridging this gap "by design". Considering engineering paradigms as a basis to specify, relate and infer knowledge, a new paradigm is proposed to achieve AI certification. The proposed paradigm recognizes existing AI approaches, namely connectionist, symbolic, and hybrid, and proffers to leverage their essential traits captured as knowledge. A conceptual meta-body is thus obtained, containing categories for Data-, Knowledge- and Hybrid-driven approaches. Since it is observed that research strays from Knowledge-driven approaches and rather strives for Data-driven ones, our paradigm calls for empowering Knowledge Engineering relying upon Hybrid-driven approaches, to improve their coupling and benefit from their complementarity.

Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Introduction

Safety can be defined as "freedom from risk which is not tolerable" (ISO). This definition implies that a safe system is one in which scenarios with non-tolerable consequences have a sufficiently low probability, or frequency, of occurring. Thus, safety-critical systems must be dependable during their whole life-cycle, supporting evolution without incurring prohibitive costs. It becomes mandatory that an AI-based Critical System (AICS) does what it has been specified to do (correctness). In the near term, the goal of deploying AI on critical systems motivates research to handle accountability, reliability, suitability, timeliness, etc. Moreover, AICS need to be resilient, safe and (cyber-)secure. Complex mechanisms have to be integrated to ensure both responsibility and accountability of AICS and their outcomes. As any critical system, an AICS needs to be verified, validated, qualified and even certified, following a suitable development methodology. Explainability is also an issue, because AI systems and their decisions need to be explained to integration and maintenance teams as well as end-users in an understandable manner. Auditability, for assessing algorithms, data, knowledge, design and integration processes, is also a key property that has to be handled. Fig. 1 highlights various non-functional requirements that have to be verified and demonstrated for a sound AI-based component deployment within an AICS. All such requirements need a sound and trustworthy AI engineering methodology with efficient supporting tools, while addressing various levels of granularity. On one hand, they must encompass specific algorithmic domain engineering, including the associated data, models and knowledge representations. On the other hand, they must guarantee architecture design correctness up to the complete system engineering cycle.

Figure 1: Various non-functional requirements induced by AISCS

By the taxonomy of disciplines involved (Systems, Knowledge, Algorithms, Safety, and Security Engineering), their inherent complexity, the yet-to-solve concerns within each of them (technical debt), and the fact that AICS lie at the intersection of those disciplines, AICS development becomes a doubled-hard challenge. Indeed, from an engineering perspective, questions arise such as: Which are the fundamental notions and features to characterize AICS? Which development languages and methods can sufficiently comprehend and interrelate those notions? How can knowledge be structured to suitably elicit, fulfill and verify require-
ments? And, last but not least, whether the bundle of criteria (ranging from data-sensitivity to explainability) can be harmonized, and how?

This position paper introduces the need for a conceptual paradigm which aims to provide a basis upon which answers to the previous questions can be elicited. To handle the complexity of the subject, several choices are made. First, since autonomy is a distinctive feature of AICS, and it is intrinsic to safety, the latter is placed as the top-priority criterion. It is then used to guide design, conduct analyses and align the rest of the criteria: all in all, we target Safety-Critical AICS (AISCS). Secondly, the proposed paradigm assumes that fundamental notions, as such, should appear therein, irrespective of the development methodology or framework. Last but not least, the paradigm also assumes that challenges in AISCS development can be addressed via a body of knowledge amenable to (1) emulate whatever human beings perceive as intelligence and (2) integrate related concerns, in particular safety. Overall, this paper is dedicated to providing a first specification of a conceptual paradigm as a basis for safe AI-based systems development, in order to design a sound and tooled AI engineering methodology that encompasses trustworthy AI algorithm engineering, data engineering, knowledge engineering and AI system engineering.

Data-Knowledge-Based Paradigm for Safe AI

In this Section, we briefly describe the background taken as reference for the description of our paradigm and how it is leveraged to constitute the expected foreground.

Paradigm Background

After conducting a brief survey of approaches for AI development (Foggia, Genna, and Vento 2001) (Sun 2015) (Besold et al. 2017), it is observed that research production is mostly distributed over two big fields, namely symbolic (Belle 2020) and connectionist (Kasabov 2012). Symbolic approaches are based upon a syntax that is endowed with formal semantics (meaning) useful for property expression and verification. They have been successfully applied to increase system trustworthiness in different application domains like health care, automotive, aeronautics and railway (Hofer-Schmitz and Stojanović 2020). Contrary to symbolic, connectionist approaches are not built upon an explicit representation of human expertise: the behaviour is learned from data instances. Connectionist algorithms are based upon a statistical or probabilistic model which is tightly coupled to data sets, first used for model training and, once tuned, for performance evaluation. The model is often structured as a set of nodes defined by multi-value functions or random variables. The nodes are interconnected, and the links can be randomly weighted by values influencing node inputs/outputs (Kasabov 2012). More recently (Sun 2015), hybrid approaches integrating the symbolic and connectionist paradigms have been proposed. Hybrid approaches aim to profit from the salient features of both symbolic and connectionist AI, leaving out any potential concurrence (Foggia, Genna, and Vento 2001), (Garnelo and Shanahan 2019). In certain cases, the complementarity between techniques even leads to overcoming certain limitations of each other. Indeed, on one side, data-driven AI approaches successfully characterize and capture the salient traits of data sets. However, since connectionist models are heuristic and agnostic of typical notion-encapsulation archetypes, they lack the argumentation necessary for explainability. On the other side, symbolic approaches introduce a semantic layer aligned with the notion-encapsulation archetype, which is amenable to expressing domain knowledge and concerns useful for validation and argumentation.

Paradigm Foreground

The proposed paradigm leverages the background described in the previous subsection in the following manner. First, symbolic, connectionist and hybrid are methods and techniques with distinctive features which are applied to achieve the intended AI functionality. In that respect, once a technique is selected, the engineering choices are mostly oriented to explore and decide HOW and WHEN to apply it. Given the referred challenges for AISCS development, and in order to extend the space of engineering choices, it is proposed to consider such techniques as instances of a more abstract meta-structure: an AI-body of knowledge. Such a body is meant to include structured information amenable to reasoning about WHAT and WHY, in the first place, and in addition HOW and WHEN. Thus, since the connectionist approaches process and depend upon data sets, the AI-body of knowledge includes the category of approaches driven by data, i.e. Data-driven AI. Similarly, since the symbolic approaches rely upon rules allowing reasoning on terms and notions to infer further knowledge, the AI-body contains a category of approaches driven by knowledge, i.e. Knowledge-driven AI. A mix of Data- and Knowledge-driven yields the third category, named Hybrid-driven AI.

From the literature survey, research seems to stray from symbolic AI methods and instead leverage learning-based artificial neural networks. However, some underlying issues of current data-driven approaches, such as robustness, fairness, explainability, maintainability, etc., are leading some to call for a return of knowledge-based AI or some reconciliation of the two main paradigms through Hybrid AI (Garnelo and Shanahan 2019). Following this call, the paradigm proposed hereinafter strives for strengthening Knowledge-driven AIs, targeting a tighter coupling to Data-driven AIs and relying upon the Hybrid-driven approaches.

AI Safety stakes

The conceptual paradigm aims to address several stakes of AISCS. Some of the most salient are listed below.

• Quantitative safety metrics and methods. Metrics based upon failure rates have proven effective to achieve suitable levels of safety. However, new metrics and methods to measure errors and misbehaviours of AI modules are yet to be defined and incorporated.
• Qualitative safety metrics and methods. Human interpretability calls for qualitative metrics. Indeed, explainability, risks, and even safety of AI are notions that re-
quire qualitative interpretation (meaning) in order to be assessed.
• Data modeling and quality. Data modeling is proposed as a means for the assessment of data features. The influence of data traits (e.g., data diversity (Ashmore and Mdahar 2019), existing vs. possible input values) over the AI modules and their intended functionality needs to be assessed and integrated during design.
• Traceability of safety-related events. Irrespective of the design method, safety events need to be traceable. In a top-down perspective, high-level safety scenarios should cascade down over the detailed architecture so as to infer dependencies and identify critical subsystems. In a bottom-up perspective, errors/failures at component level need to be propagated upwards to determine safety effects. A structuring layer to support such analyses seems necessary but is still missing.
• AI safety levels and certification. AI errors and misbehaviours shall be characterized in such a manner that their effects only lead to bearable risks. Assurance and trust in AI can be built upon provable levels of safety incorporated and evaluated all along the development cycle.

The conceptual paradigm aims to be the basis of a framework for the representation and integration of the previous aspects as well as for inference and assessment. The building process consists in empowering Knowledge Engineering through a better coupling of Data- and Knowledge-driven approaches. The rest of the paper is dedicated to describing the salient aspects and constituents of the proposal.

Empowering Knowledge-based AI systems by Knowledge Engineering

Introduced in 1956, AI is a computer science discipline concerned with the theory and development of artificial systems able to perform cognitive activities such as reasoning, knowledge representation, planning, learning, natural language processing, perception and decision. AI includes a wide range of technologies which can be divided into two broad categories: (1) Data-driven AI, which includes neural networks, statistical learning, evolutionary computing...; and (2) Knowledge-based AI, which focuses on the development of ontologies and semantic graphs, knowledge-based systems and reasoning... However, each AI paradigm only focuses on portions of the information and decision chain, leading to solutions that are thus not driven by the global "good decision" goal, making them globally inefficient.

The main AI paradigms

The premises of data-driven AI and knowledge-based AI are fundamentally different (see figure 2). The paradigm of data-driven AI is based on brain-style learning such as neural networks, whereas knowledge-based AI approaches employ model and knowledge reasoning.

• Often used in the context of pattern recognition, classification, clustering or perception, data-driven AI such as machine learning aims at capturing tacit knowledge - knowledge which is difficult or impractical to explicitly or analytically define - through statistical approaches, by inferring the inherent structure of a set of examples (input data) that can be used for mapping new data samples.
• (Newell and Simon 2007) claimed that "Symbols lie at the root of intelligent action" and should therefore be a central component in the design of artificial intelligence. In its initial form, knowledge-based AI focused on the transfer process: transferring the expertise of a problem-solving human into a program that could take the same data and make the same conclusions.

Knowledge-based AI

In Software Engineering, the distinction between a functional specification and the design/implementation of a system is often discussed as a separation of what and how. During the specification phase, what the system should do is established in interaction with the users. How the system functionality is realized is defined during design and implementation (e.g., which algorithmic solution can be applied). This separation does not work in the same way for a knowledge-based system (KBS), which is a computer system that represents and uses knowledge to carry out a task, and inference procedures to solve problems that are difficult enough to require significant human expertise for their resolution. Thus, such a system has two distinctive features: a knowledge base and an inference engine, where knowledge is assumed to be given "declaratively" by a set of Horn clauses, production rules, constraints or frames, and where inference mechanisms like unification, forward or backward resolution, and inheritance capture the dynamic part of deriving new information.

For instance, constraint programming (CP) is a knowledge-based AI approach (Rossi, Van Beek, and Walsh 2008) for solving combinatorial problems, where constraints model the problem and a general-purpose constraint solver is used to solve it. The main idea is to propose (1) a modeling language for combinatorial optimization problems (through variables and constraints) and (2) a generic search algorithm able to solve a combinatorial problem described using the modeling language. Mainly, a constraint solver aims at reducing the implementation costs of dedicated algorithms and constitutes a framework for reuse in combinatorial optimization. In other words, the essence of CP is a clean separation between the statement of the problem (the variables and the constraints) and the resolution of the problem (the algorithms) (Heipcke 1999).

Knowledge engineering

Knowledge engineering (KE) is the process of understanding and then representing human knowledge in data structures, semantic models (conceptual diagrams of the data as it relates to the real world) and heuristics. Expert systems, constraint programming, ontologies, ... are examples that form the basis of the representation and application of this knowledge. The basic assumption is that both knowledge and experience can be captured and archived in textual or rule-based form, using formalization methods. In its initial form, KE focused on transferring the expertise of a problem-solving human into a program that could take the same data

Figure 2: Data-driven AI, Knowledge-based AI and Hybrid AI paradigms illustrated with some techniques


and make the same conclusions. In the 1990s, the KE community shifted gradually to domain knowledge, in particular reusable representations in the form of ontologies. This evolution aimed at alleviating KE's limitation to accurately reflect how humans make decisions, and more specifically its failure to take into account intuition and "gut feeling", known as "reasoning by analogy". Nevertheless, designing a knowledge-based AI component raises some general fundamental problems.

1. Knowledge discovery: how do we translate knowledge as it currently exists in textbooks, articles, databases, and human skills into abstract representations in a computer?
2. Knowledge representation: how do we represent human knowledge in terms of data structures that can be processed by a computer? How to determine the best representation for any given problem?
3. Knowledge reasoning: how do we use these abstract data structures to generate useful information in the context of a specific case? How to manipulate the knowledge to provide explanations to the user?
4. KBS development lifecycle: how to verify and update the knowledge base? How to evaluate and validate knowledge-based systems?

Knowledge discovery

While some knowledge is easy to obtain and understand, other knowledge may be difficult to obtain or interpret. In many situations, experts do not have any formal basis for problem solving or for explaining their reasoning process. So they tend to use "rules of thumb" (heuristics) developed on the basis of their experience to help them make decisions. Thus, knowledge discovery is the process of collecting, extracting, transferring, accumulating, structuring, transforming and organizing (domain) knowledge (e.g., problem-solving expertise) from data and information or from the synthesis of prior knowledge. One of the key points of knowledge discovery is to ensure that correct and relevant knowledge is extracted and represented to the stakeholders and decision makers. No matter what kind of knowledge is collected, this process can be realized either manually or automatically.

Even if knowledge discovery is today dominated by machine learning (ML) approaches, the iterative execution of the CRISP-DM¹ methodology (Chapman et al. 1999), which is today considered the de-facto standard for knowledge discovery projects, assumes an interaction between domain experts and data scientists. In practice, the ML model creation process tends to involve a highly iterative exploratory process. In this sense, an effective ML modeling process requires solid knowledge and understanding of the different types of ML algorithms and their parameter tuning (Maher and Sakr 2019), which can be guided by domain knowledge or heuristics (Gibert et al. 2018).

Knowledge Representation and Reasoning

Knowledge Representation and Reasoning (KRR) represents information from the real world for a computer to understand and then utilize this knowledge to solve complex real-life problems. KRR is not just about storing data in a database; it is the study of how what we know can at the same time be represented as comprehensibly as possible and reasoned with as effectively as possible. One of the main issues is to find the best trade-off between these two concerns.

For (Sowa 2000), "Knowledge Representation is the application of logic and ontology to the task of constructing computable models for some domain". Therefore, the way a knowledge representation is conceived reflects a particular insight or understanding of how people reason. The se-

¹ CRISP-DM stands for CRoss Industry Standard Process for Data Mining, a model proposed by a consortium initially composed of DaimlerChrysler, SPSS and NCR.
lection of any of the currently available representation technologies (such as logic, knowledge bases, ontologies, semantic networks...) commits one to fundamental views on the nature of intelligent reasoning, and consequently to very different goals and definitions of success. As we manipulate concepts with words, all ontologies use human language to "represent" the world. Thus, an ontology is expressed as a formal representation of knowledge by a set of concepts within a domain and the relationships between these concepts. Nevertheless, the "fidelity" of the representation depends on what the knowledge-based system captures from the real thing and what it omits. If such a system has an imperfect model of its universe, knowledge exchange or sharing may increase or compound errors during the reasoning process. As such, a fundamental step is to establish effective knowledge representation (symbolic representation) that can be used by future hybrid systems. Symbolic methods may be better adapted to dealing with sparse data, support enhanced explainability and incorporate past human knowledge, while machine learning methods excel at pattern recognition and data clustering/classification problems.

The symbol grounding problem

Developers building knowledge-based systems (KBS) usually create knowledge bases from scratch through a tedious and time-consuming process. First, they have to deal with the diversity and heterogeneity of knowledge representation formalisms and with modeling, taxonomical, and terminological mismatches of different knowledge items, even if they belong to the same application domain. Thus, while data engineers focus on building the data pipelines and data scientists focus on inference methods, knowledge engineers focus on modeling structural use cases and detailing concepts of expert knowledge. Knowledge engineering methods adapt to use cases of knowledge, can model for specific requirements and in many cases produce reusable formats. The main limitations of knowledge-based systems lie in the abstract nature of the considered knowledge, in acquiring and manipulating large volumes of information or data, and in the limitations of cognitive and other scientific techniques.

Despite the progress in KE and ontology engineering in the last decade, obstacles remain. Modeling is still a difficult task, as is the choice of a suitable knowledge-based AI technology. Like every model, such a model is only an approximation of the reality, and the modeling process is often cyclic. Expert knowledge enters notably via modeling bias, whereby a human manually designing a model (or part of a model) does not take into account some aspects of the environment, consciously or unconsciously. New observations may lead to a refinement, modification, or completion of the already built-up model. On the other side, the model may guide the further acquisition of knowledge. Therefore, an evaluation of the model with respect to reality is indispensable for the creation of an adequate model. These limitations relate to the so-called symbol grounding problem (Harnad 1990), and concern the extent to which representational elements are hand-crafted rather than learned from data. By contrast, one of the strengths of machine learning methods is their ability to discover features in high-dimensional data with little or no human intervention. Several features must be taken into account when developing a KBS:

• Redundancy: are there identical or equivalent knowledge models (such as rules within expert systems, concepts within ontologies, constraints within constraint solving), or ones that are a special case of another (subsumed)?
• Consistency: is there ambiguous or conflicting knowledge, or indeterminacy in its application? Is it intended? Are several outcomes possible, for example depending on the strategy (the order in which the knowledge models are applied)?
• Minimality: can the knowledge set be reduced and simplified? Is the reduced form logically equivalent to the first one?
• Completeness: are all possible entries covered by the knowledge of the set?

Thus, a good KBS must have properties such as:

• Representational Accuracy: it should represent all kinds of required knowledge.
• Inferential Adequacy: it should be able to manipulate the representational structures to produce new knowledge corresponding to the existing structure.
• Inferential Efficiency: the ability to direct the inferential knowledge mechanism into the most productive directions by storing appropriate guides.
• Acquisitional Efficiency: the ability to acquire new knowledge easily using automatic methods.

Another key concern in knowledge-based modeling is stability. How much variability is there between instances of the problem? How stable is the solution method to small changes? Is the problem very dynamic? What happens if (a small amount of) the data changes? Do solutions need to be robust to small changes? Many such questions need to be answered before we can be sure that a particular knowledge-based AI technique is a suitable technology.

Knowledge-driven AISCS Engineering

ML-based AISCS engineering is often portrayed as the creation of an ML/DL model and its deployment. In practice, however, the ML/DL model is only a small part of the overall system, and significant additional functionality is required to ensure that the ML/DL model can operate in a reliable and predictable fashion, with proper engineering of data pipelines, monitoring and logging, etc. To capture these aspects of AI engineering we defined the ML algorithm engineering pipeline (see fig. 3), where we distinguish between requirements-driven development, outcome-driven development and AI-driven development. As the starting point, data must be available for training. Based on data engineering, there are various ways to collect and qualify a data set and divide it into training, testing, and cross-validation sets. Engineering activities have to be encapsulated as a series of steps within the pipeline, such as:

• 1) Problem specification, including the Operational Design Domain (ODD), that is, the description of the specific
                                   Figure 3: Proposed ML algorithm engineering pipeline


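The data-segregation activity mentioned above (dividing a qualified data set into disjoint training, validation and test subsets) can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the 60/20/20 split ratios are our own choices, not part of the pipeline definition.

```python
# Illustrative sketch of data segregation for an ML pipeline.
# The 60/20/20 ratios and the function name are illustrative choices.
import random

def segregate(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the samples and split them into three disjoint subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (shuffled[:n_train],                      # training set
            shuffled[n_train:n_train + n_val],       # validation set
            shuffled[n_train + n_val:])              # held-out test set

train, val, test = segregate(range(1000))
```

In a real AICS pipeline this step would additionally record the provenance and qualification evidence of each subset, since that evidence is needed for later verification and certification activities.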
• 1) Problem specification, including the Operational Design Domain (ODD), that is, the description of the specific operating condition(s) in which a safety-critical function or system is designed to properly operate, including but not limited to environmental conditions and other domain constraints. These requirements describe the specific function that the ML items should implement, as well as the safety, performance, and other requirements that the ML items should achieve.
• 2) Data engineering, including data collection, data preparation and data segregation. A machine learning model requires large amounts of data, which help the model learn about system objectives and purpose. Before it can be used, data needs to be collected and usually also prepared. Data collection is the process of aggregating data from multiple sources; the collected data needs to be sizable, accessible, understandable, reliable, and usable. Data preparation, or data pre-processing, is the process of transforming raw data into usable information.
• 3) ML Algorithm Design: after the training set is fed to the ML algorithm, it can learn appropriate parameters and features. Once training is complete, the model is refined using the validation data set. This may involve modifying or discarding variables, and includes a process of tweaking model-specific settings (hyperparameters) until an acceptable accuracy level is reached.
• 4) Implementation: to develop ML components, we have to decide on the targeted hardware platform, the IDE (Integrated Development Environment) and the language for development. Several choices are available, and most of them would meet our requirements easily, as all of them provide implementations of the AI algorithms discussed so far; sometimes, however, we have to take embedded constraints into account.
• 5) Evaluation and verification: after an acceptable set of hyperparameters is found and the model accuracy is optimized, we can finally test our model. Testing uses our test dataset and is meant to verify/demonstrate that our models are correct and guarantee some required properties, such as robustness and/or explainability. Based on the feedback, we may return to training the model to improve correctness, accuracy and robustness, then adjust output settings, or deploy the model as needed.
• 6) Model Deployment in the overall system, with respect to safety and cyber-security system requirements. Learning assurance case methods can be used.
   Then, an ML algorithm has to be designed, or selected from an existing ML library (such as Scikit-learn (Pedregosa et al. 2011)), to provide a ML model together with its hyperparameters. Next, the model is trained with the training data. During the training phase, the system is iteratively tuned so that its output matches the “right answers” in the training material. This trained model can also be validated with different data. If this validation is successful – with whatever criteria we decide to use – the model is ready for deployment, similarly to any other component.
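The select/train/validate cycle just described can be sketched with Scikit-learn. The data set, the chosen algorithm and the acceptance threshold below are placeholder assumptions for illustration, not recommendations of the pipeline.

```python
# Illustrative sketch of the select/train/validate cycle, using
# Scikit-learn (Pedregosa et al. 2011). Data set, algorithm and
# acceptance criterion are placeholder assumptions.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Select an algorithm from the library and fix its hyperparameters.
model = LogisticRegression(max_iter=1000)

# Training: iteratively tune the model against the "right answers".
model.fit(X_train, y_train)

# Validation with held-out data, against a criterion we decide to use.
accuracy = model.score(X_val, y_val)
ACCEPTANCE_THRESHOLD = 0.9  # acceptance criterion chosen by the team
ready_for_deployment = accuracy >= ACCEPTANCE_THRESHOLD
```

Only if the criterion holds is the model promoted to deployment, similarly to any other component; otherwise the cycle loops back to training or hyperparameter tuning.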
                            Conclusion
“Data-driven AI is the AI of the senses, and knowledge-based AI is the AI of meaning” (David Sadek, VP Research Technologies and Innovation at Thales). This is why, in order to cover all cognitive capacities, the future lies in the hybridization of these two paradigms, which are too often placed in opposition within AI. Indeed, the shortcomings of deep learning align with the strengths of knowledge-based AI, which raises the possible benefits of hybridization. First, thanks to their declarative nature, symbolic representations can easily be reused in multiple tasks, which promotes data efficiency. Second, symbolic representations tend to be high-level and abstract, which facilitates generalization. Lastly, because of their propositional nature, symbolic representations are amenable to human understanding. AI algorithms need relevant observations to be able to predict the outcome of future scenarios accurately; thus, data-driven models alone may not be sufficient to ensure safety, as we usually do not have exhaustive and fully relevant data. Nevertheless, like any critical system, an AISCS needs well-defined development methods from its design to its deployment and qualification. This requires a complete tool chain ensuring trust at all stages, such as:
1. Specification, knowledge and data management,
2. Algorithm and system architecture design,
3. AI functions characterization, verification and validation,
4. Deployment, particularly on embedded architectures,
5. Qualification and certification from a system point of view.

Figure 4: Revisiting all engineering disciplines for a sound deployment of AISCS

   All this demands a sound and tooled AI engineering methodology that encompasses algorithm engineering, data engineering, knowledge engineering and AI system engineering, with the objective of trustworthy AI, by addressing the issues described above. Academic research already proposes solutions towards AI certification (see https://www.deel.ai/); industry should now take over (see Fig. 4). At the French national level, major industrial players in the fields of Automotive, Aeronautics, Defense, Manufacturing and Energy (Air Liquide, Airbus, Atos, EDF, Naval Group, Renault, Safran, SopraSteria, Thales, Total and Valeo), with the support of academic partners (CEA, INRIA, IRT Saint Exupéry and IRT SystemX), are collaborating to address such issues through the French National Program “Confiance.ai” (https://www.confiance.ai/). Based on the specifications described above, this program aims to bridge the gap between AI proofs of concept and AI deployment within critical systems, toward certification, by providing an interoperable engineering workbench to support AI processes and practices through methods and tools during the overall lifecycle of the AI-based system.

                              References
Ashmore, R.; and Mdahar, B. 2019. Rethinking Diversity in the Context of Autonomous Systems. In Safety-Critical Systems Symposium 2019, 175–192.
Belle, V. 2020. Symbolic Logic meets Machine Learning: A Brief Survey in Infinite Domains. CoRR, abs/2006.08480.
Besold, T. R.; d’Avila Garcez, A.; Bader, S.; Bowman, H.; Domingos, P.; Hitzler, P.; Kuehnberger, K.-U.; Lamb, L. C.; Lowd, D.; Lima, P. M. V.; de Penning, L.; Pinkas, G.; Poon, H.; and Zaverucha, G. 2017. Neural-Symbolic Learning and Reasoning: A Survey and Interpretation. arXiv:1711.03902.
Chapman, P.; Clinton, J.; Kerber, R.; Khabaza, T.; Reinartz, T.; Shearer, C.; and Wirth, R. 1999. The CRISP-DM user guide. In 4th CRISP-DM SIG Workshop.
Foggia, P.; Genna, R.; and Vento, M. 2001. Symbolic vs. connectionist learning: an experimental comparison in a structured domain. IEEE Transactions on Knowledge and Data Engineering, 13(2): 176–195.
Garnelo, M.; and Shanahan, M. 2019. Reconciling deep learning with symbolic artificial intelligence: representing objects and relations. Current Opinion in Behavioral Sciences, 29: 17–23.
Gibert, K.; Izquierdo, J.; Sànchez-Marrè, M.; Hamilton, S. H.; Rodríguez-Roda, I.; and Holmes, G. 2018. Which method to use? An assessment of data mining methods in Environmental Data Science. Environmental Modelling & Software, 110: 3–27.
Harnad, S. 1990. The symbol grounding problem. Physica D: Nonlinear Phenomena, 42(1-3): 335–346.
Heipcke, S. 1999. Comparing constraint programming and mathematical programming approaches to discrete optimisation—the change problem. Journal of the Operational Research Society, 50(6): 581–595.
Hofer-Schmitz, K.; and Stojanović, B. 2020. Towards formal verification of IoT protocols: A Review. Computer Networks, 174: 107233.
Kasabov, N. 2012. Evolving spiking neural networks for spatio- and spectro-temporal pattern recognition. In 2012 6th IEEE International Conference on Intelligent Systems, 27–32.
Maher, M.; and Sakr, S. 2019. SmartML: A meta learning-based framework for automated selection and hyperparameter tuning for machine learning algorithms. In The 22nd EDBT.
Newell, A.; and Simon, H. A. 2007. Computer science as empirical inquiry: Symbols and search. In ACM Turing Award Lectures, 1975.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12: 2825–2830.
Rossi, F.; Van Beek, P.; and Walsh, T. 2008. Constraint programming. Foundations of Artificial Intelligence, 3: 181–211.
Sowa, J. F. 2000. Guided tour of ontology.
Sun, R. 2015. Artificial Intelligence: Connectionist and Symbolic Approaches. In Wright, J. D., ed., International Encyclopedia of the Social & Behavioral Sciences (2nd Edition), 35–40. Oxford: Elsevier.