                                Data & AI for Industrial Application
                                Antimo Angelino

                                MBDA Italia SpA, via Carlo Calosi, Bacoli, 80070, Italy




                                                    Abstract
                                                    The use of Artificial Intelligence in industry can lead to efficiency gains in industrial
                                                    processes (such as reduced scrap and rework rates and increased throughput), and this can
                                                    bring competitive advantages. Nevertheless, to correctly deploy artificial intelligence
                                                    projects, connectivity and quality data are needed. Both are enabling factors for AI
                                                    projects, and industries must put in place processes to reach them before starting the AI
                                                    journey.

                                                    Keywords
                                                    Artificial Intelligence, Data Analytics, Data Quality, Data Strategy, Industrial Network



1. Where AI comes from

    Artificial Intelligence (AI) is something wider and older than the hype of recent years. Its birth can be dated back to 1943, when McCulloch and Pitts introduced the concept of artificial neurons for the first time. The concept was then taken up by Rosenblatt, who in 1958 presented the first artificial neural network: the perceptron. In between, Alan Turing (the father of computer science) introduced the concept of an intelligent machine in 1950.
    Over its life AI has undergone various ups and downs, and its alternating fortunes have always been linked to successes in real use cases or to the emergence of favorable technological conditions, while periods of abandonment have been driven by failures in projects that were too ambitious or not yet technologically mature. Since the mid-2000s there has been a rediscovery of AI thanks to the birth of a branch that is well suited to solving predictive problems: machine learning (ML).
    Figure 1 shows the relationship between AI, ML, DL (deep learning) and GEN_AI (generative AI), which we could briefly define as:

    •   AI: any technique that makes computers capable of imitating human behavior; among these we recall the most emblematic:
        o   expert systems: capable of simulating deductive logical reasoning;
        o   fuzzy logic: capable of introducing uncertainty management into logical reasoning;
        o   genetic algorithms: which, by imitating natural selection, are able to identify the optimal solution to a given problem;
        o   artificial neural networks: systems that simulate the neural networks of our brain and are able to learn from data and extrapolate behaviors and information;
    •   ML: specific AI techniques that make computers capable of learning;
    •   DL: a subset of ML techniques specifically based on deep (or multilayer) neural networks, suitable for solving computer vision, image recognition and signal processing problems;
    •   GEN_AI: a subset of DL that uses NLP (natural language processing) techniques to elaborate text and predict sentences starting from an input (prompt).
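As a small illustration of the historical starting point recalled above, Rosenblatt's perceptron can be sketched in a few lines of Python. This is a minimal, illustrative implementation only: the function names, the learning-rate and epoch values, and the toy AND-gate dataset are our own choices, not taken from the paper.

```python
# Minimal perceptron (Rosenblatt, 1958): a single artificial neuron
# trained with the classic perceptron learning rule.
# Toy example: learning the logical AND function.

def train_perceptron(samples, labels, lr=0.1, epochs=20):
    """Return (weights, bias) fitted on binary-labelled samples."""
    n = len(samples[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            # Step activation: fire (1) if the weighted sum exceeds 0
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            predicted = 1 if activation > 0 else 0
            # Perceptron rule: nudge weights toward the target on errors
            error = target - predicted
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# AND gate: output is 1 only when both inputs are 1
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # → [0, 0, 0, 1]
```

The perceptron converges here because AND is linearly separable; the later DL techniques mentioned above stack many such units in multiple layers.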




                                Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
                                ∗ Corresponding author.
                                   antimo.angelino@mbda.it (A. Angelino)
                                © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
Figure 1: relationship between AI and its major application frameworks

2. What the Industrial Domain Means

By "industrial domain" we mean all the processes involved in the manufacturing, maintenance and quality activities of industries, where industry starts when raw materials arrive and finishes when the manufactured item is delivered, thus excluding supply chain and customer support.
The data involved in the industrial domain are generated by manufacturing and test machines and by IIoT (industrial internet of things) sensors; nevertheless, specific data contained in the MRP (manufacturing resource planning), MES (manufacturing execution system) and QMS (quality management system) are also involved.

Figure 2: what the Industrial Domain is

Not only the source but also the type and format of these data are very different: data coming from machines and sensors (normally located on the shop floor) are raw data without the minimum requirements for trustworthiness and quality, and so they need to be pre-processed before being used; data coming from the main information systems are normally trusted data, because processes are in place to control and validate them before they are loaded into the systems.
Moreover, the collection of the data is very complex due to the different nature of their sources. In fact, the main information systems are normally available over the company enterprise network, while machines and sensors are often isolated, and when connected they are attached to a special network (normally called the edge network). Due to cyber security risks, it is not possible to connect the two types of network directly. Therefore, to collect all the data it is necessary to put in place an industrial network according to the standard ISA/IEC 62443 (also known as ISA-99, which replaced ISA-95, the first standard for industrial networks).

Figure 3: industrial network schema

3. AI in the Industrial Domain

However, the birth of ML was not the only triggering factor behind the rediscovery of AI in the new millennium; rather, there was a mix of accelerating factors such as:

    1.  The exponential increase in data availability, thanks to the internet, connectivity systems (both wired and mobile) and intelligent sensors (commonly called IoT, Internet of Things);
    2.  The possibility of collecting data in real time, thus providing an instant representation of reality;
    3.  The constant increase in computing power, combined with the miniaturization of devices available at ever-decreasing costs (at least until the pre-Covid period).

All this has placed data at the center of decision-making strategies, effectively evolving decision-making models from knowledge-based models (often empirical and built with experience) to data-driven models. A first effect induced by this epochal change is that while knowledge-based models were (and still are) used mainly for descriptive analysis (i.e. describing an event that occurred), data-driven models can be used for predictive analyses (i.e. predicting an event before it happens).
Therefore, if data constitute the fuel of the new decision-making models, data analytics and advanced data analytics techniques constitute their engine. In particular, while data analysis techniques are essentially based on the most common statistical formulas and are used to describe an event and diagnose its cause, the advanced ones are based on AI (mainly ML) algorithms and are used to predict an event and prescribe actions to influence it. Figure 4 (Gartner 2012) shows the evolution of data analytics in four phases: descriptive, diagnostic, predictive and prescriptive, depicted in a Cartesian plane whose axes represent the value (returned by the analysis) and the difficulty (of the analysis itself).

Figure 4: Gartner data analysis phases

Figure 5 illustrates how the adoption of data-driven decision-making models combined with advanced analysis techniques enables another epochal change in business processes: the anticipation of corrective actions, therefore moving from a reactive approach, i.e. chasing the problem after the event, to a predictive one, i.e. anticipating the problem before the event happens.

Figure 5: timeline of data analysis

Predictive and prescriptive analyses have been used for years in various sectors: from finance to risk management, from marketing to communication and even politics, always with the aim of predicting events and (trying to) influence them through targeted actions.
In the industrial domain, the principal application is the prediction of failures, both of the product and of the machinery, and the prescription of actions aimed at ensuring that they do not occur. In fact, these applications have a direct impact on the efficiency and effectiveness of industrial processes, and consequently on the competitiveness and cost reduction of companies. There are various declinations and use cases implemented in industries, which have given rise to different methodologies:

    •   failure prediction (prediction of product failures),
    •   predictive quality (prediction of product quality),
    •   predictive maintenance (prediction of machinery failures).

The adoption of predictive systems in the industrial world, although pushed by managers who see the potential benefits, often encounters the reluctance of technicians who, accustomed to analyses based on empirical experience, have difficulty accepting predictions made by a heuristic model using ML algorithms. Therefore, a fundamental step for a predictive model to be accepted (and consequently correctly used) is to demonstrate its reliability: that its predictions and prescriptions are true. To do this, the model validation phase is fundamental; it takes place after the training and testing phases. This phase is carried out using real data, i.e. historical data recorded in the company and referring to real events:
the predictive model, once trained and tested, is asked to provide a prediction starting from known data; the model's prediction is acquired and compared with what really happened.
    To give some more detail, it is necessary to divide predictive models based on their applications: regression models and classification models. The former are models that predict values, while the latter predict the classification of an event. A classic explanatory example is predictive meteorology: a regression system predicts the temperature value, while a classification model predicts whether the day will be sunny, cloudy or rainy. The most commonly used metrics for validating these models are:

    1.  RMSE (Root Mean Squared Error) for regression models
    2.  Confusion Matrix for classification models

    The first metric consists of calculating the square root of the mean squared error between the value predicted by the model and the actual value recorded in the company. An acceptable error value is less than 3%. The confusion matrix, on the other hand, is a slightly more complex metric, based on the combination of the correct and incorrect classifications made by the predictive model compared with the real ones. Table 1 illustrates this in a very intuitive way. The parameters of the confusion matrix should have values greater than 95%.

Table 1: Confusion Matrix
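The two validation metrics above can be sketched in a few lines of Python. This is an illustrative sketch only: the sample temperature and classification data are hypothetical, and the relative-error acceptance check is our own reading of the 3% threshold mentioned in the text.

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error between recorded and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def confusion_matrix(actual, predicted):
    """2x2 confusion matrix for a binary classifier: counts of
    true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

# Regression check: hypothetical recorded vs predicted temperatures
actual_t = [20.0, 22.5, 19.8, 25.1]
predicted_t = [20.3, 22.1, 20.0, 24.8]
error = rmse(actual_t, predicted_t)
# Acceptance rule from the text: relative error below 3% of the mean recorded value
mean_value = sum(actual_t) / len(actual_t)
print(f"RMSE = {error:.3f}, acceptable: {error / mean_value < 0.03}")

# Classification check: hypothetical sunny(1) / not-sunny(0) labels
actual_c = [1, 0, 1, 1, 0, 0, 1, 0]
predicted_c = [1, 0, 1, 0, 0, 0, 1, 1]
cm = confusion_matrix(actual_c, predicted_c)
accuracy = (cm["TP"] + cm["TN"]) / len(actual_c)
print(cm, f"accuracy = {accuracy:.2f}")
```

The counts in the returned dictionary are exactly the cells of the confusion matrix in Table 1; accuracy and the per-class rates derived from them are the parameters that, per the text, should exceed 95% for the model to be considered reliable.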

4. Data Strategy

We can use a SWOT matrix to summarize the use of AI in the industrial domain.

Table 2: SWOT matrix for AI adoption

It is thus clear that managing data is the most critical part of the deployment of an AI project. In particular, in the industrial domain there are completely different data sources and data formats.
First it is necessary to identify all data sources and define how to connect them, to allow data collection and ensure the possibility of retrieving data in a continuous and automatic way. After that it is necessary to identify (for each data source) the data format, so as to label the data and create a data catalog. It is important to note that not all the data generated are in a format "ready to use" for an AI model. Often it is necessary to pre-process the data with specific actions (e.g. cleaning, pruning, filtering, augmenting, etc.) to transform them from raw data into quality data. In particular, in the industrial domain all the data generated by machines and sensors are normally raw data. It is therefore important in this phase to define well which pre-processing activities are necessary to transform raw data into quality data. International standards are available and can be used as references:

    1.  ISO/IEC 8183:2023 - Data Life Cycle Framework
    2.  ISO/IEC 42001:2023 - Artificial Intelligence Management System
    3.  ISO/IEC DIS 5259-1 - Data quality for analytics and machine learning

In conclusion, it is necessary that companies put in place a data strategy to correctly manage their data, setting up specific processes with dedicated role profiles. An appropriate data strategy will protect companies against the threat of collecting large amounts of data that are not useful for an AI model.
Table 3 shows the principal activities of a Data Strategy, with an indication of the specific role profiles (data steward and data engineer) to set up in the business divisions. Normally a role of data architect is set up in
the information technology division, to manage the data catalog and data storage policies.

Table 3: main principles of a data strategy

5. Safeguards and Ethics

    A problem that often emerges and leads models to make incorrect predictions is bias (conditioning) in the data. In fact, a fundamental requirement for a model to make reliable predictions is that the data with which it is trained and tested are representative of the event it wants to predict. If the data only partially represent the event, or represent a distorted view of it, then even if the predictive model passes the training and testing phases, it will make incorrect predictions.
    Furthermore, by their intrinsic nature, it is difficult to analytically verify the behavior of AI-based predictive models. Therefore, to mitigate the effects of an incorrect prediction, the concept of risk management was introduced, which provides control mechanisms on decisions taken following predictions, with the intention of limiting potential induced errors. The practice is now an international standard, described in the reference document ISO/IEC 23894. The recent EU AI ACT is also based on risk management. Figure 6 shows the level of risk based on the AI application. For example, an autonomous system that classifies spam email is considered low risk, while a system used for social scoring is an unacceptable risk.
    The problems inherent in the possibility of incorrect behavior of AI-based predictive models, added to the difficulty of analytical and timely feedback, lead to a much broader reflection that introduces the topic of ethics in the use of AI: what use should we make of AI? How much decision-making autonomy can we leave it? On the first question, a conscious use of AI can help automate tasks that are currently manual and repetitive, giving a notable boost to industrial processes. The second question is much more complex: can AI-based systems be autonomous in making decisions and carrying out actions, or must they only suggest a decision (or action) which the human must then validate and execute? In this case we talk about the dualism of "autonomous systems vs human-in-the-loop systems". The former are systems in which AI is given the opportunity to make decisions and carry out actions; in the latter, the AI systems process the information and then suggest a decision or action that the human being must validate. The analysis is not trivial: autonomous systems have been discussed for years, and various experiments and research have been conducted in many fields of application, but to date no system has fully convinced.

Figure 6: EU AI ACT risk management

    The latest research frontier in this field is called explainable artificial intelligence (XAI) and was introduced by Michael Van Lent in 2004. Its aim is to make clear the activities carried out by an AI system during its training, testing and execution phases.

Figure 7: explainable AI model

6. AI & Quantum Computing

    An important step for AI systems is quantum machine learning (QML), which is the
combination of machine learning and quantum computing, with the aim of running machine learning models on quantum computers. There is a lot of research in this field in the academic world, while the first proof-of-concept applications are growing in industrial companies.
    Nevertheless, there is still a lot of work to do: quantum computers for industrial applications will probably be available within 5-7 years, while not all classical machine learning models have yet been transformed into quantum ones.

Figure 8: quantum machine learning models

7. Conclusion

    In conclusion, AI and its growing applications in industries can represent an opportunity to achieve otherwise unattainable targets. Nevertheless, we must know and govern AI, in order to use it correctly, exploiting all its potential in the way most suited to the application context and most comfortable to the way industries operate.

References
[1]   W. McCulloch, W. Pitts. "A Logical Calculus of the Ideas Immanent in Nervous Activity." Bulletin of Mathematical Biophysics 5 (1943): 115-133.
[2]   F. Rosenblatt. "The perceptron: a probabilistic model for information storage and organization in the brain." Psychological Review 65.6 (1958): 386-408.
[3]   A. M. Turing. "Computing machinery and intelligence." Mind 59 (1950): 433-460.
[4]   A. Angelino. "L'Intelligenza Artificiale." Ingegneri Napoli 1 (2013): 3-10.
[5]   A. Angelino. "L'Intelligenza Artificiale per i processi industriali." Sistemi & Impresa 4 (2022): 28-32.
[6]   A. Angelino. "La convergenza tra IT ed OT." Sistemi & Impresa 2 (2022): 36-39.
[7]   E. Rich, K. Knight. Intelligenza Artificiale, second edition, McGraw Hill Italia, 1994.
[8]   R. H. Nielsen. Neurocomputing, Addison Wesley, 1991.
[9]   S. J. Russell, P. Norvig. Intelligenza Artificiale, third edition, Pearson, Vol. 1 and 2, 2010.
[10]  M. Van Lent. "An explainable artificial intelligence system for small-unit tactical behavior." In: Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, July 25-29, 2004, San Jose, California, USA.
[11]  M. Schuld, I. Sinayskiy, F. Petruccione. "An introduction to quantum machine learning." Contemporary Physics 56 (2014): 172-185.

A. Online Resources
A complete explanation of the confusion matrix is at:
https://www.andreainini.com/ai/machine-learning/matrice-di-confusione

The official web page of the EU AI ACT is:
https://artificialintelligenceact.eu/the-act/

The US Department of Defense official web page for Explainable AI is:
https://www.darpa.mil/program/explainable-artificial-intelligence