
Data & AI for Industrial Application

Antimo Angelino

MBDA Italia SpA, via Carlo Calosi, Bacoli, 80070, Italy
antimo.angelino@mbda.it
Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI

The use of Artificial Intelligence in industry can recover efficiency in industrial processes (for example by reducing scrap and rework rates and improving throughput time), which can bring competitive advantages. Nevertheless, deploying artificial intelligence projects correctly requires connectivity and quality data. Both are enabling factors for AI projects, and industries must put in place processes to achieve them before starting the AI journey.

Keywords: Artificial Intelligence, Data Analytics, Data Quality, Data Strategy, Industrial Network

• AI: any technique that makes computers capable of imitating human behavior; among these, the most emblematic are:
    o expert systems: capable of simulating deductive logical reasoning;
    o fuzzy logic: capable of introducing uncertainty management into logical reasoning;
    o genetic algorithms: which, by imitating natural selection, are able to search for the optimal solution to a given problem;
    o artificial neural networks: systems that simulate the neural networks of our brain and are able to learn from data and extrapolate behaviors and information;
• ML: specific AI techniques that make computers capable of learning;
• DL: a subset of ML techniques specifically based on deep (or multilayer) neural networks, suitable for solving computer vision, image recognition and signal processing problems;
• GEN_AI: a subset of DL that uses NLP (natural language processing) techniques to process text and predict sentences starting from an input (prompt).

© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2. What the Industrial Domain Means

By “industrial domain” we mean all the processes involved in the manufacturing, maintenance and quality activities of an industry, where the industry starts when raw materials arrive and ends when the manufactured item is delivered, thus excluding supply chain and customer support. The data involved in the industrial domain are generated by manufacturing and test machines and by IIoT (industrial internet of things) sensors; data held in the MRP (manufacturing resource planning), MES (manufacturing execution system) and QMS (quality management system) are also involved. Not only the sources but also the types and formats of these data differ widely: data coming from machines and sensors (normally located on the shop floor) are raw data that do not meet minimum requirements for trustworthiness and quality, so they must be pre-processed before use; data coming from the main information systems are normally trusted data, because processes are in place to control and validate them before they are loaded into the systems.

Moreover, collecting the data is very complex because of the different nature of their sources. The main information systems are normally available over the company enterprise network, while machines and sensors are often isolated and, when connected, are attached to a dedicated network (normally called the edge network). Because of cyber security risks, the two types of network cannot be connected directly. Therefore, to collect all the data it is necessary to put in place an industrial network in accordance with the ISA/IEC 62443 standard (formerly known as ISA-99, which followed ISA-95, the earlier standard for integrating enterprise and control systems).

3. AI in the Industrial Domain

The birth of ML, however, was not the only factor behind the rediscovery of AI in the new millennium; rather, there was a mix of accelerating factors, such as:
1. The exponential increase in data availability, thanks to the internet, connectivity systems (both wired and mobile) and intelligent sensors (commonly called IoT = Internet of Things);
2. The possibility of collecting data in real time, thus providing an instant representation of reality;
3. The constant increase in computing power combined with the miniaturization of devices, available at ever-decreasing costs (at least until the pre-Covid period).

All this has placed data at the center of decision-making strategies, effectively evolving decision-making models from knowledge-based models (often empirical and built on experience) to data-driven models. A first effect of this epochal change is that, while knowledge-based models were (and still are) used mainly for descriptive analysis (i.e. describing an event that occurred), data-driven models can be used for predictive analysis (i.e. predicting an event before it happens).

Therefore, if data constitute the fuel of the new decision-making models, data analytics and advanced data analytics techniques constitute their engine. In particular, data analytics techniques are essentially based on the most common statistical formulas and are used to describe an event and diagnose its cause, while the advanced ones are based on AI (mainly ML) algorithms and are used to predict an event and prescribe actions to influence it. Figure 4 (Gartner 2012) shows the evolution of data analytics in four phases (descriptive, diagnostic, predictive and prescriptive), depicted in a Cartesian plane whose axes represent the value returned by the analysis and the difficulty of the analysis itself.

Figure 5 illustrates how the adoption of data-driven decision-making models, combined with advanced analysis techniques, enables another epochal change in business processes: the anticipation of corrective actions. This means moving from a reactive approach, i.e. chasing the problem after the event, to a predictive one, i.e. anticipating the problem before the event happens. Predictive and prescriptive analyses have been used for years in various sectors, from finance to risk management, from marketing to communication and even politics, always with the aim of predicting events and (trying to) influence them through targeted actions.

In the industrial domain, the principal application is the prediction of failures, both of the product and of the machinery, and the prescription of actions aimed at ensuring that they do not occur. These applications have a direct impact on the efficiency and effectiveness of industrial processes, and consequently on the competitiveness and cost reduction of companies. Various variants and use cases have been implemented in industry, giving rise to different methodologies:
• failure prediction (prediction of product failures);
• predictive quality (prediction of product quality);
• predictive maintenance (prediction of machinery failures).

The adoption of predictive systems in the industrial world, although pushed by managers who see the potential benefits, often meets the reluctance of technicians who, accustomed to analyses based on empirical experience, have difficulty accepting predictions made by a heuristic model using ML algorithms. Therefore, a fundamental step for a predictive model to be accepted (and consequently used correctly) is to demonstrate its reliability, i.e. that its predictions and prescriptions hold true. The model validation phase, which takes place after the training and testing phases, is fundamental for this purpose. It is carried out on real data, i.e. historical data recorded in the company and referring to real events: the predictive model, once trained and tested, is asked to provide a prediction starting from known data; the model's prediction is then compared with what really happened.

To add some detail, predictive models must be divided according to their application into regression models and classification models. The former predict values, while the latter predict the class of an event. A classic example is weather forecasting: a regression model predicts the temperature, while a classification model predicts whether the day will be sunny, cloudy or rainy. The most commonly used metrics for validating these models are:
1. RMSE (Root Mean Squared Error) for regression models;
2. Confusion Matrix for classification models.

The first metric is the square root of the mean squared error between the values predicted by the model and the actual values recorded in the company. An acceptable error value is less than 3%. The confusion matrix, on the other hand, is a slightly more complex metric, based on counting the correct and incorrect classifications made by the predictive model against the real ones. Figure 4 illustrates this in a very intuitive way. The parameters derived from the confusion matrix should have values greater than 95%.
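The two validation metrics described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names and the sample values are hypothetical, and dividing the RMSE by the mean recorded value is only one assumed way of expressing the error as a percentage.

```python
import math
from collections import Counter


def rmse(actual, predicted):
    """Square root of the mean squared error between recorded and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_rmse(actual, predicted):
    """RMSE divided by the mean recorded value (assumed normalization)."""
    return rmse(actual, predicted) / (sum(actual) / len(actual))


def confusion_matrix(actual, predicted, labels):
    """Nested dict: matrix[real_label][predicted_label] -> number of cases."""
    pair_counts = Counter(zip(actual, predicted))
    return {real: {pred: pair_counts[(real, pred)] for pred in labels} for real in labels}


def accuracy(matrix):
    """Share of cases on the diagonal, i.e. classified correctly."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


# Hypothetical historical data: what was recorded vs. what the model predicted.
recorded_temperature = [20.0, 22.0, 19.0, 21.0]
predicted_temperature = [20.5, 21.5, 19.0, 21.0]
temperature_error = relative_rmse(recorded_temperature, predicted_temperature)

recorded_weather = ["sunny", "cloudy", "rainy", "sunny", "rainy"]
predicted_weather = ["sunny", "cloudy", "sunny", "sunny", "rainy"]
weather_matrix = confusion_matrix(
    recorded_weather, predicted_weather, labels=["sunny", "cloudy", "rainy"]
)

print(f"relative RMSE: {temperature_error:.2%}")
print(f"accuracy: {accuracy(weather_matrix):.0%}")
```

With this toy data the regression error stays under the 3% threshold, while the classifier's accuracy falls short of 95%, so only the first model would pass validation.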

4. Data Strategy

We can use a SWOT matrix to summarize the use of AI in the industrial domain. It is thus clear that managing data is the most critical part of deploying an AI project, particularly in the industrial domain, where data sources and data formats are completely different from one another.

First, it is necessary to identify all data sources and define how to connect them, to allow data collection and ensure that data can be retrieved continuously and automatically. After that, it is necessary to identify the data format of each data source, so as to label the data and create a data catalog. It is important to note that not all the data generated are in a format “ready to use” for an AI model. Often the data must be pre-processed with specific actions (e.g. cleaning, pruning, filtering, augmenting) to transform them from raw data into quality data. In the industrial domain in particular, the data generated by machines and sensors are normally raw data. It is therefore important in this phase to define clearly which pre-processing activities are needed to transform raw data into quality data. International standards are available and can be used as references:
1. ISO/IEC 8183:2023 - Data Life Cycle Framework
2. ISO/IEC 42001:2023 - Artificial intelligence Management System
3. ISO/IEC DIS 5259-1 - Data quality for analytics and machine learning

In conclusion, companies need to put in place a data strategy to manage their data correctly, setting up specific processes with dedicated role profiles. An appropriate data strategy will protect companies against the threat of collecting large amounts of data that are not useful for an AI model.
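The pre-processing step mentioned above (cleaning and filtering raw shop-floor data) can be sketched as follows. This is only an illustrative example under stated assumptions: the sensor values, the plausible range of 0 to 150 and the three-sample median window are all hypothetical, and a real pipeline would choose them per data source.

```python
from statistics import median


def clean_readings(raw, lower, upper):
    """Cleaning: drop missing samples and values outside the plausible range."""
    return [x for x in raw if x is not None and lower <= x <= upper]


def median_filter(values, window=3):
    """Filtering: replace each sample with the median of its neighbourhood."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        # The neighbourhood is shorter at the two ends of the series.
        neighbourhood = values[max(0, i - half): i + half + 1]
        smoothed.append(median(neighbourhood))
    return smoothed


# Hypothetical raw temperature readings: one missing sample (None),
# one sensor glitch (999.0) and one isolated spike (85.0).
raw_temperature = [71.2, None, 71.5, 999.0, 71.4, 85.0, 71.6, 71.3]

cleaned = clean_readings(raw_temperature, lower=0.0, upper=150.0)
quality = median_filter(cleaned)

print(cleaned)
print(quality)
```

The range check removes the missing sample and the glitch, while the median filter suppresses the isolated spike that is still inside the plausible range.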

Table 3 shows the principal activities of a Data Strategy, indicating the specific role profiles (data steward and data engineer) to set up in the business divisions. Normally a data architect role is set up in the information technology division to manage the data catalog and data storage policies.

5. Safeguards and Ethics

A problem that often emerges and leads models to make incorrect predictions is bias (conditioning) in the data. A fundamental requirement for a model to make reliable predictions is that the data with which it is trained and tested are representative of the event it is meant to predict. If the data represent the event only partially, or give a distorted view of it, the predictive model will make incorrect predictions even if it passes the training and testing phases.
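One simple way to probe whether training data are representative is to compare how often each outcome appears in the training set with how often it appears in a reference sample of real production records. The sketch below is an assumption-laden illustration, not a complete bias audit: the function names, the 10-point tolerance and the outcome counts are hypothetical, and it checks only label frequencies.

```python
from collections import Counter


def label_shares(labels):
    """Fraction of samples carrying each label."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


def skewed_labels(training_labels, reference_labels, tolerance=0.10):
    """Labels whose training share deviates from the reference share by more than `tolerance`."""
    train = label_shares(training_labels)
    reference = label_shares(reference_labels)
    return sorted(
        label
        for label in reference
        if abs(train.get(label, 0.0) - reference[label]) > tolerance
    )


# Hypothetical outcomes of 100 production lots (reference) and of 100 training samples.
reference_outcomes = ["ok"] * 65 + ["rework"] * 20 + ["scrap"] * 15
training_outcomes = ["ok"] * 70 + ["rework"] * 28 + ["scrap"] * 2

flagged = skewed_labels(training_outcomes, reference_outcomes)
print(flagged)
```

Here only the scrap outcome is flagged: it makes up 15% of real production but 2% of the training data, so a model trained on this set would see too few scrap cases to predict them reliably.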

Furthermore, by its intrinsic nature, the behavior of an AI-based predictive model is difficult to verify analytically. Therefore, to mitigate the effects of an incorrect prediction, the concept of risk management was introduced: it provides control mechanisms on the decisions taken following predictions, with the intention of limiting potential induced errors. The practice is now an international standard, described in the reference document ISO/IEC 23894. The recent EU AI Act is also based on risk management. Figure 6 shows the level of risk according to the implications of the AI model. For example, an autonomous system that classifies spam email is considered low risk, while a system used for social scoring poses an unacceptable risk.

The possibility of incorrect behavior by AI-based predictive models, added to the difficulty of obtaining analytical and timely feedback, leads to a much broader reflection that introduces the topic of ethics in the use of AI: what use should we make of AI? How much decision-making autonomy can we leave to it? On the first question, a conscious use of AI can help automate tasks that are currently manual and repetitive, giving a notable boost to industrial processes. The second question is much more complex: can AI-based systems be autonomous in making decisions and carrying out actions, or should they only suggest a decision (or action) that a human must then validate and execute? Here we speak of the dualism "autonomous systems vs human in the loop systems". In the former, AI is given the opportunity to make decisions and carry out actions; in the latter, the AI system processes the information and then suggests a decision or action that a human being must validate. The analysis is not trivial: autonomous systems have been discussed for years, and various experiments and research have been conducted in many fields of application, but to date no system has proved fully convincing.
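The difference between the two approaches can be made concrete with a small sketch. Everything in it is hypothetical (the `Suggestion` type, the `decide` function, the operator rule and the sample action); it only illustrates where the human validation step sits in the flow.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Suggestion:
    action: str        # what the model proposes to do
    confidence: float  # the model's own confidence, between 0 and 1


def decide(
    suggestion: Suggestion,
    human_approves: Callable[[Suggestion], bool],
    autonomous: bool = False,
) -> Optional[str]:
    """Return the action to execute, or None when a human rejects it."""
    if autonomous:
        # Autonomous system: the AI decision is executed directly.
        return suggestion.action
    # Human in the loop: the AI only suggests, a person validates.
    return suggestion.action if human_approves(suggestion) else None


def cautious_operator(s: Suggestion) -> bool:
    # Hypothetical rule standing in for a real operator's judgment.
    return s.confidence >= 0.9


proposal = Suggestion(action="stop line 3 for inspection", confidence=0.62)

executed_autonomously = decide(proposal, cautious_operator, autonomous=True)
executed_with_human = decide(proposal, cautious_operator)

print(executed_autonomously)
print(executed_with_human)
```

The same low-confidence suggestion is executed in autonomous mode but blocked when the operator has to validate it, which is the trade-off between speed and control that the discussion above describes.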

The latest research frontier in this field is called explainable artificial intelligence (XAI), introduced by Michael Van Lent in 2004 [10]. Its aim is to make clear the activities carried out by an AI system during its training, testing and execution phases.

6. AI & Quantum Computing

An important step for AI systems is quantum machine learning (QML): the combination of machine learning and quantum computing, with the aim of running machine learning models on quantum computers. There is a great deal of research in this field in the academic world, while the first proof-of-concept applications are emerging in industrial companies.

Nevertheless, there is still a lot of work to do: quantum computers for industrial applications will probably become available within 5-7 years, and not all classical machine learning models have yet been converted into quantum ones.

7. Conclusion

In conclusion, AI and its growing applications in industry can represent an opportunity to achieve otherwise unattainable targets. Nevertheless, we must know and govern AI in order to use it correctly and exploit all its potential in the way best suited to the application context and to the way industries operate.

A. Online Resources

A complete explanation of the confusion matrix is at: https://www.andreainini.com/ai/machinelearning/matrice-di-confusione

The official web page of the EU AI Act is: https://artificialintelligenceact.eu/the-act/

The US Department of Defense official web page for Explainable AI is: https://www.darpa.mil/program/explainableartificial-intelligence

References

[1] McCulloch, Pitts. “A Logical Calculus of Ideas Immanent in Nervous Activity.” Bulletin of Mathematical Biophysics 5 (1943): 115-133.

[2] Rosenblatt. “The perceptron: a probabilistic model for information storage and organization in the brain.” Cornell Aeronautical Laboratory, Psychological Review 65.6 (1958): 386-408.

[3] Alan Turing. “Computing machinery and intelligence.” Mind 59 (1950): 433-460.

[4] Angelino. “L'Intelligenza Artificiale.” Ingegneri Napoli 1 (2013): 3-10.

[5] Angelino. “L'Intelligenza Artificiale per i processi industriali.” Sistemi & Impresa 4 (2022): 28-32.

[6] Angelino. “La convergenza tra IT ed OT.” Sistemi & Impresa 2 (2022): 36-39.

[7] Rich, Knight. Intelligenza Artificiale, seconda edizione, McGraw Hill Italia, 1994.

[8] R. H. Nielsen. Neurocomputing, Addison Wesley, 1991.

[9] S. J. Russell, P. Norvig. Intelligenza Artificiale, terza edizione, Pearson, Vol. 1 e 2, 2010.

[10] M. Van Lent. “An explainable artificial intelligence system for small-unit tactical behavior.” In: Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, July 25-29, 2004, San Jose, California, USA.

[11] Schuld, Sinayskiy, F. Petruccione. “An introduction to quantum machine learning.” Contemporary Physics 56 (2014): 172-185.