Data & AI for Industrial Application

Antimo Angelino¹,∗
¹MBDA Italia SpA, via Carlo Calosi, Bacoli, 80070, Italy

Abstract
The use of Artificial Intelligence in industry can improve the efficiency of industrial processes (for example by reducing scrap and rework rates and increasing throughput), and this can bring competitive advantages. Nevertheless, deploying artificial intelligence projects correctly requires connectivity and quality data. Both are enabling factors for AI projects, and industries must put processes in place to achieve them before starting the AI journey.

Keywords
Artificial Intelligence, Data Analytics, Data Quality, Data Strategy, Industrial Network

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
∗ Corresponding author.
antimo.angelino@mbda.it (A. Angelino)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Where AI comes from
Artificial Intelligence (AI) is broader and older than the hype of recent years. Its birth can be dated to 1943, when McCulloch and Pitts first introduced the concept of the artificial neuron [1]. Rosenblatt took up the concept in 1958 and presented the first artificial neural network: the perceptron [2]. In between, in 1950, Alan Turing (the father of computer science) introduced the concept of an intelligent machine [3].

Over its history AI has gone through various ups and downs. Its periods of good fortune have always been linked to successes in real use cases or to the emergence of favorable technological conditions. Its periods of abandonment were caused by the failure of projects that were too ambitious or not yet technologically mature.

Since the mid-2000s AI has been rediscovered thanks to the birth of a branch that is well suited to solving predictive problems: machine learning (ML).

Figure 1 shows the relationship between AI, ML, DL (deep learning) and GEN_AI (generative AI), which we can briefly define as follows:
• AI: any technique that makes computers capable of imitating human behavior. The most emblematic techniques are:
  o expert systems: capable of simulating deductive logical reasoning;
  o fuzzy logic: capable of introducing uncertainty management into logical reasoning;
  o genetic algorithms: which imitate natural selection in order to identify the optimal solution to a given problem;
  o artificial neural networks: systems that simulate the neural networks of our brain and are able to learn from data and extrapolate behaviors and information;
• ML: specific AI techniques that make computers capable of learning;
• DL: a subset of ML techniques based specifically on deep (or multilayer) neural networks, suitable for solving computer vision, image recognition and signal processing problems;
• GEN_AI: a subset of DL that uses NLP (natural language processing) techniques to process text and predict sentences starting from an input (prompt).

Figure 1: Relationship between AI and its major application frameworks.

2. What is meant by Industrial Domain
By "industrial domain" we mean all the processes involved in the manufacturing, maintenance and quality activities of industries. The domain starts when raw materials arrive and ends when the manufactured item is delivered, so it excludes the supply chain and customer support.

The data involved in the industrial domain are generated by manufacturing and test machines and by IIoT (industrial internet of things) sensors. Specific data held in the MRP (manufacturing resource planning), MES (manufacturing execution system) and QMS (quality management system) are also involved.

Figure 2: What the industrial domain is.

Not only the sources but also the type and format of these data are very different. Data coming from machines and sensors (normally located on the shop floor) are raw data that do not meet the minimum requirements for trustworthiness and quality, so they need to be pre-processed before being used. Data coming from the main information systems are normally trusted data, because processes are in place to control and validate them before they are loaded into the systems.

Moreover, collecting the data is very complex because their sources differ in nature. The main information systems are normally available over the company enterprise network. Machines and sensors, by contrast, are often isolated, and when connected they are attached to a dedicated network (normally called the edge network). Because of cyber security risks, the two types of network cannot be connected directly. Therefore, to collect all the data it is necessary to put in place an industrial network compliant with the ISA/IEC 62443 standard (also known as ISA-99, which replaced ISA-95, the first standard for industrial networks).

Figure 3: Industrial network schema.

3. AI in the Industrial Domain
The birth of ML was not the only factor that triggered the rediscovery of AI in the new millennium. Several accelerating factors combined, such as:
1. The exponential increase in data availability, thanks to the internet, connectivity systems (both wired and mobile) and intelligent sensors (commonly called IoT = Internet of Things);
2. The possibility of collecting data in real time, which provides an instant representation of reality;
3. The constant increase in computing power, combined with the miniaturization of devices, available at ever-decreasing costs (at least until the Covid pandemic).

All this has placed data at the center of decision-making strategies. Decision-making has effectively shifted from knowledge-based models (often empirical and built on experience) to data-driven models.

A first effect of this epochal change concerns the kind of analysis each model supports. Knowledge-based models were (and still are) used mainly for descriptive analysis (i.e. describing an event that occurred). Data-driven models can be used for predictive analysis (i.e. predicting an event before it happens).

If data are the fuel of the new decision-making models, then data analytics and advanced data analytics techniques are their engine. Data analysis techniques are essentially based on the most common statistical formulas and are used to describe an event and diagnose its cause. Advanced techniques are based on AI (mainly ML) algorithms and are used to predict an event and prescribe actions to influence it. Figure 4 (Gartner 2012) shows the evolution of data analytics in four phases: descriptive, diagnostic, predictive and prescriptive. The phases are depicted in a Cartesian plane whose axes represent the value (returned by the analysis) and the difficulty (of the analysis itself).

Figure 4: Gartner data analysis phases.

Figure 5 illustrates another epochal change in business processes, enabled by data-driven decision-making models combined with advanced analysis techniques: the anticipation of corrective actions. Companies move from a reactive approach, i.e. chasing the problem after the event, to a predictive one, i.e. anticipating the problem before the event happens.

Figure 5: Timeline of data analysis.

Predictive and prescriptive analyses have been used for years in various sectors, from finance to risk management, from marketing to communication and even politics. The aim has always been to predict events and (try to) influence them through targeted actions.

In the industrial domain, the main application is predicting failures, of both the product and the machinery, and prescribing actions to ensure that they do not occur. These applications have a direct impact on the efficiency and effectiveness of industrial processes, and consequently on companies' competitiveness and cost reduction. Industry has implemented various use cases, which have given rise to different methodologies:
• failure prediction (prediction of product failures);
• predictive quality (prediction of product quality);
• predictive maintenance (prediction of machinery failures).

Managers who see the potential benefits often push the adoption of predictive systems in the industrial world. However, that adoption often meets the reluctance of technicians who are accustomed to analyses based on empirical experience and find it hard to accept predictions made by a heuristic model built with ML algorithms. A fundamental step in getting a predictive model accepted (and then used correctly) is therefore to demonstrate its reliability, i.e. to show that its predictions and prescriptions are true. This is the purpose of the model validation phase, which takes place after the training and testing phases. Validation uses real data, i.e. historical data recorded in the company that refer to real events. The predictive model, once trained and tested, is asked to predict from known data, and its prediction is then compared with what really happened.

To go into more detail, predictive models must be divided according to their application into regression models and classification models. Regression models predict values, while classification models predict the class of an event.
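The validation step described above can be sketched in a few lines. A trained model predicts outcomes for historical records, and the predictions are compared with what actually happened. The following is a minimal, dependency-free Python illustration, not the author's method. The function names (`rmse`, `relative_rmse`, `confusion_matrix`, `accuracy`) and all data values are hypothetical. A weather example stands in for real company records: a predicted temperature for the regression case and a predicted sky condition for the classification case. The sketch scores the regression model with RMSE and the classifier with a confusion matrix, the two validation metrics this paper recommends. Two readings are assumptions not stated in the paper: that the 3% RMSE threshold means RMSE relative to the mean recorded value, and that the 95% threshold applies to accuracy.

```python
import math
from collections import Counter


def rmse(predicted, actual):
    """Root mean squared error between model predictions and recorded values."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_rmse(predicted, actual):
    """RMSE as a fraction of the mean recorded value.

    Assumption: this is one possible reading of a percentage RMSE threshold.
    """
    mean_actual = sum(actual) / len(actual)
    return rmse(predicted, actual) / abs(mean_actual)


def confusion_matrix(predicted, actual, labels):
    """Count (actual, predicted) pairs: rows = actual class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of correct classifications (the diagonal) over all cases."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical historical records: model predictions vs. what actually happened.
predicted_temp = [21.0, 23.5, 19.8, 25.1]
recorded_temp = [20.5, 24.0, 20.0, 25.0]
print(f"RMSE: {rmse(predicted_temp, recorded_temp):.3f}")
print(f"Relative RMSE: {relative_rmse(predicted_temp, recorded_temp):.1%}")

labels = ["sunny", "cloudy", "rainy"]
predicted_sky = ["sunny", "cloudy", "rainy", "sunny", "rainy"]
recorded_sky = ["sunny", "cloudy", "rainy", "cloudy", "rainy"]
cm = confusion_matrix(predicted_sky, recorded_sky, labels)
for label, row in zip(labels, cm):
    print(label, row)
print(f"Accuracy: {accuracy(cm):.0%}")
```

With these toy records the classifier is right in 4 of 5 cases (80% accuracy). Under the accuracy reading, this hypothetical model would fall short of the 95% level. Its relative RMSE is about 1.7%, under 3%. Libraries such as scikit-learn provide equivalent functions (`sklearn.metrics.mean_squared_error`, `sklearn.metrics.confusion_matrix`). The standard-library version is used here only to keep the sketch self-contained.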
A classic explanatory example is weather forecasting: a regression model predicts the temperature, while a classification model predicts whether the day will be sunny, cloudy or rainy. The most commonly used metrics for validating these models are:
1. RMSE (Root Mean Squared Error) for regression models;
2. Confusion matrix for classification models.

RMSE is the square root of the mean squared error between the values predicted by the model and the actual values recorded in the company. An acceptable error value is less than 3%. The confusion matrix is a slightly more complex metric. It is built by cross-tabulating the model's predicted classifications against the real ones, so that correct and incorrect classifications can be counted. Table 1 illustrates this in a very intuitive way. The indicators derived from the confusion matrix should have a value greater than 95%.

Table 1: Confusion matrix.

4. Data Strategy
We can use a SWOT matrix to summarize the use of AI in the industrial domain.

Table 2: SWOT matrix for AI adoption.

The SWOT analysis makes clear that managing data is the most critical part of deploying an AI project. In the industrial domain in particular, data sources and data formats are completely different from one another.

First, it is necessary to identify all data sources and define how to connect them, so that data can be collected continuously and automatically. Next, the data format of each source must be identified, so that the data can be labeled and a data catalog created. Not all generated data are in a "ready to use" format for an AI model. The data often need to be pre-processed with specific actions (e.g. cleaning, pruning, filtering, augmenting) to turn raw data into quality data. In the industrial domain, all data generated by machines and sensors are normally raw data. It is therefore important in this phase to define clearly which pre-processing activities are needed to turn raw data into quality data. International standards are available and can be used as a reference:
1. ISO/IEC 8183:2023 - Data life cycle framework;
2. ISO/IEC 42001:2023 - Artificial intelligence management system;
3. ISO/IEC DIS 5259-1 - Data quality for analytics and machine learning.

In conclusion, companies need to put a data strategy in place to manage their data correctly, with specific processes and dedicated role profiles. An appropriate data strategy protects companies against the risk of collecting large amounts of data that are of no use to an AI model. Table 3 shows the main activities of a data strategy and indicates the specific role profiles (data steward and data engineer) to be set up in the business divisions. A data architect role is normally set up in the information technology division to manage the data catalog and data storage policies.

Table 3: Main principles of a data strategy.

5. Safeguards and Ethics
A problem that often emerges and leads models to make incorrect predictions is bias (conditioning) in the data. A fundamental requirement for reliable predictions is that the data used to train and test the model are representative of the event it is meant to predict. The data may represent the event only partially, or give a distorted view of it. In that case the predictive model will make incorrect predictions later, even if it passes the training and testing phases.

Furthermore, the behavior of AI-based predictive models is, by its intrinsic nature, difficult to verify analytically. To mitigate the effects of an incorrect prediction, the concept of risk management was therefore introduced. It provides control mechanisms on decisions taken following predictions, with the aim of limiting the errors those predictions may induce. The practice is now an international standard, described in the reference document ISO/IEC 23894. The recent EU AI Act is also based on risk management. Figure 6 shows the levels of risk according to the implications of the AI model. For example, an autonomous system that classifies spam email is considered low risk, while a system used for social scoring is an unacceptable risk.

Figure 6: EU AI Act risk management.

AI-based predictive models may behave incorrectly, and analytical, timely feedback on their behavior is hard to obtain. Together, these problems lead to a much broader reflection on ethics in the use of AI: what use should we make of AI? How much decision-making autonomy can we leave to it? On the first question, a conscious use of AI can help automate tasks that are currently manual and repetitive, giving a notable boost to industrial processes. The second question is much more complex. Can AI-based systems be autonomous in making decisions and carrying out actions, or should they only suggest a decision (or action) that the human must then validate and execute? This is the dualism "autonomous systems vs human-in-the-loop systems". In autonomous systems, the AI is allowed to make decisions and carry out actions. In human-in-the-loop systems, the AI processes the information and then suggests a decision or action that a human being must validate. The analysis is not trivial. Autonomous systems have been discussed for years, and various experiments and research have been conducted in many fields of application, but to date no system has been fully convincing.

The latest research frontier in this field is called explainable artificial intelligence (XAI) and was introduced by Michael Van Lent in 2004 [10]. Its aim is to make the activities carried out by an AI system during its training, testing and execution phases transparent.

Figure 7: Explainable AI model.

6. AI & Quantum Computing
An important step for AI systems is quantum machine learning (QML), the combination of machine learning and quantum computing aimed at running machine learning models on quantum computers [11]. There is a lot of research in this field in the academic world, and the first proof-of-concept applications are emerging in industrial companies. Nevertheless, there is still a lot of work to do. Quantum computers for industrial applications will probably be available within 5-7 years, and not all classical machine learning models have yet been transformed into quantum ones.

Figure 8: Quantum machine learning models.

7. Conclusion
In conclusion, AI and its growing applications in industry can represent an opportunity to achieve otherwise unattainable targets. Nevertheless, we must know and govern AI in order to use it correctly and exploit all its potential. It should be applied in the way best suited to the application context and to the way industries operate.

References
[1] W. McCulloch, W. Pitts. "A Logical Calculus of Ideas Immanent in Nervous Activity." Bulletin of Mathematical Biophysics 5 (1943): 115-133.
[2] F. Rosenblatt. "The perceptron: a probabilistic model for information storage and organization in the brain." Cornell Aeronautical Laboratory, Psychological Review 65.6 (1958): 386-408.
[3] A. M. Turing. "Computing machinery and intelligence." Mind 59 (1950): 433-460.
[4] A. Angelino. "L'Intelligenza Artificiale." Ingegneri Napoli 1 (2013): 3-10.
[5] A. Angelino. "L'Intelligenza Artificiale per i processi industriali." Sistemi & Impresa 4 (2022): 28-32.
[6] A. Angelino. "La convergenza tra IT ed OT." Sistemi & Impresa 2 (2022): 36-39.
[7] E. Rich, K. Knight. Intelligenza Artificiale, 2nd ed., McGraw Hill Italia, 1994.
[8] R. H. Nielsen. Neurocomputing, Addison Wesley, 1991.
[9] S. J. Russell, P. Norvig. Intelligenza Artificiale, 3rd ed., Pearson, Vols. 1 and 2, 2010.
[10] M. Van Lent. "An explainable artificial intelligence system for small-unit tactical behavior." In: Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, July 25-29, 2004, San Jose, California, USA.
[11] M. Schuld, I. Sinayskiy, F. Petruccione. "An introduction to quantum machine learning." Contemporary Physics 56 (2014): 172-185.

A. Online Resources
A complete explanation of the confusion matrix is available at: https://www.andreainini.com/ai/machine-learning/matrice-di-confusione
The official web page of the EU AI Act is: https://artificialintelligenceact.eu/the-act/
The US Department of Defense official web page for Explainable AI is: https://www.darpa.mil/program/explainable-artificial-intelligence