=Paper=
{{Paper
|id=Vol-3335/SIoT_paper1
|storemode=property
|title=Semantic Models and Machine Learning Approach in CPS : a Survey
|pdfUrl=https://ceur-ws.org/Vol-3335/SIoT_Paper1.pdf
|volume=Vol-3335
|authors=Hafidi Mohamed Madani,Meriem Djezzar,Hemam Mounir,Ahmed Seghir Zianou,Moufida Maimour
}}
==Semantic Models and Machine Learning Approach in CPS : a Survey ==
Semantic Models and Machine Learning Approach in CPS: A Survey⋆

Hafidi Mohamed Madani1,2,∗,†, Meriem Djezzar1,3, Hemam Mounir1,2,∗,†, Ahmed Seghir ZIANOU1,2 and Moufida MAIMOUR4,∗,†

1 University of , Khenchela, Algeria
2 ICOSI Laboratory, Khenchela, Algeria
3 LIRE Laboratory, Constantine, Algeria
4 Université de Lorraine, CNRS, CRAN, F-54000 Nancy, France

Abstract
The remarkable growth of information and communication technologies in recent years has made it easy to integrate intelligent components and systems into traditional manufacturing, opening new applications and new challenges for Industry 4.0 (I4.0) systems. Cyber Physical Systems (CPSs) are a new generation of systems composed of collaborative cyber and physical components with computation capabilities, which generate and exchange data in a loop between the digital and physical worlds over a highly interconnected network. The large amounts of data produced within and between CPSs are heterogeneous in format and type because they come from different sources, and the resulting lack of interoperability between components leads to errors and malfunctions. As a result, sharing and exchanging data in CPSs is a challenging task. On the other hand, building digital models that reflect the current state and behavior of physical entities is complex, especially when data must be communicated and processed in real time to extract useful information. To overcome these challenges, semantic data models and knowledge representation, combined with Machine Learning (ML) techniques, can solve interoperability problems in CPSs, making it possible to mirror physical reality and monitor it through cyberspace without misinterpretation or miscommunication.
This paper surveys the state of the art in solutions to the semantic interoperability problem in CPSs that integrate semantic models, ML, or both combined within a reference architecture, toward the vision of interoperable CPSs.

Keywords
Interoperability, Cyber Physical Systems, Machine Learning, Digital Twin, Semantic models, Ontology

SIoT-2022: International Workshop on Semantic IoT (SIoT-2022), Co-located with the KGSWC-2022, November 21-23, 2022, Madrid, Spain.
∗ Corresponding author.
† These authors contributed equally.
hafidi.mohamedmadani@univ-khenchela.dz (H. M. Madani); meriem.djezzar@univ-khenchela.dz (M. Djezzar); mounir.hemam@univ-khenchela.dz (H. Mounir); zianou.ahmed.saghir@univ-khenchela.dz (A. S. ZIANOU); moufida.maimour@univ-lorraine.fr (M. MAIMOUR)
ORCID 0000-0003-2431-4816 (H. M. Madani)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction
The need to increase the effectiveness, quality, and customization of manufactured products has driven the development of the new technologies that shape today's production processes [1]. Since the 1980s, with the third industrial revolution, the integration of Artificial Intelligence (AI) into manufacturing has given rise to intelligent manufacturing [2], introducing technologies such as robots, Computer Numerical Control (CNC) machines, and industrial and electronic automation [3].
As AI evolves, smart technologies such as the Internet of Things (IoT), semantic and data models, Machine Learning (ML) algorithms, CPSs, and Digital Twins (DTs) are taking intelligent manufacturing into a new digital age known as the fourth industrial revolution, or Industry 4.0 (I4.0) [4], an initiative that started in Germany to automate production systems efficiently. The core of I4.0 is the CPS concept: a multidimensional, complex, distributed system that integrates a cyber world and a dynamic physical world interconnected through an intensive network. Through the collaboration of computing, communication, and control, a CPS provides functionalities and services such as real-time sensing, monitoring, and real-time interaction between entities of both worlds [5]. The main focus of CPS is to network I4.0 devices by handling the interaction between physical reality and computing infrastructures through a communication interface. A DT is the virtual representation of a physical component: it mirrors the status and activity of the physical entity by collecting data from the physical environment through IoT devices and sending it to software systems for analysis. AI-ML techniques are among the most promising technologies for creating and deploying DTs that mimic the behavior of physical components. With communication technologies, DTs, and the IoT, CPSs can be deployed and implemented in line with the vision of I4.0.
However, the massive volume of communication between interconnected physical components and digital systems in CPSs implies a huge exchange of data, mostly generated, sensed, and shared autonomously by different physical and digital entities. This results in heterogeneity and interoperability problems that make it hard, if not impossible, to implement this new generation of systems at full scale without meeting the interoperability and heterogeneity requirements of communication. Semantic data model technologies have been proposed in academia to solve the semantic interoperability problems faced when deploying CPSs, combined with ML techniques that model and mirror the physical components of the system in the digital space.

This article is organized as follows. Section 2 briefly presents I4.0 and CPS, listing their key concepts and technologies and focusing on CPS reference architectures. Section 3 outlines CPS architectures and solutions from the literature that integrate semantic models and ML algorithms, with a brief discussion of each solution. Section 4 concludes with open issues and future directions for researchers.

2. Background
This section covers the concepts needed to understand the survey. It briefly defines the CPS concept, reviews its best-known reference architectures in academia, and compares these architectures. Finally, related concepts of I4.0 and semantic heterogeneity are defined.

2.1. Cyber Physical Systems
The CPS concept was first presented in 2006 by Helen Gill of the National Science Foundation (NSF) in the United States [6]. It was introduced as a prospective shift toward the next generation of engineered systems built on future networking and information technology (NIT) [7].
Many funding programs were created to promote and accelerate progress in this field. CPS-VO is a US organization created to connect CPS professionals in academia, government, and industry around the world [7]. In Europe, ARTEMIS was created to develop and accelerate research on the CPS concept through a partnership between European countries and industrial companies, with the goal of achieving a smart, physically aware world in which all machines, objects, and systems are connected in a network and use the digital information around them to communicate and collaborate [7].

CPSs have been defined differently by the scientific community. In [8], CPS is defined as the “integration of computation with physical processes”. [9] describes CPSs as “physical and engineered systems, whose operations are monitored, coordinated, controlled, and integrated by a computing and communicating core”. From Gill’s perspective [10], CPSs are “physical, biological, and engineered systems whose operations are integrated, monitored, and/or controlled by a computational core. Components are networked at every scale. Computing is deeply embedded into every physical component, possibly even into materials. The computational core is an embedded system, usually demands a real-time response, and is most often distributed”. More generically, a CPS is a software network of computer systems and physical processes, with feedback loops used to monitor and control the physical process [6]. A CPS links each physical component to its DT through the integration of computing, communication, and control, enabling knowledge sharing and fast decision-making and making an interactive industrial environment possible [11].
Unlike embedded systems, CPSs focus on the interaction between digital systems and their physical twins: a control unit manages the sensors and actuators that act on the physical space and processes the data produced and exchanged through a communication interface [11]. An important characteristic of a CPS is its ability to send and obtain information and services to and from other devices and systems autonomously, so its communication must be reliable, efficient, and secure [12]. This is why achieving high interoperability between CPS components is one of the main requirements of I4.0.

2.2. IoT, Industry 4.0 and Cyber Physical Systems
Although CPS was introduced in 2006, it became popular with advances in information technology and the IoT, which plays a major role in collecting real-time data from the physical world through sensors and sharing it with the digital world. The physical-to-digital connection relies on communication technologies, such as Wi-Fi and cellular networks and their protocols… [13], to transmit the data of the physical components to their virtual twin, where it is stored in databases. The virtual twin processes the stored data with information and data technologies such as knowledge representation and reasoning to extract explicit and implicit information, and uses this information to mirror the functioning of the physical twin, making it possible to monitor and predict the status of the physical environment from the virtual world [14]. The digital-to-physical connection carries information from the virtual world to the physical world; this information may change the state of the physical components by modifying their parameters or by executing a task to achieve a given goal (prognostics, diagnosis, optimization).
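To make the two directions of this loop concrete, here is a minimal Python sketch. The machine model, the temperature limit, and the names `PhysicalMachine`, `DigitalTwin`, and `run_loop` are hypothetical illustrations, not taken from any cited work: the twin stores each reading (physical-to-digital) and sends back a speed command when the mirrored state exceeds a limit (digital-to-physical).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PhysicalMachine:
    """Hypothetical physical asset: a spindle whose temperature depends on its speed."""
    speed_rpm: float = 1200.0
    temperature_c: float = 20.0

    def step(self) -> dict:
        # Toy physics: temperature moves halfway toward a speed-dependent equilibrium each tick.
        equilibrium = 20.0 + self.speed_rpm / 40.0
        self.temperature_c += 0.5 * (equilibrium - self.temperature_c)
        # Sensor reading sent to the cyber side.
        return {"speed_rpm": self.speed_rpm, "temperature_c": self.temperature_c}


@dataclass
class DigitalTwin:
    max_temperature_c: float = 45.0  # assumed operating limit
    history: list = field(default_factory=list)

    def ingest(self, reading: dict) -> None:
        # Physical-to-digital: keep the sensed state as the twin's historical record.
        self.history.append(reading)

    def decide(self) -> Optional[dict]:
        # Digital-to-physical: if the mirrored state is too hot, command a 20% lower speed.
        latest = self.history[-1]
        if latest["temperature_c"] > self.max_temperature_c:
            return {"speed_rpm": latest["speed_rpm"] * 0.8}
        return None


def run_loop(machine: PhysicalMachine, twin: DigitalTwin, steps: int):
    for _ in range(steps):
        twin.ingest(machine.step())
        command = twin.decide()
        if command is not None:
            machine.speed_rpm = command["speed_rpm"]  # actuation on the physical asset
    return machine, twin


machine, twin = run_loop(PhysicalMachine(), DigitalTwin(), steps=20)
print(round(machine.speed_rpm), round(twin.history[-1]["temperature_c"], 1))
```

With these assumed values the twin throttles the spindle twice (1200 → 960 → 768 rpm), after which the temperature settles at about 39.2 °C; the stored `history` is the kind of historical record that the next paragraph describes being analyzed for optimization, prognostics, and diagnosis.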
Data and information circulating between the physical and digital twins are stored in historical databases, where they are processed and analyzed with machine learning algorithms and semantic web technologies with reasoning to support final decisions about optimization, prognostics, and diagnosis [14]. Implementing IoT and CPS in Industry 4.0 brings many benefits, such as better use and performance of assets, lower time and cost of production, optimization of the product life cycle, faster decision-making, improved mass customization and production in manufacturing [15, 16], and the monitoring and connection of several manufacturing machines in a smart environment (software networks) [17]. Despite these advantages, implementing the IoT together with CPSs in the vision of Industry 4.0 remains complex, because enterprises adopt a large number of different architectures and approaches to digitalization and networking, which gives rise to communication and system interoperability problems [18].

2.3. 5C Architecture
The 5C architecture of CPS was proposed by Lee in [20]. It is a unified five-level architecture that serves as a step-by-step guide for developing and implementing CPS in manufacturing. It is a well-known, data-driven reference model widely adopted in CPS development because it defines the data flow in these systems, from initial data acquisition through analytics to decision-making, as shown in Figure 1. The levels of the architecture are outlined in Table 1.

2.4. RAMI 4.0
RAMI 4.0 stands for Reference Architectural Model Industrie 4.0. It is a three-dimensional model that describes all aspects of the I4.0 space [21], so that complex interrelations can be broken down into smaller and simpler clusters.
Figure 1: 5C Architecture of CPS adapted from [19]

{| class="wikitable"
|+ Table 1: Different levels of the 5C architecture
! 5C level !! Role
|-
| Smart Connection || This level is responsible for acquiring data from the different interconnected components of the physical world, managing it effectively, and transferring it to the central server. Physical components must be identified and specified.
|-
| Data-to-Information Conversion || This level converts the data collected in the previous step into information. This step brings self-awareness to machines.
|-
| Cyber || This level uses the information inferred in the previous step to create Digital Twins (virtual representations) of the physical components of the real world. It is considered the hub that guarantees communication between physical assets, since all information inferred from the collected data is sent to it.
|-
| Cognition || This level uses the gathered information to acquire the knowledge needed to monitor the system and make prognostics for failure prediction and maintenance optimization.
|-
| Configuration || At this level, operations are sent as feedback from cyberspace to the physical world to control machines, making the physical space self-configuring and self-adaptive.
|}

RAMI 4.0 is composed of three axes:
• The “Hierarchy Levels” axis: the right horizontal axis in Figure 2 shows the hierarchy levels from IEC 62264 and IEC 61512, the international standard series for enterprise IT and control systems. These hierarchy levels (Enterprise, Work Centers, Station, Control Device) represent the different functionalities within factories or facilities. Other levels (Product, Field Device) were added to support the representation of the I4.0 space, and the top level, labeled “Connected World”, connects to external partners through service networks.
• The “Life Cycle and Value Stream” axis: the left horizontal axis represents the life cycle of facilities and products, based on IEC 62890 for life-cycle management.
A distinction is made between two levels, “type” and “instance”: while a product is in the design and development phase, it is at the “type” level; once its design has been completed and production starts, it is at the “instance” level.
• The “Layers” axis: the vertical axis describes, in six layers, the physical machines and objects of the real world. The first layer, “Asset”, defines the physical objects and their properties, such as parts, documents, diagrams, and humans. The second layer, “Integration”, maps physical objects to digital-world components; it links the physical and digital worlds, and it is where information is transformed and easily processed. The third layer, “Communication”, provides standardized communication between the Integration layer and the Information layer; the standardization relies on a unified data format that represents data in an organized way in the Information layer and allows data to be processed and transformed in the Integration layer, linking the physical and digital worlds. The fourth layer, “Information”, stores data in an organized way; its main purpose is to provide relevant information about products from the stored data (such as statistics on the number of sales or the number of objects produced), as well as the machines used to produce a product or information about customers and their feedback. The fifth layer, “Functional”, takes actions and coordinates and selects components to carry out tasks; it covers activities such as authentication, user inputs, and remote access. The last layer, “Business”, contains the business strategy, business environment, and management, and handles business activities in order to realize the business model.

2.5. IIRA
The Industrial Internet Reference Architecture (IIRA) is an open, cross-industry architecture developed by the Industrial Internet Consortium (IIC) based on IIoT standards [23], emphasizing interoperability across industries such as manufacturing, energy, healthcare, and transportation. Its functional viewpoint is organized into five functional domains [24], presented here as layers, each associated with certain functionalities:
• Business layer: functions that enable end-to-end tasks in an industrial system (e.g., enterprise resource planning, life-cycle management, planning, and scheduling).
• Application layer: functions that enable business operations by implementing application logic (e.g., activities/workflows, application programming interfaces, user interfaces).
• Information layer: functions that gather and manage data (e.g., data collection and storage, semantics, quality processing).
• Operations layer: functions that enable operations related to component monitoring and diagnosis throughout the component life cycle (e.g., provisioning and deployment, optimization).
• Control layer: functions that enable the control of an industrial system (e.g., sensing and actuation, communication, abstraction, digitalization, analytics, asset management).

2.6. Comparison between Architectures
This subsection analyzes the similarities and differences between the reference architecture models described above, highlighting their main goals and the relations that make them interoperable with each other.

Figure 2: RAMI 4.0 Architecture adapted from [22]

All these reference models define and represent CPS concepts, but they target different goals [24]. The 5C architecture targets modeling and describing asset data collection and processing, mainly in small smart industries and IoT environments. It is considered the first such architecture published in the literature, and it is suited only to horizontal integration.
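The data flow summarized in Table 1 can be read as a pipeline in which each 5C level consumes the output of the previous one. The following Python sketch is a hypothetical illustration of that reading: the packet format, the mean-vibration indicator, the limit, and the command names are assumptions, not part of the 5C architecture as specified in [20].

```python
from statistics import mean


def smart_connection(raw_packets):
    # Level 1: acquire raw readings and tag each with the machine that produced it.
    return [{"machine": p["id"].split("/")[0], "sensor": p["id"], "value": p["v"]}
            for p in raw_packets]


def data_to_information(readings):
    # Level 2: convert raw values into one health indicator per machine (mean vibration).
    by_machine = {}
    for r in readings:
        by_machine.setdefault(r["machine"], []).append(r["value"])
    return {m: mean(values) for m, values in by_machine.items()}


def cyber(information, twins):
    # Level 3: append each indicator to the corresponding machine's digital twin history.
    for machine, indicator in information.items():
        twins.setdefault(machine, []).append(indicator)
    return twins


def cognition(twins, limit):
    # Level 4: toy prognostic rule, flag a machine whose latest indicator exceeds the limit.
    return {m: history[-1] > limit for m, history in twins.items()}


def configuration(alerts):
    # Level 5: feedback from cyberspace to the physical machines.
    return {m: ("reduce_load" if alert else "keep") for m, alert in alerts.items()}


packets = [{"id": "m1/vib_a", "v": 0.8}, {"id": "m1/vib_b", "v": 1.0},
           {"id": "m2/vib_a", "v": 2.4}, {"id": "m2/vib_b", "v": 2.6}]
twins = {}
commands = configuration(cognition(cyber(data_to_information(smart_connection(packets)), twins), limit=2.0))
print(commands)  # {'m1': 'keep', 'm2': 'reduce_load'}
```

Keeping each level as a separate function mirrors the step-by-step character of 5C: any level (for example, the toy `cognition` rule) could be swapped for an ML model without touching the others.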
RAMI 4.0 was proposed to describe and model CPSs in the I4.0 scenario. It focuses on defining the operations of manufacturing assets and describes in detail CPS design, control, communication, and business by integrating the company value chain and the product life cycle. With the IIoT as its highlight, IIRA is based on ISO/IEC/IEEE 42010 and describes a development plan for creating IIoT systems. It focuses on the IIoT system as a core concept in all sectors, covering the product life cycle from design to maintenance and control [24].

As these reference models show, several layers of RAMI 4.0 have functionalities similar to levels of the 5C and IIRA architectures. These layers and levels can be mapped to each other, which makes it possible to establish interoperability between the different architectures.

2.7. Semantic Models
Semantic models are considered an important technology for achieving interoperability between different systems [25]. They can model and describe the properties of, and relationships between, concepts and entities. One of the most widely used semantic models in research is the ontology, defined as an ‘explicit specification of a conceptualization of a domain’ [26]. An ontology represents domain knowledge in a generic way by encoding it as axioms, natural-language labels, synonyms, definitions, and other annotation properties, which allows applications to reach an agreed understanding [27]. Ontology is a relatively complex concept that requires special expertise to develop and maintain [28]. Most ontologies are encoded in the Web Ontology Language (OWL), which is more expressive than other ontology languages such as RDF Schema (RDFS). OWL is part of the Semantic Web stack and is based on Description Logics [29, 30].

2.7.1. RDF
The Resource Description Framework (RDF) is a model for encoding semantic relationships between items of data so that these relationships can be interpreted computationally. It is considered the primary foundation of the Semantic Web.

2.7.2. RDFS
RDFS is a set of classes with certain properties, built on the extensible RDF knowledge representation data model, that provides the basic elements for describing ontologies. It provides vocabularies intended to structure RDF resources.

2.7.3. OWL
OWL is a semantic markup language designed to represent complex knowledge as concepts and their relationships. It is used to publish and share ontologies, and it is developed as a vocabulary extension of RDF [31].

2.7.4. Semantic Heterogeneity
Semantic heterogeneity has no single definition [32]. The literature offers several definitions, each of which captures part of the term:
• According to the Merriam-Webster dictionary, semantic heterogeneity is the quality or state of being made up of different parts, with respect to the meanings of words and phrases.
• In [33], the term denotes differences in the meaning and use of data that make it difficult to identify the relationships that exist between similar or related objects in different components.
• Semantic heterogeneity can also be defined as differences in the real-world interpretation of the context, meaning, and use of data [34].

2.8. Digital Twin
The DT concept refers to the virtual representation of a physical entity, including all its properties and functioning. The term was first coined in 2011 by John Vickers. Its pioneers, Grieves and Vickers [35], defined it as “a set of virtual information constructs that fully describes a potential or actual physical manufactured product from the micro atomic level to the macro geometrical level.
At its optimum, any information that could be obtained from inspecting a physically manufactured product can be obtained from its DT”. A DT is mainly used to mirror the physical state of a manufactured product and to help users predict properties or take actions from the digital space. Digital twinning can be defined as a process that involves a physical entity, a cyber twin that mirrors it, and a physical connection (communication interface) used to share and translate data between them, as shown in Figure 3.

Figure 3: Digital twinning concept schema drawn from [36]

2.9. AI-ML
AI can be defined as the ability of computer systems to reproduce intelligent reasoning for decision-making purposes. Machine Learning (ML) is a subset of AI: the set of techniques that allow a machine to learn to solve or perform a task without being explicitly programmed for it. These techniques cover the analysis, design, development, and implementation of methods that let the machine follow a systematic process to solve a problem where a classical algorithmic method would be difficult or impossible to apply [37]. Teaching a machine to perform a task requires a dataset containing a lot of information specific to that task. ML techniques analyze and process these data to extract knowledge and features, then apply them to new data to solve real-life problems. ML techniques are usually grouped into three categories: supervised, unsupervised, and Reinforcement Learning (RL) [37].

2.9.1. Supervised Learning
Supervised learning is used when the training dataset is labeled, that is, when each data sample is associated with the desired result (output). The goal of the ML algorithm is to find a function that maps the input data to the output. Unseen data can then be fed to the trained model to predict the most relevant output.

2.9.2. Unsupervised Learning
Unsupervised learning is used when the examples in the training data are not labeled in advance. It partitions the training examples into categories based on chosen similarity criteria. It builds classes automatically without human intervention, but it typically requires a good estimate of the number of classes.

2.9.3. Reinforcement Learning
RL is a framework for solving control and decision-making tasks by building agents that learn by interacting with their environment through trial and error, receiving positive and negative rewards as feedback for correct and incorrect actions, respectively. The agent's ultimate goal is to maximize its cumulative reward in any given situation.

3. Semantic Models, AI-ML, DT Integration in Cyber-Physical Systems
In this section, we discuss current CPS deployments and implementations that integrate semantic model technologies, AI-ML techniques, and DT creation. Table 2 lists the selected works from academia and the technologies integrated into their deployments.

Table 2 Selected papers and the integrated technologies. Ref Semantic models AI-ML DTs integration [38] ! ! [16] ! ! [39] ! ! [40] ! ! [41] ! ! [15] ! ! [42] ! ! [43] ! ! [44] ! [45] ! ! [46] ! ! [47] ! ! [48] ! [49] ! ! [50] ! [51] ! [52] ! !

3.1. Semantic Models Integration in CPS
Semantic models can bridge the gap between the different participants of a CPS project by ensuring that shared information is understood by all of them.

In [38], the authors show the importance of machine-to-machine communication for collaborative manufacturing automation (smart manufacturing). The article introduces the concept of a semantic-aware CPS based on industrial machines that can perform semantic machine-to-machine communication. The authors propose a semantic layer and a communication layer on top of the physical and cyber layers of a CPS.
The semantic layer provides logical communication between CPSs from different organizations: the systems are not attached to each other, yet they act as if they were, because the messages they exchange are converted into semantic expressions. The communication layer then transfers these semantic expressions between CPSs as web requests over HTTP.

Different engineering models are created to represent the different components and tools of a CPS, and each engineering model uses local terminology tied to its engineering environment. Semantic models can integrate these models by creating ontologies and aligning them at the terminological and instance levels, yielding a representative DT of the physical system. Each ontology contributes specific semantics about one aspect of a physical component and provides it to the component's DT to fulfill a given goal. Semantic models can be classified according to the semantics they define:
• Geometric model: describes the geometric properties (shape, size) of the physical entity.
• Physical model: describes the parts or capabilities of the physical entity (composition, capacity…).
• Behavioral model: describes the behavior of the physical entity when communicating with other entities.
• Rule model: defines the relations between domain knowledge concepts (constraints, associations, deduction, negation…).
• Process model: describes the underlying process and function in which the physical entity takes part in the CPS.
In the operational phase of a CPS, the DT receives data collected by sensing the physical environment. These data come from different components and describe different properties. Semantic models are used to describe and represent the collected data appropriately, making it possible to visualize and monitor dynamic systems.
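The terminological alignment and the semantic description of collected data described above can be illustrated with a small, dependency-free Python sketch. It assumes two hypothetical tools whose local terms (`toolA:temp`, `toolB:Temperatur`, and so on) are aligned to a made-up shared vocabulary with the prefix `ex:`; each message becomes a set of RDF-style (subject, predicate, object) triples, and one RDFS entailment rule (a resource typed with a class is also typed with that class's superclasses) derives implicit facts. A real deployment would use an RDF library and an OWL reasoner instead of plain tuples.

```python
# Hypothetical alignment of two tools' local terms to shared ontology terms.
ALIGNMENT = {
    "toolA:temp": "ex:hasTemperature",
    "toolB:Temperatur": "ex:hasTemperature",
    "toolA:Drill": "ex:DrillingMachine",
    "toolB:Bohrmaschine": "ex:DrillingMachine",
}

# Hypothetical shared ontology axioms: rdfs:subClassOf edges (child -> parent).
SUBCLASS_OF = {
    "ex:DrillingMachine": "ex:MachineTool",
    "ex:MachineTool": "ex:Asset",
}


def to_triples(message):
    """Convert a local message into (subject, predicate, object) triples using shared terms."""
    subject = "ex:" + message["id"]
    triples = {(subject, "rdf:type", ALIGNMENT[message["kind"]])}
    for key, value in message["fields"].items():
        triples.add((subject, ALIGNMENT[key], value))
    return triples


def infer_types(triples):
    """Apply the RDFS subclass rule until fixpoint: (x type C), (C subClassOf D) => (x type D)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            parent = SUBCLASS_OF.get(o) if p == "rdf:type" else None
            if parent is not None and (s, p, parent) not in inferred:
                inferred.add((s, p, parent))
                changed = True
    return inferred


msg_a = {"id": "m1", "kind": "toolA:Drill", "fields": {"toolA:temp": 61.5}}
msg_b = {"id": "m2", "kind": "toolB:Bohrmaschine", "fields": {"toolB:Temperatur": 58.0}}
kb = infer_types(to_triples(msg_a) | to_triples(msg_b))
print(sorted(o for s, p, o in kb if s == "ex:m2" and p == "rdf:type"))
# ['ex:Asset', 'ex:DrillingMachine', 'ex:MachineTool']
```

Although the two tools use different local words, both machines end up described with the same predicate `ex:hasTemperature`, and the fact that each is an `ex:Asset` is never stated explicitly but is inferred from the ontology.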
A semantic model based on an extension of the IoT-Lite ontology is proposed in [16] to represent and integrate the digital twins of devices in an IIoT system. The model describes, at a high level, the data about physical capabilities collected from IoT devices and the relationships between them, so that this information can be represented and interpreted by the digital twins of the IoT system. The model also presents information to end users more clearly. Moreover, semantic models can extract implicit information about physical objects by reasoning over the relations between objects in the knowledge base, and such information can help detect faults and errors early [53]. For example, in [15] a DT for part assembly is created from an ontology that describes the geometric information of parts and the constraints in assembly units; the expressiveness of the ontology made it possible to reason about and infer relations between domain concepts and to define assembly requirements between parts.

In smart manufacturing, [39] proposed a four-layer, ontology-based architecture to manage and reconfigure manufacturing resources. The ontology describes manufacturing resources and their properties using OWL, and a rule-based model reasons over this ontology to reconfigure the status of the resources. The reconfiguration of an intelligent manipulator is implemented as a use case to show the feasibility of the architecture. Also in the assembly field, [40] presented an ontology-based DT model of an assembly workshop. The ontology of the assembly-workshop DT describes the objects, attributes, and relationships that exist in the workshop and take part in assembly behaviors.
The assembly process is modeled as a set of events occurring over a time range, and the event ontology is represented in an event-oriented description logic that serves as the logical basis of the event ontology language. Together, the assembly-workshop DT ontology and the assembly process model make it possible to mirror and monitor the workshop using the event ontology language. The study in [41] presented a DT modeling architecture for manufacturing processes that combines multi-agent systems, the Material–Process–Functions–Quality (MPFQ) model, and semantic models for knowledge management. The semantic models define the knowledge precisely and help the agents interact and interpret shared information, enabling semantic interoperability. Another study [42] discussed the challenges of DT data management (data variety, data mining) and their influence on DT dynamics, and proposed a DT ontology model and methodology to address them. The DT ontology model captures the conceptual knowledge of the DT domain; with the proposed methodology, this domain knowledge is transformed into a minimal data model structure used to map, query, and manage databases for DT applications. The approach is tested on a case study based on Condition-Based Monitoring (CBM). From another modeling perspective, [43] proposes a semantic modeling approach based on high-level architecture and GOPPRR for DT integration. This approach serves as a tool to create a DT ontology that describes the heterogeneous properties of the physical twin in the digital model.

3.2. ML and DT Integration in CPS
ML techniques are a key enabling technology in CPSs and can be applied in both the engineering and the operational phases. A DT can mirror and replicate any physical object in cyberspace, and it also provides a feedback mechanism to control and monitor that object.
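One simple way a DT can feed such monitoring is to compare each measurement with the value the twin predicts and to flag large residuals. In the following Python sketch the linear twin model, its coefficients, the data, and the threshold rule (mean plus k standard deviations of the absolute residuals on a fault-free calibration window) are all assumptions chosen for illustration; it is not ATTAIN [45] or any other method cited in this section.

```python
from statistics import mean, stdev


def twin_predict(load):
    # Hypothetical DT model: expected motor current (A) as a linear function of load (%).
    return 2.0 + 0.05 * load


def fit_threshold(calibration, k=3.0):
    # Residuals on fault-free data define "normal"; threshold = mean + k * std of |residual|.
    residuals = [abs(current - twin_predict(load)) for load, current in calibration]
    return mean(residuals) + k * stdev(residuals)


def detect(stream, threshold):
    # Return indices of samples whose measurement deviates too far from the twin's prediction.
    return [i for i, (load, current) in enumerate(stream)
            if abs(current - twin_predict(load)) > threshold]


calibration = [(10, 2.52), (20, 2.98), (40, 4.03), (60, 4.97), (80, 6.01)]  # (load %, current A)
live = [(30, 3.49), (50, 4.52), (70, 7.40), (90, 6.48)]
threshold = fit_threshold(calibration)
print(detect(live, threshold))  # [2]
```

The third live sample draws far more current than the twin expects at 70 % load and is flagged. In the works discussed below, the hand-written `twin_predict` would be replaced by a learned or physics-based model, and the fixed threshold by a trained detector.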
ML algorithms are known for their high performance in decision-making applications. When monitoring CPSs, anomaly detection is required to control the state and behavior of the physical object. In [45], ML techniques were used to detect anomalies in CPSs through a novel approach named Anomaly DeTection with digiTAl twIN (ATTAIN). A Timed Automaton Machine (TAM) is built to represent the DT of the CPS, and Generative Adversarial Network (GAN) techniques are used to detect anomalies. A generator model captures the characteristics of the input data and learns to generate realistic unlabeled samples with the same features. The TAM labels the samples produced by the generator and feeds them, together with real labeled samples, to a discriminator that is trained to distinguish normal from anomalous data. Combining the DT with ML in this approach gave better anomaly detection results than approaches that do not use a DT. ML techniques also perform well at detecting cyber-security attacks in CPSs. The research in [44] presents a comparative study of supervised ML techniques for detecting cyber-attacks in CPSs, using a dataset from a secure water treatment plant as a case study, and reports high accuracy on that dataset. Sensor data collected from the environment and the machines can also be used to predict failures and errors in the machines and the system. This is done by training an ML model on historical diagnosis data of the system or on historical machine data. However, training an ML model with unbalanced or insufficient data can lead to errors when the model is used on real-world data, since the performance of an ML model depends on the quality and volume of its training data.
In [46], an architecture that combines a discrete physics-based computational model with ML techniques to create a DT for investigating several damaged-structure scenarios is presented. An ML classifier, which represents the DT of the damaged structure, is trained with data generated by a stochastic computational model that simulates the damage scenarios; the classifier learns to detect damage and warns the user of its location. The DT is then connected to the physical entity to enable real-time decision-making. In a use case, a wind turbine is modeled with physics-based models, and the data generated from them are used to train an ML classifier to detect damage in the turbine structure. Results showed low accuracy in detecting damage. In [47], a DT is built from the industrial IoT of a petrochemical plant and from ML, and is connected in a loop with the physical factory. Information is exchanged in real time to optimize production control. The approach optimizes the DT models through real-time big data analysis with ML. This allows petrochemical processes and other manufacturing systems to adapt dynamically to changes in their environment, while accounting for time lags between time series and reducing data dimensionality. Several ML algorithms were trained and tested on data gathered from the industrial IoT system, and a case study in a real petrochemical factory was used to examine the effectiveness of the approach. Moreover, the authors of [48] presented a compositional falsification framework for Signal Temporal Logic (STL) specifications against CPS-ML models. It decomposes the analysis of the ML components from that of the system containing them. An ML analyzer abstracts the feature space and approximates the ML classifiers, identifying feature vectors that may be misclassified, which are then used in the falsification process.
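Falsification searches the input space of a closed-loop system for a trace that violates a temporal-logic requirement. The sketch below is a simplified, non-compositional illustration of that idea. It simulates a hypothetical car-following scenario and scores each trace with the quantitative (robustness) semantics of the STL formula G(d > d_safe), that is, the minimum over time of d(t) − d_safe. Plain random search then looks for an input with negative robustness. The vehicle model, the fixed reaction delay standing in for an ML perception component, the parameter ranges and d_safe are all assumptions.

```python
import random

D_SAFE = 5.0  # hypothetical safety distance (m)
DT = 0.1      # simulation step (s)


def simulate(lead_decel: float, reaction_delay: float, steps: int = 200) -> list[float]:
    """Hypothetical car-following scenario; returns the gap trace d(t) in metres.
    Both cars start at 20 m/s, 30 m apart. The lead car brakes at `lead_decel`
    (m/s^2); the follower brakes at 8 m/s^2 after `reaction_delay` seconds, a
    stand-in for the latency of a perception component. No collision is modelled."""
    gap, v_lead, v_follow = 30.0, 20.0, 20.0
    trace = []
    for k in range(steps):
        v_lead = max(0.0, v_lead - lead_decel * DT)
        if k * DT >= reaction_delay:
            v_follow = max(0.0, v_follow - 8.0 * DT)
        gap += (v_lead - v_follow) * DT
        trace.append(gap)
    return trace


def robustness(trace: list[float]) -> float:
    """Quantitative semantics of the STL formula G (d > D_SAFE):
    min over t of d(t) - D_SAFE. A negative value means the property is violated."""
    return min(d - D_SAFE for d in trace)


def falsify(trials: int = 500, seed: int = 0) -> tuple[float, tuple[float, float]]:
    """Random-search falsification: sample inputs, keep the least robust one."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        params = (rng.uniform(1.0, 9.0), rng.uniform(0.0, 2.0))
        rho = robustness(simulate(*params))
        if best is None or rho < best[0]:
            best = (rho, params)
    return best


rho, (decel, delay) = falsify()
print(f"lowest robustness {rho:.2f} at lead_decel={decel:.2f} m/s^2, delay={delay:.2f} s")
```

Random search is the simplest falsification strategy. Falsification tools often replace it with optimization over the robustness value, and [48] additionally analyses the ML component separately through its feature-space abstraction, which this sketch does not attempt.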
The framework of [48] was evaluated on an autonomous-driving case study involving a deep neural network (DNN), where it proved effective. In another context, [49] proposes a DT architecture reference model that uses state machines to design cloud-based CPSs. Every physical thing is represented by a cloud-based DT composed of seven elements (sensors, actuators, functions, events, data storage, network, power unit), and sensing or actuating the physical thing is considered an event. The data gathered by the smart things are stored at different storage levels, from mobile and stationary devices up to the cloud-based data center. Bayesian networks and fuzzy logic are used to select the system's mode of interaction based on triggered events and on the status of the physical things. In [50], the authors investigated and tested the Inductive Conformal Prediction (ICP) framework for real-time assurance monitoring of CPSs with ML components. ICP provides predictions with well-calibrated confidence. These predictions are combined with a monitor that keeps the error rate small while limiting the number of high-dimensional inputs for which the DNN model cannot make an accurate prediction. Experiments were run on two datasets, with different neural network architectures and different nonconformity functions. They showed that combining the ICP framework with a DNN model can minimize the number of alarms caused by predictions that contain multiple classes. In task learning, AI/ML algorithms such as reinforcement learning (RL) can provide a generalized task policy by training an agent on demonstrations of the task drawn from different knowledge domains. The authors of [51] proposed a neural network as a control strategy for Connected Vehicles (CV). The model adjusts the speed of the following vehicle according to the distance to the leading vehicle and to that vehicle's deceleration and speed.
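The ICP mechanism of [50] can be sketched for any classifier that outputs class probabilities. A held-out calibration set yields nonconformity scores. For a new input, each candidate label receives a p-value equal to (the number of calibration scores at least as large as the candidate's score, plus one) divided by (n + 1). The prediction set keeps every label whose p-value exceeds the significance level ε. If calibration and test data are exchangeable, the true label is left out with probability at most ε, and a monitor can raise an alarm whenever the set does not contain exactly one class. The nonconformity function 1 − p(label), the two labels, the calibration outputs and ε = 0.15 are assumptions for illustration. [50] evaluates several nonconformity functions together with DNN models, which are not reproduced here.

```python
def nonconformity(probs: dict[str, float], label: str) -> float:
    """Assumed nonconformity measure: 1 minus the probability given to `label`."""
    return 1.0 - probs[label]


def calibration_scores(cal_probs: list[dict[str, float]], cal_labels: list[str]) -> list[float]:
    """Scores of a held-out calibration set that was NOT used to train the classifier."""
    return [nonconformity(p, y) for p, y in zip(cal_probs, cal_labels)]


def p_value(scores: list[float], score: float) -> float:
    """ICP p-value: (number of calibration scores >= score, plus one) / (n + 1)."""
    return (sum(s >= score for s in scores) + 1) / (len(scores) + 1)


def prediction_set(probs: dict[str, float], scores: list[float], epsilon: float) -> set[str]:
    """Every candidate label whose p-value exceeds the significance level epsilon."""
    return {y for y in probs if p_value(scores, nonconformity(probs, y)) > epsilon}


# Hypothetical classifier outputs on 9 calibration samples and their true labels.
cal_probs = [
    {"normal": 0.95, "fault": 0.05}, {"normal": 0.90, "fault": 0.10},
    {"normal": 0.85, "fault": 0.15}, {"normal": 0.20, "fault": 0.80},
    {"normal": 0.10, "fault": 0.90}, {"normal": 0.70, "fault": 0.30},
    {"normal": 0.40, "fault": 0.60}, {"normal": 0.05, "fault": 0.95},
    {"normal": 0.60, "fault": 0.40},
]
cal_labels = ["normal", "normal", "normal", "fault", "fault",
              "normal", "fault", "fault", "fault"]
scores = calibration_scores(cal_probs, cal_labels)

EPSILON = 0.15  # target error rate of the monitor (assumed)
for probs in [{"normal": 0.97, "fault": 0.03},   # confident input
              {"normal": 0.50, "fault": 0.50}]:  # ambiguous input
    pred = prediction_set(probs, scores, EPSILON)
    alarm = len(pred) != 1  # empty or multi-class set: no confident prediction
    print(sorted(pred), "ALARM" if alarm else "ok")
```

Here the confident input yields a single-label set. The ambiguous input yields both labels and triggers an alarm, which corresponds to the multi-class predictions discussed above.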
In [52], the authors proposed an RL-based algorithm that uses data from the manufacturing system to correct errors and discrepancies in the data while representing the DT of the CPS. The DT acts as the agent of the RL algorithm, which seeks a policy that minimizes these discrepancies. The algorithm was tested on sheet metal assembly.
4. Conclusion
As the variety of proposed architectures shows, CPSs are complex to design and implement. These architectures integrate semantic models to ensure interoperable communication between the cyber and physical spaces, and ML techniques to create virtual replicas of physical-world components. Semantic models such as ontologies can simplify data representation and facilitate the construction of a DT of the physical reality. A DT can increase the effectiveness of a CPS and simplify its monitoring for end users by providing an appropriate digital representation of the physical system. Ontologies also make data management and communication easier. Concepts are well defined by their properties, and relationships make the information flow between concepts explicit, which supports correct communication and interpretation of all data in the digital space. Tasks such as anomaly detection can be accomplished with an ontology, using a rule base that reasons over it to extract implicit information about the physical system. ML techniques serve both to control and to monitor the CPS. They can also be used to construct a DT by training a model on data gathered or generated from the cyber-physical spaces; the trained model then predicts the behavior of the physical system from patterns learned in historical data.
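A DT model trained this way can only reproduce the patterns present in its training data. It is therefore useful to check how far data simulated by the DT drifts from data measured on the physical system. One simple option for a single numeric feature is the two-sample Kolmogorov–Smirnov statistic, the largest vertical distance between the two empirical cumulative distribution functions. The choice of this statistic, the feature name and the readings below are assumptions for illustration, not a method prescribed by the surveyed works.

```python
from bisect import bisect_right


def ks_statistic(real: list[float], simulated: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between
    the empirical CDFs of the two samples (0 = identical, 1 = no overlap)."""
    a, b = sorted(real), sorted(simulated)
    gap = 0.0
    for x in a + b:  # both CDFs are right-continuous steps, so checking sample points suffices
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical vibration amplitudes measured on the machine and produced by its DT.
real_vibration = [0.51, 0.48, 0.55, 0.60, 0.47, 0.52, 0.58, 0.49]
simulated_vibration = [0.50, 0.53, 0.49, 0.51, 0.52, 0.50, 0.48, 0.54]
shifted_simulation = [x + 0.3 for x in simulated_vibration]  # a badly calibrated twin

print(ks_statistic(real_vibration, simulated_vibration))
print(ks_statistic(real_vibration, shifted_simulation))
```

The statistic is 0 for identical samples and 1 when the samples do not overlap at all, as with the shifted simulation. SciPy provides the same statistic together with a p-value as `scipy.stats.ks_2samp`.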
Generating data by simulation with physics-based computational models and feeding it to ML models can make the results of these models more meaningful, since the outputs of the trained model can be analyzed and mapped to physics-based parameters. One relevant challenge in implementing ML in CPSs is that simulation-generated data alone are not sufficient. A hybrid approach is required to generate samples that share the feature characteristics of data gathered from the real system and follow a similar distribution. Finally, ML techniques can approximate the behavior of the real system, but there will always be a difference between real data and the data simulated by the DT. Ontology modeling combined with ML techniques can be used to analyze and represent the semantic mismatch between real and simulated data by calculating similarities and comparing their features and distributions. We believe that semantic models, DTs, and ML together can serve as a bridge towards the envisioned CPS and ensure knowledge interoperability, provided that the three technologies are applied appropriately within a well-designed architecture. This survey reviewed and compared the CPS reference architectures 5C, RAMI 4.0, and IIRA. Furthermore, it reviewed the literature on CPS architecture projects and analyzed the use of semantic models and ML in CPS engineering and implementation. As future work, we consider combining semantic models and ML techniques to implement CPSs, taking the I4.0 reference architectures as design inspiration, since this could bring a high level of integrity to these systems. In addition, an exhaustive and detailed architecture that covers all aspects of designing CPSs with ML and ontologies together must be explored and defined as a unified standard solution. Such a solution would ensure interoperability between different industrial systems and help achieve the I4.0 vision.
References
[1] F. Yang, S. Gu, Industry 4.0, a revolution that requires technology and national strategies, Complex & Intelligent Systems 7 (2021) 1311–1325.
[2] J. Zhou, P. Li, Y. Zhou, B. Wang, J. Zang, L. Meng, Toward new-generation intelligent manufacturing, Engineering 4 (2018) 11–20.
[3] G. Schuh, T. Potente, R. Varandani, C. Hausberg, B. Fränken, Collaboration moves productivity to the next level, Procedia CIRP 17 (2014) 3–8. URL: https://www.sciencedirect.com/science/article/pii/S2212827114003709. doi:10.1016/j.procir.2014.02.037, Variety Management in Manufacturing.
[4] P. Dallasega, E. Rauch, C. Linder, Industry 4.0 as an enabler of proximity for construction supply chains: A systematic literature review, Computers in industry 99 (2018) 205–225.
[5] F. Tao, Q. Qi, L. Wang, A. Nee, Digital twins and cyber–physical systems toward smart manufacturing and industry 4.0: Correlation and comparison, Engineering 5 (2019) 653–661.
[6] E. A. Lee, S. A. Seshia, Introduction to embedded systems: A cyber-physical systems approach, MIT Press, 2016.
[7] V. Gunes, S. Peter, T. Givargis, F. Vahid, A survey on concepts, applications, and challenges in cyber-physical systems, KSII Transactions on Internet and Information Systems (TIIS) 8 (2014) 4242–4268.
[8] E. A. Lee, Cyber physical systems: Design challenges, in: 2008 11th IEEE international symposium on object and component-oriented real-time distributed computing (ISORC), IEEE, 2008, pp. 363–369.
[9] R. Rajkumar, I. Lee, L. Sha, J. Stankovic, Cyber-physical systems: the next computing revolution, in: Design automation conference, IEEE, 2010, pp. 731–736.
[10] H. Gill, A continuing vision: Cyber-physical systems, in: Fourth annual Carnegie Mellon conference on the electricity industry, 2008.
[11] N. Carvalho, O. Chaim, E. Cazarini, M. Gerolamo, Manufacturing in the fourth industrial revolution: A positive prospect in sustainable manufacturing, Procedia Manufacturing 21 (2018) 671–678.
[12] G.-J. Cheng, L.-T. Liu, X.-J. Qiang, Y. Liu, Industry 4.0 development and application of intelligent manufacturing, in: 2016 international conference on information system and artificial intelligence (ISAI), IEEE, 2016, pp. 407–410.
[13] A. Redelinghuys, A. Basson, K. Kruger, A six-layer digital twin architecture for a manufacturing cell, in: International Workshop on Service Orientation in Holonic and Multi-Agent Manufacturing, Springer, 2018, pp. 412–423.
[14] R. van Dinter, B. Tekinerdogan, C. Catal, Predictive maintenance using digital twins: A systematic literature review, Information and Software Technology (2022) 107008.
[15] Q. Bao, G. Zhao, Y. Yu, S. Dai, W. Wang, Ontology-based modeling of part digital twin oriented to assembly, Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture 236 (2022) 16–28.
[16] C. Steinmetz, A. Rettberg, F. G. C. Ribeiro, G. Schroeder, C. E. Pereira, Internet of things ontology for digital twin in cyber physical systems, in: 2018 VIII Brazilian symposium on computing systems engineering (SBESC), IEEE, 2018, pp. 154–159.
[17] M. Kherbache, M. Maimour, E. Rondeau, When digital twin meets network softwarization in the industrial iot: Real-time requirements case study, Sensors 21 (2021) 8194.
[18] Z. Huang, C. Jowers, D. Kent, A. Dehghan-Manshadi, M. S. Dargusch, The implementation of industry 4.0 in manufacturing: from lean manufacturing to product design, The International Journal of Advanced Manufacturing Technology 121 (2022) 3351–3367.
[19] R. M. de Salles, F. A. Coda, J. R. Silva, D. J. dos Santos Filho, P. E. Miyagi, F. Junqueira, Requirements analysis for machine to machine integration within industry 4.0, in: 2018 13th IEEE International Conference on Industry Applications (INDUSCON), IEEE, 2018, pp. 1237–1243.
[20] J. Lee, B. Bagheri, H.-A. Kao, A cyber-physical systems architecture for industry 4.0-based manufacturing systems, Manufacturing letters 3 (2015) 18–23.
[21] K. Schweichhart, Reference architectural model industrie 4.0 (rami 4.0), An Introduction. Available online: https://www.plattform-i40.de I 40 (2016).
[22] M. Hankel, B. Rexroth, The reference architectural model industrie 4.0 (rami 4.0), ZVEI 2 (2015) 4–9.
[23] S.-W. Lin, B. Miller, J. Durand, R. Joshi, P. Didier, A. Chigani, R. Torenbeek, D. Duggal, R. Martin, G. Bleakley, et al., Industrial internet reference architecture, Industrial Internet Consortium (IIC), Tech. Rep (2015).
[24] M. Moghaddam, M. N. Cadavid, C. R. Kenley, A. V. Deshmukh, Reference architectures for smart manufacturing: A critical review, Journal of manufacturing systems 49 (2018) 215–225.
[25] M. Djezzar, M. Hemam, M. Maimour, F. Z. Amara, K. Falek, Z. A. Seghir, An approach for semantic enrichment of sensor data, in: 2018 3rd International Conference on Pattern Analysis and Intelligent Systems (PAIS), IEEE, 2018, pp. 1–7.
[26] T. R. Gruber, Toward principles for the design of ontologies used for knowledge sharing?, International journal of human-computer studies 43 (1995) 907–928.
[27] F. Ortiz-Rodriguez, S. Tiwari, R. Panchal, J. M. Medina-Quintero, R. Barrera, MEXIN: Multidialectal Ontology Supporting NLP Approach to Improve Government Electronic Communication with the Mexican Ethnic Groups, 2022, pp. 461–463.
[28] A. Nikiforova, S. Tiwari, V. Rovite, J. Klovins, N. Kante, Evaluation and visualization of healthcare semantic models, Evaluation 323 (2020) 91773–5.
[29] J. Domingue, D. Fensel, J. A. Hendler, Handbook of semantic web technologies, Springer Science & Business Media, 2011.
[30] M. Hemam, M. Djezzar, Z. Boufaida, Multi-viewpoint ontological representation of composite concepts: a description logics-based approach, International Journal of Intelligent Information and Database Systems 10 (2017) 51–68.
[31] M. M. Taye, Understanding semantic web and ontologies: Theory and applications, arXiv preprint arXiv:1006.4567 (2010).
[32] V. Jirkovský, M. Obitko, V. Mařík, Understanding data heterogeneity in the context of cyber-physical systems integration, IEEE Transactions on Industrial Informatics 13 (2016) 660–667.
[33] J. Hammer, D. McLeod, An approach to resolving semantic heterogeneity in a federation of autonomous, heterogeneous database systems, International Journal of Intelligent and Cooperative Information Systems 2 (1993) 51–83.
[34] D. George, Understanding structural and semantic heterogeneity in the context of database schema integration, Journal of the Department of Computing, UCLAN 4 (2005) 29–44.
[35] M. Grieves, J. Vickers, Digital twin: Mitigating unpredictable, undesirable emergent behavior in complex systems, in: Transdisciplinary perspectives on complex systems, Springer, 2017, pp. 85–113.
[36] E. Fontes, Digital twins and model-based battery design, 2019.
[37] M. HAFIDI, Contribution au dépistage intelligent du cancer du sein basé sur la thermographie médicale (2020).
[38] Y. Lu, M. R. Asghar, Semantic communications between distributed cyber-physical systems towards collaborative automation for smart manufacturing, Journal of manufacturing systems 55 (2020) 348–359.
[39] J. Wan, B. Yin, D. Li, A. Celesti, F. Tao, Q. Hua, An ontology-based resource reconfiguration method for manufacturing cyber-physical systems, IEEE/ASME Transactions on Mechatronics 23 (2018) 2537–2546.
[40] C. Zhang, W. Xu, J. Liu, Z. Liu, Z. Zhou, D. T. Pham, A reconfigurable modeling approach for digital twin-based manufacturing system, Procedia Cirp 83 (2019) 118–125.
[41] X. Zheng, F. Psarommatis, P. Petrali, C. Turrin, J. Lu, D. Kiritsis, A quality-oriented digital twin modelling method for manufacturing processes based on a multi-agent architecture, Procedia Manufacturing 51 (2020) 309–315.
[42] S. Singh, E. Shehab, N. Higgins, K. Fowler, D. Reynolds, J. A. Erkoyuncu, P. Gadd, Data management for developing digital twin ontology model, Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture 235 (2021) 2323–2337.
[43] H. Li, J. Lu, X. Zheng, G. Wang, D. Kiritsis, Supporting digital twin integration using semantic modeling and high-level architecture, in: IFIP International Conference on Advances in Production Management Systems, Springer, 2021, pp. 228–236.
[44] P. Semwal, A. Handa, Cyber-attack detection in cyber-physical systems using supervised machine learning, in: Handbook of Big Data Analytics and Forensics, Springer, 2022, pp. 131–140.
[45] Q. Xu, S. Ali, T. Yue, Digital twin-based anomaly detection in cyber-physical systems, in: 2021 14th IEEE Conference on Software Testing, Verification and Validation (ICST), IEEE, 2021, pp. 205–216.
[46] T. Ritto, F. Rochinha, Digital twin, physics-based model, and machine learning applied to damage detection in structures, Mechanical Systems and Signal Processing 155 (2021) 107614.
[47] Q. Min, Y. Lu, Z. Liu, C. Su, B. Wang, Machine learning based digital twin framework for production optimization in petrochemical industry, International Journal of Information Management 49 (2019) 502–519.
[48] T. Dreossi, A. Donzé, S. A. Seshia, Compositional falsification of cyber-physical systems with machine learning components, Journal of Automated Reasoning 63 (2019) 1031–1053.
[49] K. M. Alam, A. El Saddik, C2ps: A digital twin architecture reference model for the cloud-based cyber-physical systems, IEEE access 5 (2017) 2050–2062.
[50] D. Boursinos, X. Koutsoukos, Assurance monitoring of cyber-physical systems with machine learning components, arXiv preprint arXiv:2001.05014 (2020).
[51] A. Sargolzaei, C. D. Crane, A. Abbaspour, S. Noei, A machine learning approach for fault detection in vehicular cyber-physical systems, in: 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, 2016, pp. 636–640.
[52] C. Cronrath, A. R. Aderiani, B. Lennartson, Enhancing digital twins through reinforcement learning, in: 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), IEEE, 2019, pp. 293–298.
[53] M. Sabou, S. Biffl, A. Einfalt, L. Krammer, W. Kastner, F. J. Ekaputra, Semantics for cyber-physical systems: A cross-domain perspective, Semantic Web 11 (2020) 115–124.