<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title></journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Towards Machine Learning-based Digital Twins in Cyber-Physical Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Felix Theusch</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lukas Seemann</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Achim Guldner</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Naumann</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralph Bergmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Artificial Intelligence and Intelligent Information Systems, Trier University</institution>
          ,
          <addr-line>54296 Trier</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>German Research Center for Artificial Intelligence (DFKI), Branch Trier University</institution>
          ,
          <addr-line>54296 Trier</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for Software Systems, Trier University of Applied Sciences, Environmental Campus Birkenfeld</institution>
          ,
          <addr-line>55761 Birkenfeld</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The use of Artificial Intelligence, and especially of Machine Learning methods, promises to play a key role in the development of Digital Twins due to their outstanding properties in processing large IoT data streams. However, so far, there is a lack of research on the systematisation of Machine Learning-based Digital Twins (MLDTs) as well as on their methodological development and implementation processes in productive environments. The scientific literature describes various applications of MLDTs, even if they are not called this way, as well as specialised methods and architectures, but a generic reference model is still missing. Therefore, this paper proposes a systematisation of the characteristics of MLDTs and their specific challenges. Furthermore, a first proposal of a process model for the systematic development of MLDTs according to the Machine Learning Operations (MLOps) paradigm is presented as a tentative instance of a future reference model for MLDTs. We incorporate established software development methods as well as insights gained from the examination of several industrial applications in the field of water resource management, one of which we present in this paper. We expect that the process model allows practitioners to consistently develop and maintain MLDTs and researchers to identify potentials and research gaps.</p>
      </abstract>
      <kwd-group>
        <kwd>Digital Twin</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Characterisation</kwd>
        <kwd>MLOps</kwd>
        <kwd>Water Resource Management</kwd>
        <kwd>Process Model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Cyber-Physical Systems (CPSs) and their seamless connection of control devices, physical
assets, and IT systems towards an Internet-of-Things (IoT) are current megatrends and promise
improvements in efficiency and resilience in almost all domains [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Recently, the CPS paradigm
has also been rapidly expanded to include approaches to process planning, monitoring, and
control [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Since their first mention at the beginning of the 21st century, Digital Twins have
become a cornerstone of CPSs, providing virtual representations of real physical objects, systems,
or processes. In doing so, they provide the nexus between the physical and digital world [3].
Based on the data gathered from IoT devices and systems, they can perform complex simulations
and make data-driven decisions to optimise the control of the physical CPS layer [4]. Meanwhile,
Artificial Intelligence (AI) has become a central enabler in the further development of Digital
Twins, both in research and in industrial applications [5]. Groshev et al. [5] see AI as the central
puzzle piece in the Digital Twin architecture to tackle common challenges within the concept
such as the effective usage of real-time data streams, the fulfilment of safety or performance
requirements, and optimal network usage.
      </p>
      <p>Grieves [3], who was instrumental in establishing the Digital Twin concept, identifies
“Intelligent Digital Twins” built on AI technologies as the next evolution of the Digital Twin
paradigm [6]. On closer examination, the main focus is often placed on the use of Machine
Learning (ML) methods and their significant advantages in processing large IoT data sets using
statistical models [7]. Despite all the advantages of using ML in the environment of Digital
Twins, it also leads to specific questions and challenges, for example regarding effective
ML model management for Digital Twins [8], that have only been addressed in a few scientific
publications so far. For instance, there is a lack of characterisation and established methods
to support an effective and high-quality implementation of ML-based Digital Twins and their
productive operation. Therefore, in this paper, we aim to narrow down and characterise
ML-based Digital Twins (MLDTs) in CPSs, based on findings in related literature and investigations
of productive ML-based Digital Twin applications in the water industry. This contributes to
the clarification and definition of the concept of MLDTs and to a deeper understanding of their
properties. Based on best practice examples from a case study in the domain of water resource
management and established software implementation methods, a first outline of a process
model for the development and productive operation of MLDTs is presented.</p>
      <p>The outline of this paper is as follows: Sect. 2 describes the role of AI and ML in the context of
Digital Twins and provides a definition of MLDTs and their characteristics. The use of this type
of Digital Twin is demonstrated in Sect. 3 with a case study in which
artificial neural networks (ANNs) are used to control a water distribution network. As the main focus of
this paper, in Sect. 4, we develop a first approach for a process model for the implementation and
operation of MLDTs in the context of CPSs. This process model can be seen as a first instance
of a future reference model for MLDTs.</p>
    </sec>
    <sec id="sec-2">
      <title>2. ML-Based Digital Twins and their challenges</title>
      <p>As already stated in the introduction, technologies from the field of ML are increasingly applied
in the context of Digital Twins. Therefore, we first define the term ML-based Digital Twin, based
on an extensive literature review and our experiences from practical Digital Twin applications.
From this, we derive the most important components of an MLDT and identify its specific
challenges.</p>
      <sec id="sec-2-1">
        <title>2.1. Related Work</title>
        <p>There are many different methods for the realisation of Digital Twins, e. g., Geographical
Information System (GIS), Building Information Modelling (BIM), or Computer-Aided Design (CAD)
models [9] or the use of data-related technologies such as OPC UA<sup>1</sup> or AutomationML<sup>2</sup> in
manufacturing [10]. In addition, various frameworks for Digital Twins have been presented (cf. [11, 12]). According to
Tao et al. [13], the modelling approaches of a digital twin can be divided into four dimensions:
A geometric digital twin model is used to describe the geometric properties, whereas a physical
model represents the physical properties, such as fluid dynamics, of the real entity. Digital twins
based on a behavioural model represent the dynamic responses of the physical entity to internal
and external mechanisms and use similar tools to physical modelling, while the rule model
reflects the real world by incorporating historical data to extract tacit knowledge. The latter
includes, in particular, digital twins that are based on machine learning methods and will be
defined in more detail in the remainder of this paper.</p>
        <p>Min et al. [14] propose a framework for ML-based digital production control optimisation in
the petrochemical industry and demonstrate their solution with a case study. Furthermore, they
propose different chronological steps to develop an MLDT based on a mathematical simulation
model. According to their concept, the ML-based “Digital Twin Model” is created through model
training and validation based on prepared, historical training data. Ritto and Rochinha [15]
investigate the integration of physics-based models with ML to construct a Digital Twin to
identify structural damage to wind turbines in real time. By transforming the physical models
into an ML model, it is possible to benefit from its performance in processing a large amount of
IoT data.</p>
        <p>The increasing importance of ML for Digital Twins leads to a need for practice-oriented
process models that control the development process of Digital Twins based on ML techniques.
With CRISP-DM and MLOps there are established procedure models from the Data Mining
and ML perspective. So far, only a few works address concrete procedures for Digital Twins
that focus especially on the ML aspect [16]. To the best of the authors’ knowledge, there are
no publications to date on a deeper systematisation of Digital Twins based on ML models in
the direction of a reference model (e. g., according to [17]), which guides focused research and
application development of MLDTs.</p>
        <p><sup>1</sup> https://opcfoundation.org/about/opc-technologies/opc-ua/ [2023-07-03]</p>
        <p><sup>2</sup> https://www.automationml.org/ [2023-07-03]</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Definition and Characteristics of ML-Based Digital Twins</title>
        <p>In the context of this paper, an ML-based Digital Twin is defined as a special type of Digital
Twin, where ML models form the central basis for the twin’s ability to model and simulate the
physical world. These models are adapted to the specific requirements of the Digital Twin by
training with large amounts of data and can also recognise previously unknown patterns and
react to unknown incoming data in real time.</p>
        <p>In comparison to other types of Digital Twins, MLDTs defined in this paper have some special
characteristics which are summarised in the following.</p>
        <p>Task specialisation: On the one hand, Digital Twins based on ML methods can be applied in all
kinds of CPSs, regardless of the domain (manufacturing, smart grids, etc.) or the task (intelligent
control, condition monitoring, etc.). On the other hand, an individual MLDT is trained to perform
a very specific task. This means, for example, that a neural network optimised for industrial
quality analysis [14] cannot simultaneously be used to improve the energy efficiency of
buildings [16].</p>
        <p>Physics-based model integration: Physics-based models in the context of Digital Twins are
computer-based models that mimic the physical properties and behaviours of real objects or
systems, for example, representing energetic or thermal and other physical properties in
mathematical form. MLDTs are able to benefit from the strengths of these models (interpretability,
generic applicability, etc.) on the one hand and from the performance of data-driven models
on the other hand through the targeted integration of physical models, e. g., to supplement or
generate training data [15].</p>
        <p>Data-driven model building: Every Digital Twin needs models to represent the behaviour
of its real counterpart and to be able to make predictions or perform simulations based on
(IoT) data [10]. While, for example, BIM, GIS, or CAD-based Digital Twins usually derive their
information from (3D) models of buildings, infrastructures, or other physical objects, MLDTs
are, at their core, based on ML models applied to real-time data or other sources [14], e. g.,
from IoT sensors in the field. This enables MLDTs to recognise previously unknown patterns or
correlations and to make higher-quality decisions based on real-world data.</p>
        <p>
          Data complexity and processing: MLDTs are specialised in processing large amounts of
data from various sources in near real-time and can process them, for example, in cloud
environments [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] or on the edge [18], depending on the individual use case requirements for
performance and confidentiality. Furthermore, by integrating additional (domain) knowledge
in the different phases of the ML process, the robustness of the MLDT can be increased. Thus,
prior knowledge can not only support the selection of a suitable model or the interpretation of
model predictions, but also help in the data preparation phase to clean the data, fill in missing
values, or remove outliers [19].
        </p>
        <p>Adaptability and learning: Digital Twins operate in dynamic and permanently evolving
environments and must therefore be able to adapt to changes as efficiently as possible. These
changes can be sudden or gradual as well as unplanned or planned (e. g., the planned
change of technical components versus their gradual wear and tear). The (real-time) processing
and analysis of large amounts of data gives the MLDT the ability to adapt to changes in its
environment and provides a major advantage over other forms of Digital Twins [14].</p>
        <p>
          Federation and transfer of learning outcomes: Digital Twins often process sensitive data
concerning intellectual property or personal data. By applying federated ML techniques in
the context of Digital Twins, distributed (cross-company) data sets can be used efficiently to
increase model performance and scale, while preserving data privacy. In addition, MLDTs can
use transfer learning to access pre-trained models and apply them to similar problems [
          <xref ref-type="bibr" rid="ref3">20</xref>
          ].
        </p>
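        <p>The federation idea can be illustrated with a minimal sketch of weighted parameter averaging in the style of the widely used federated averaging (FedAvg) scheme. The works cited here do not prescribe this particular aggregation rule; it is only an assumed example. The function name federated_average, the two utilities, and all numbers are hypothetical. Each participant trains locally and shares only its model parameters and its sample count, so that raw data does not have to leave the company.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            # Clients with more local data contribute more to the global model.
            aggregated[i] += w * n_samples / total
    return aggregated


# Hypothetical parameters of two locally trained models (assumed values).
utility_a = [0.2, 1.0]   # trained on 100 local samples
utility_b = [0.6, 3.0]   # trained on 300 local samples
global_model = federated_average([utility_a, utility_b], [100, 300])
print(global_model)
```
]]></preformat>
        <p>Weighting by sample count is one possible design choice; other aggregation rules may be preferable if the local data sets differ strongly in quality or distribution.</p>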
        <p>[Figure 1. Labels recovered from the figure: Non-ML Models; Domain Knowledge; Machine Learning Pipeline; Training Data (sensor data etc.); Trigger; Model Update; Retraining Data; Inference Environment; Model Repository; DT Service; Monitoring; Inference Engine; Data Stream Manager; Real-Time Data (from the physical layer).]</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Challenges in the implementation of ML-based Digital Twins</title>
        <p>The development of an MLDT in a CPS is a highly complex, interdisciplinary process that
requires both a profound understanding of the subject domain and its underlying technical
processes, as well as in-depth knowledge of data analytics and ML, software development, and
project management.</p>
        <p>
          The performance of the Digital Twin application depends significantly on the choice of a
well-fitting and robust ML model. On the one hand, a sufficient amount of high-quality data
is essential for the training of ML models. This requires, in particular, the preparation of the
data, including the cleaning, transformation, and integration of different data sets from different
sources. This step often also requires the conversion of specific knowledge models, for example
based on CAD, BIM, or GIS information, into a format adequate for the learning process, for
example through synthetic data generation in domain systems [15]. On the other hand, Digital
Twins operate in dynamic and permanently evolving environments and therefore must be able
to adapt to changes as efficiently as possible. In all cases, the ML pipeline must be
robust enough to adapt (autonomously) to changing environmental conditions or to allow the
retraining of its models in a structured way during live operation [14]. The overall challenge
is to develop a robust and powerful MLDT that can evolve and learn throughout its entire
lifetime (cf. continual lifelong learning) [
          <xref ref-type="bibr" rid="ref4">21</xref>
          ]. This results in ML-specific challenges, such as
avoiding catastrophic forgetting in neural networks (or in the respective ML model)
by developing adequate strategies [
          <xref ref-type="bibr" rid="ref4">21</xref>
          ].
        </p>
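        <p>The requirement that models can be retrained in a structured way during live operation can be illustrated with a simple monitoring trigger. The following sketch is an assumption made for illustration and not a method taken from the cited works: it tracks the rolling mean absolute error between predictions and observations and signals retraining once a threshold is exceeded. The class name RetrainingTrigger, the window size, the threshold, and the data stream are all hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import deque


class RetrainingTrigger:
    """Signal retraining when the rolling mean absolute error is too high."""

    def __init__(self, window: int, threshold: float) -> None:
        self.errors = deque(maxlen=window)   # keeps only the latest errors
        self.threshold = threshold

    def update(self, predicted: float, observed: float) -> bool:
        """Record one prediction/observation pair and return the decision."""
        self.errors.append(abs(predicted - observed))
        window_full = len(self.errors) == self.errors.maxlen
        rolling_mae = sum(self.errors) / len(self.errors)
        # Only decide on a full window to avoid reacting to single outliers.
        return window_full and rolling_mae > self.threshold


# Hypothetical stream of (prediction, observation) pairs with a late drift.
trigger = RetrainingTrigger(window=3, threshold=1.0)
stream = [(10.0, 10.2), (10.0, 10.1), (10.0, 10.3), (10.0, 12.5), (10.0, 13.0)]
decisions = [trigger.update(p, o) for p, o in stream]
print(decisions)
```
]]></preformat>
        <p>Such a trigger only detects degraded performance. How the model is then updated without forgetting previously learned behaviour remains the challenge described above.</p>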
        <p>
          Another challenge, especially with regard to highly compute- and data-intensive components
of MLDTs (such as training processes, model federation, etc.), is their resource and energy
efficiency and thus their impact on the environment. While it is true that, as stated in the
introduction, the efficiency of the underlying domain-specific processes can be improved through
optimisation with MLDTs (Green by IT) [
          <xref ref-type="bibr" rid="ref5 ref6">22, 23</xref>
          ], it is important that the systems themselves
are built in a way that they adhere to sustainability specifications and do not become resource
drivers [
          <xref ref-type="bibr" rid="ref7 ref8">24, 25</xref>
          ]. This is even more significant when we consider that the IoT systems that form
the basis of the MLDTs in the physical world are usually lightweight and distributed in the
field [
          <xref ref-type="bibr" rid="ref9">26</xref>
          ]. Thus, it is important to ensure that the hardware resources are available and that the
energy supply is sufficient (batteries, solar cells, energy harvesting, etc.).
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Case Study: ML-Based Digital Twins in Water Resource Management</title>
      <p>AI and the DT paradigm also find application in the Water Resource Management (WRM)
domain. As a result of a thorough literature search, the authors were able to identify four
categories in WRM into which previous publications can be classified. While being prevalent in
WRM, these categories certainly are not WRM-specific and can also be applied to other domains.</p>
      <p>
        First, MLDTs in WRM are used for Forecasting, that is, to predict the behaviour
of the water cycle over time. Forecasts typically needed in WRM include the
required extraction from water sources [
        <xref ref-type="bibr" rid="ref10">27</xref>
        ] or water demand patterns [
        <xref ref-type="bibr" rid="ref11">28</xref>
        ]. Another category
is Monitoring &amp; Maintenance, considering that various papers show that the seamless operation
of water utilities can be supported by MLDTs (e. g. [
        <xref ref-type="bibr" rid="ref12 ref13">29, 30</xref>
        ]). The next predominant category
is Optimisation &amp; Controlling, as for example, energy efficiency through optimal pump and
valve operation is also a significant topic in WRM [
        <xref ref-type="bibr" rid="ref12">29</xref>
        ] and ML algorithms are well suited
to these optimisation problems. Lastly, Digital Twins are used for Decision Support, e. g., for
infrastructure planning [
        <xref ref-type="bibr" rid="ref12">29</xref>
        ] or employee training [
        <xref ref-type="bibr" rid="ref14">31</xref>
        ]. In most use cases, the Digital Twins are
a holistic representation of the water utility, so usually they can fit into more than one category
as they serve multiple purposes.
      </p>
      <p>The case study in this paper is about the MLDT for a drinking water network in southwestern
Germany and can be assigned to the Forecasting and Optimisation &amp; Controlling categories.</p>
      <sec id="sec-4-1">
        <title>ANN-control of Drinking Water Distribution Systems</title>
        <p>The “Stadtwerke Trier” (SWT), a municipal utility, operates a drinking water network which
has a capacity of 10 to 11 million cubic meters of water, serving a population of approximately
110,000 residents in the city, located in southwestern Germany. The city’s drinking water
network is supplied by two water utilities, one of which obtains its water from a reservoir at a higher
altitude and generates about 1 million kWh of energy annually via two turbines (2 x 250 kW)
integrated in the water inlet. In addition, four rooftop systems and one ground-mounted system provide
a cumulative photovoltaic (PV) capacity of approximately 500 kWp. Compared to the electricity
generation, the energy consumption of the grid is also considerable and ranges from 1.6 to
1.7 million kWh per year. This is mainly caused by several water pumps which, owing to the
topographical location, transfer the water to different storage reservoirs and grid zones
within the city.</p>
        <p>In 2017, SWT started an automation project together with the industry supplier Xylem<sup>3</sup>,
whose objective was to create a Digital Twin for the online simulation and energy-efficient
optimisation of drinking water distribution based on ANNs. The overarching target parameters
of the Digital Twin are the provision of drinking water in the required quantity and water
pressure for the end users, while at the same time minimising energy consumption. Similar to
the division of the water distribution system (WDS) into separate water network zones, local
optimisation takes place in the Digital Twin at the level of these WDS zones, along with global
optimisation at the level of the overall network. The structure of the MLDT used in this case
study can be seen in Fig. 2.</p>
        <p>To enable the local optimisation, two types of zone-specific models are required. On the one
hand, water demand prognosis requires forecasting models that have been trained on the basis
of historical consumption and weather data and represent the water demand of a WDS zone.
On the other hand, there is also the need for infrastructure models that replicate the hydraulic
and energetic behaviour of the physical components. Both energy consumers (pumps, valves,
etc.) and energy producers (turbines) are modelled. Such simulations are already available
in the form of deterministic models which are provided by a domain-specialised software
for the simulation, calculation and analysis of utility networks<sup>4</sup>. Although simulations of
individual WDS zones with these deterministic models on their own are possible, they are very
time- and resource-consuming and therefore not suitable for live operation. The modelling of
the physical WDS properties by ANNs ensures significant performance gains in the simulation
of optimised operation modes in live operation. Nevertheless, the data from simulation runs
of the deterministic infrastructure model combined with expert knowledge from the domain
are used as a valuable training input for the ANN. With these two types of trained models for
each WDS zone, an optimal control of the physical components is found for energy-optimised
operation in terms of power generation and consumption.</p>
        <p><sup>3</sup> The ANN-based control of water distribution networks and water treatment plants is offered by
Xylem under the brand name BLU-X: https://www.xylem.com/de-de/products--services/digital-solutions/blu-x-treatment-plant-optimization/ [2023-07-03]</p>
        <p><sup>4</sup> SWT uses STANET for WDS modelling and calculation: https://www.stafu.de/en/home.html [2023-06-30]</p>
        <p>[Figure 2: Structure of the MLDT in the case study. Labels recovered from the figure: Training Data (Simulation Runs (STANET®), Historical Data, Domain Knowledge); Training; Forecast Models (water demand, etc.) and Infrastructure Models for WDS zone 1 and WDS zone 2; Application; Real-Time Data Input (SCADA, etc.).]</p>
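        <p>The interplay of the two zone-specific model types described above can be sketched as follows. This is a strongly simplified illustration under stated assumptions and not the procedure used in the case study: a demand forecast is passed to a surrogate of the infrastructure model, and the candidate pump setting with the lowest predicted energy use that still meets a pressure requirement is selected. The functions choose_pump_setting and toy_surrogate, the formulas, and all numbers are hypothetical stand-ins for trained models.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Callable, Iterable, Optional, Tuple


def choose_pump_setting(
    demand_forecast: float,
    candidate_settings: Iterable[float],
    surrogate: Callable[[float, float], Tuple[float, float]],
    min_pressure: float,
) -> Optional[float]:
    """Return the feasible setting with the lowest predicted energy use.

    Returns None if no candidate meets the pressure requirement.
    """
    best_setting, best_energy = None, float("inf")
    for setting in candidate_settings:
        energy, pressure = surrogate(setting, demand_forecast)
        if pressure >= min_pressure and energy < best_energy:
            best_setting, best_energy = setting, energy
    return best_setting


def toy_surrogate(setting: float, demand: float) -> Tuple[float, float]:
    """Stand-in for a trained infrastructure model (assumed formulas)."""
    energy = 5.0 * setting ** 2                # predicted energy use
    pressure = 6.0 * setting - 0.01 * demand   # predicted supply pressure
    return energy, pressure


# Hypothetical forecast value and candidate pump speeds for one zone.
best = choose_pump_setting(100.0, [0.5, 0.75, 1.0], toy_surrogate, 3.0)
infeasible = choose_pump_setting(100.0, [0.1], toy_surrogate, 3.0)
print(best, infeasible)
```
]]></preformat>
        <p>Exhaustive search over a few candidates is used here only for readability. A fast surrogate is what makes evaluating many candidate operation modes feasible in live operation.</p>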
        <p>Based on the incoming real-time data, provided by the SCADA system or additional data
sources, and considering different interactions between individual WDS zones, a global overall
optimisation of the WDS is carried out according to the principle of modular ANNs. As a result,
the optimised operational suggestions can be carried out by the WDS SCADA system to control
the physical layer (e. g. pumps, valves, etc.). By abstracting the WDS with the MLDT
presented here, the self-consumption of green energy can be increased from 60 to 90 % through
optimised operation<sup>5</sup>.</p>
        <p>Since the MLDT is the virtual representation of a highly dynamic, physical network, it has
to be considered as a constantly evolving system. For example, an even deeper integration of
renewable energies into the described system is planned in the future by considering forecasts
of PV power generation. In order to be able to integrate new models like these into the existing
system and, if necessary, also update the existing models, it is helpful to have clearly defined
processes in which system adaptations and MLDT applications can run in parallel.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4. Process Model for the Development of ML-Based Digital Twins</title>
      <p>The development of Digital Twins, and in particular of MLDT applications, is a
cross-organisational, time- and knowledge-intensive process that requires a variety of different skills
and collaboration between domain experts (often supported by external technical planning
offices), automation engineers, component manufacturers, data scientists, and ML engineers. In
the following sections, we present a first approach for a six-phase process model that structures
the different work steps of MLDT development. For this purpose, we first describe the
methodology of the process model definition on the basis of established data science and software
development procedures as well as best practice examples.</p>
      <p><sup>5</sup> https://www.swt.de/p/CO2_freies_Trinkwasser_f%25C3%25BCr_Trier-5-7330.html [2023-06-30]</p>
      <sec id="sec-6-1">
        <title>4.1. Methodology</title>
        <p>
          ML applications are typically complex and require careful planning, development, and
implementation to ensure they can be used safely and effectively in a production environment [
          <xref ref-type="bibr" rid="ref15">32</xref>
          ].
In practice, ML projects often fail because insights from the data exploration phases are not
effectively applied in productive ML models, which can lead to inaccurate predictions, higher
costs, and risks. To structure the effective development of reliable operational
ML applications, the Machine Learning Operations (MLOps) approach has emerged in recent
years. MLOps is a cross-functional, collaborative, and iterative paradigm that adopts
established DevOps practices from software development, such as the Continuous Integration (CI)
and Continuous Delivery (CD) principles, and combines them with data engineering and ML
methods [
          <xref ref-type="bibr" rid="ref16">33</xref>
          ].
        </p>
        <p>
          The development of an overarching process model is strongly based on selected scientific
publications on MLOps on the one hand and on expert interviews and studies of ML application
projects in water resource management on the other. With regard to the overall MLOps
structure, the procedure model described here is loosely based on the MLOps architecture
according to [
          <xref ref-type="bibr" rid="ref16">33</xref>
          ]. Since a closer look at practice also shows that structured data-mining and
software development methods have already been adopted in industrial water management
applications, we base the detailed MLOps workflow definition for the data-mining and
engineering steps on CRISP-DM [
          <xref ref-type="bibr" rid="ref17">34</xref>
          ], the de-facto standard workflow for industrial data science
projects.
        </p>
      </sec>
      <sec id="sec-6-2">
        <title>4.2. Six-Phase process model</title>
        <p>Fig. 3 shows a six-phase process model for the development of MLDTs, developed according to
the methodology described above. Each phase is divided into several tasks, which
are ordered chronologically (indicated by the numbers in round brackets).</p>
        <sec id="sec-6-2-1">
          <title>Phase 1: Digital Twin Alignment and ML Problem Definition</title>
          <p>
            Usually, data science or ML projects start with a phase in which the problem is specified and
a preliminary project plan is set up. In CRISP-DM, this initial step is very closely linked to
the phase of data understanding, since the problem definition is based on hypotheses about
possible data patterns [
            <xref ref-type="bibr" rid="ref17">34</xref>
            ]. Based on this, the first steps of the proposed process model include
the (1) definition of the business problem to be addressed by the Digital Twin and the definition
of an overall project plan that regulates the allocation of tasks between domain experts, data
scientists, ML engineers, and software developers. Subsequently, the raw data required for a
rough (2) exploratory data investigation are compiled and subjected to an initial (3) data quality
check. Based on these initial findings and the defined business goals associated with Digital
Twin development, the (4) ML problem (regression, classification, etc.) to be solved is defined at
the end of the first process model phase.
          </p>
          <p>[Figure 3: Six-phase process model for the development of MLDTs. Labels recovered from the figure: Phase 1, DT Alignment and ML Problem Definition (result: formulated ML problem); Phase 2, Data Preparation (result: experimental dataset); Phase 3, Model Training and Evaluation (result: best-fit ML model/pipeline); Phase 4, ML-Based DT Development (result: MLDT prototype); Phase 5, CI/CD for ML-Based DT (result: productive MLDT/ML model); Phase 6, Live operation. The figure also contains a development loop and connectors labelled A, B, and C.]</p>
        </sec>
        <sec id="sec-6-2-2">
          <title>Phase 2: Data Preparation</title>
          <p>
            IoT data is usually not flawless and its quality can vary greatly. Therefore, before training the
ML models, it is necessary to (5) remove faulty data or (e. g. synthetically) fill in missing data
and perform a final data quality check. The (6) feature transformation and engineering step
involves the preparation and processing of input features, including conversion of features into
a processable format and creation of new features or modification of existing features to improve
the performance of the model. In parallel with the feature engineering task, the (7) integration of
additional data that is necessary for the execution of the Digital Twin service takes place, for
example from external sources. This can include, for example, weather data for Digital
Twins in water treatment or energy market data for planning energy-optimised production
processes [
            <xref ref-type="bibr" rid="ref14">31</xref>
            ]. The outcome of the “Data Preparation” phase is a cleaned and integrated data
set that serves as input to the next phase, the training and evaluation of an ML model tailored to
the Digital Twin service.
          </p>
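          <p>Step (5) can be illustrated with a minimal Python sketch. It assumes a single, regularly sampled sensor series, a hypothetical plausibility range, and linear interpolation as the synthetic fill strategy. None of these choices is prescribed by the process model, and the function name clean_series and the sample readings are purely illustrative.</p>
          <preformat>
```python
from typing import List, Optional


def clean_series(values: List[Optional[float]], low: float, high: float) -> List[Optional[float]]:
    """Mask readings outside [low, high], then fill interior gaps by linear interpolation.

    Gaps at the start or end of the series have no neighbour on one side and stay None.
    """
    # Step 1: treat implausible readings as missing (assumed plausibility range).
    masked = [v if v is not None and high >= v >= low else None for v in values]

    # Step 2: fill each run of missing values between two known readings.
    filled = list(masked)
    known = [i for i, v in enumerate(masked) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for k in range(1, gap):
            weight = k / gap
            filled[left + k] = masked[left] + weight * (masked[right] - masked[left])
    return filled


# Hypothetical flow readings with one missing value and one sensor fault (-999.0).
raw = [10.0, None, 14.0, -999.0, 18.0, None]
cleaned = clean_series(raw, low=0.0, high=100.0)
print(cleaned)
```
          </preformat>
          <p>In this sketch the trailing gap is deliberately left unfilled, so that the final data quality check can decide how to handle it.</p>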
        </sec>
        <sec id="sec-6-2-3">
          <title>Phase 3: Model Training and Evaluation</title>
          <p>During the third phase, the most suitable ML method for the problem defined
in phase 1 is determined, and its learning result is subsequently stored as a model. At the
beginning, an (8) exploratory data analysis (EDA) takes place, in which the collected data is
analysed with regard to the statistical correlations of its features. The choice of the EDA
environment, for example MATLAB [35], R, or Python [36], depends on the project requirements,
the skills and knowledge of the data analysts, and the specific domain. Since the previous
step is very closely related to (9) model training (different ML approaches require differently
prepared data), these activities can be carried out in parallel. ML models are the result of the
learning process and depend on the method and data used to train them. In the domain of WRM,
these are, for example, ANNs for process control tasks (see case study in sect. 3) or support
vector machines for predicting the water demand [37]. Different model parametrisations allow
(10) model validation based on selected performance metrics, for example the Mean Squared
Error (MSE) or the Mean Absolute Error (MAE) for the assessment of regression models for the
prediction of water demand or the expected wastewater quantity based on weather forecasts.
At the end of the Model Training and Evaluation phase, the (11) most promising ML model
is evaluated against the Digital Twin requirements defined in phase 1 and a decision is made
whether to initiate the development of a productive MLDT environment (phase 4). Phase 3
identifies the best-fit ML model through experimental training, but training on productive data
is completed in later steps.</p>
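          <p>The two regression metrics named in step (10) follow their standard definitions: the MSE is the mean of the squared residuals and the MAE is the mean of the absolute residuals. The following Python sketch computes both for two candidate models. The demand values and the names model_a and model_b are invented for illustration and are not results from the case study.</p>
          <preformat>
```python
from typing import Dict, List, Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Squared Error: average of the squared residuals."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average of the absolute residuals."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical hourly water demand and the predictions of two candidate models.
observed = [100.0, 120.0, 90.0, 110.0]
candidates: Dict[str, List[float]] = {
    "model_a": [98.0, 123.0, 91.0, 110.0],
    "model_b": [100.0, 120.0, 90.0, 115.0],
}

scores = {
    name: {"mse": mse(observed, pred), "mae": mae(observed, pred)}
    for name, pred in candidates.items()
}
best_by_mse = min(scores, key=lambda name: scores[name]["mse"])
best_by_mae = min(scores, key=lambda name: scores[name]["mae"])
print(scores, best_by_mse, best_by_mae)
```
          </preformat>
          <p>In this invented example the two metrics rank the candidates differently, because the MSE penalises the single larger error of model_b more heavily. Which metric governs the decision in step (11) would follow from the Digital Twin requirements defined in phase 1.</p>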
        </sec>
        <sec id="sec-6-2-4">
          <title>Phase 4: ML-Based DT Development</title>
          <p>The fourth phase covers all activities to build an infrastructure on which the MLDT can be
evaluated and ultimately operated with productive data. Normally, the infrastructure for CI/CD
is divided into at least two environments: a testing or pre-production stage
and the productive environment. The aim of the development in this phase
is both an ML pipeline that regularly learns updated ML models initiated by various rules or
triggers and the inference environment that applies these models to real-time data (see Fig. 1).
Phase 4 starts with the (12) specification and setup of the system infrastructure, where, for
example, basic decisions are made about the structure of the server environments (cloud or
on-premise, server configuration etc.) and all necessary Application Programming Interfaces
(APIs) are defined. To automate the CI and CD of updated ML models and software components
of the MLDT, a corresponding (13) CI/CD pipeline must initially be established. Afterwards, step
(14) involves the development of the previously defined interfaces and software components as
well as the development of the ML pipeline and the inference server. Every newly developed
component or functionality triggers the CI/CD pipeline in the pre-production stage, which is
further described in phase 5 as the development loop. To monitor the performance of the MLDT
in later live operation, various (15) triggers are set up at the end of the implementation phase
that initiate either an adjustment of the system environment, the ML pipeline, or the retraining
of the ML model (phase 6).</p>
        </sec>
        <sec id="sec-6-2-5">
          <title>Phase 5: CI/CD for ML-Based Digital Twin</title>
          <p>In phase 4, a prototype of the MLDT was provided, but it is still not in live operation and does
not yet control the real asset. Before the Digital Twin becomes productive, its functionalities
have to be integrated, tested, and deployed on the final infrastructure. This is partly based
on the MLOps approach proposed by Google for CI/CD in ML
(https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning,
accessed 2023-06-30). During the execution of the
(16) CI pipeline, the software fragments are continuously checked for errors in order to detect
and rectify problems at an early stage. Subsequently, in the (17) CD step, the MLDT system,
including the ML pipeline or the modified system components, is deployed on the server. In the
(18) continuous training step, the ML model is trained, based on the developed and previously
deployed ML pipeline. After this step, the trained ML model is stored in the repository of the
Inference Environment during the following (19) Model CD.</p>
          <p>As mentioned in the previous phase, the proposed CI/CD pipeline (including steps 16, 17,
18, and 19) runs in a development loop as newly developed features are tested, integrated,
and deployed in the pre-production environment. When all tests are passed and all MLDT
requirements from phase 1 are met, in step (20) the CI/CD pipeline is executed on the productive
environment. This final productive CI/CD run concludes the development, as the ML model
is now trained on productive data and the MLDT is ready for operation on the productive
environment. This whole phase is not only carried out during the initial development of the
MLDT, but also during partial adjustments of the system, e. g., adjustments in the ML pipeline
or the integration of new data.</p>
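          <p>The ordering of steps (16) to (20) can be sketched in Python as follows. Only the step numbers and names are taken from the text. The functions run_pipeline and develop_and_release, the checks_pass callback, and the iteration limit are hypothetical stand-ins for real build, test, and deployment tooling.</p>
          <preformat>
```python
from typing import Callable, List

# Step numbers and names follow the text; everything else is a hypothetical stand-in.
PIPELINE_STEPS = [
    (16, "CI"),
    (17, "CD"),
    (18, "continuous training"),
    (19, "model CD"),
]


def run_pipeline(environment: str, log: List[str]) -> None:
    """Run steps 16 to 19 in order on one environment and record each step."""
    for number, name in PIPELINE_STEPS:
        log.append(f"{environment}: step {number} ({name})")


def develop_and_release(checks_pass: Callable[[int], bool], max_iterations: int = 10) -> List[str]:
    """Repeat the pipeline on pre-production until the checks pass, then run it once on production."""
    log: List[str] = []
    for iteration in range(1, max_iterations + 1):
        run_pipeline("pre-production", log)  # development loop
        if checks_pass(iteration):
            run_pipeline("production", log)  # step (20): final productive run
            return log
    raise RuntimeError("requirements not met within max_iterations")


# Assume, purely for illustration, that all tests and requirements pass on the second iteration.
history = develop_and_release(lambda iteration: iteration >= 2)
print(len(history))
```
          </preformat>
          <p>The same function could be re-entered for partial adjustments of the system, which mirrors the statement that this phase is repeated beyond the initial development.</p>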
        </sec>
        <sec id="sec-6-2-6">
          <title>Phase 6: Live operation</title>
          <p>After the CI/CD phase and the successful go-live, the performance of the Digital Twin is
(21) monitored during operation. Adequate re-training methods must be applied in this
step so that the MLDT learns continuously without losing relevant knowledge
(see section 2.3). The triggers implemented in phase 4 (step 15) can initiate different workflows:
A type A trigger (annotations in Fig. 3) signals that the ML model needs to be
retrained (step 18), for example in the case of decreasing prediction quality, recognisable by the
deterioration of various performance metrics. A type B trigger indicates a different workflow
and means that changes must be made either to the software environment (e. g. update, adaptation of API) or
to the ML pipeline (integration of new features, hyperparameter modification).
Digital Twins are not static structures, but are often further developed with regard to their
business case after their initial implementation. Depending on the intensity of the intervention,
it may be necessary to specify these adjustments again starting with phase 1 (type C trigger).
This could be the extension of the Digital Twin, where entirely new functionalities are to be
integrated, e. g., the integration of PV power forecasts into an existing ANN control of the water
distribution network.</p>
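          <p>A type A trigger based on deteriorating prediction quality could, for instance, be realised as a sliding-window check on the mean absolute error. The following Python sketch is one possible reading under stated assumptions: the class name QualityMonitor, the window size, the threshold, and the sample stream are hypothetical, and the text does not specify how deterioration is detected.</p>
          <preformat>
```python
from collections import deque
from typing import Deque, Optional


class QualityMonitor:
    """Tracks the mean absolute error over a sliding window of recent predictions."""

    def __init__(self, window: int, mae_threshold: float) -> None:
        self.errors: Deque[float] = deque(maxlen=window)
        self.mae_threshold = mae_threshold

    def observe(self, predicted: float, actual: float) -> Optional[str]:
        """Record one prediction; return "A" when the full window's MAE exceeds the threshold."""
        self.errors.append(abs(predicted - actual))
        window_full = len(self.errors) == self.errors.maxlen
        windowed_mae = sum(self.errors) / len(self.errors)
        if window_full and windowed_mae > self.mae_threshold:
            return "A"  # type A trigger: retrain the ML model (step 18)
        return None


# Follow-up action per trigger type, paraphrased from the text.
TRIGGER_TARGETS = {
    "A": "retrain the ML model (step 18)",
    "B": "adjust the software environment or the ML pipeline",
    "C": "re-specify the adjustments starting with phase 1",
}

monitor = QualityMonitor(window=3, mae_threshold=2.0)
# Hypothetical (predicted, actual) pairs whose error grows over time.
stream = [(10.0, 10.5), (11.0, 12.0), (12.0, 14.0), (13.0, 17.0), (14.0, 19.0)]
fired = [monitor.observe(predicted, actual) for predicted, actual in stream]
print(fired)
```
          </preformat>
          <p>Type B and type C triggers depend on engineering and business decisions rather than on a metric, so the sketch only records their follow-up actions and does not attempt to detect them.</p>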
        </sec>
      </sec>
      <sec id="sec-6-3">
        <title>4.3. Discussion</title>
        <p>In the context of the increasing importance of Digital Twins based on ML methods, the process
model described above is a first comprehensive attempt to take the specifics of MLDT into
account. This involves coordinating the preliminary steps of business goal definition and ML
problem formulation as well as the learning of suitable ML models and the development of a
suitable inferencing environment and its operation. By integrating established data mining,
ML, and software development paradigms with best practices from practical Digital Twin
implementation projects, the process model provides a structural framework for interdisciplinary
collaboration in MLDT projects. It fosters structured cross-organisational collaboration among
different experts with different skills. At the same time, the strict DevOps focus ensures fast and
secure development and deployment of the necessary software components, with particular
emphasis on regular updates of the fundamental ML models. This meets the high demands of
Digital Twins regarding their adaptability in changing environments.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. Conclusion and Future Work</title>
      <p>As discussed in this paper, the use of ML enables Digital Twins to process large amounts of data
in real time and thus to perform intelligent control and optimisation tasks. This paper
therefore proposes an initial characterisation of MLDTs and specifies the associated challenges.
To address these challenges, a six-phase process model for the development, deployment, and
operation of MLDTs was proposed that considers the aspects of Digital Twin problem definition
and evaluation of a suitable ML model as well as its productive implementation within a CPS.
The adoption of CI/CD practices ensures integrated monitoring of model performance as well
as (semi-) automated model retraining and updating.</p>
      <p>Despite the widespread use of ML techniques in the context of Digital Twins, however, there
is a lack of comprehensive definitions and differentiation from other Digital Twin modelling
approaches. To further encourage research in Machine Learning-based Digital Twins, we
propose the development of a general reference model by combining deductive and inductive
elements. This should include a comparison of similar reference models from the field of CPSs
and findings from an extensive literature study as well as further investigations in industrial
MLDT applications, for example in the field of manufacturing or water management.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We would like to thank the Ministry for Climate Protection, Environment, Energy and Mobility of
Rhineland-Palatinate (Ministerium für Klimaschutz, Umwelt, Energie und Mobilität
Rheinland-Pfalz) for the financial support and assistance of the research project “Digital twin in water
resource management” in the context of which this publication was realised. We would also
like to thank Mr. Nicolas Wiedemeyer from Stadtwerke Trier and Mr. Michael Natschke from
Xylem Water Solution GmbH for contributing their expertise to the case study. This work was
also funded in part by the German Federal Ministry for Economic Affairs and Climate Action
project “Energy-efficient analysis and control processes in the dynamic edge cloud continuum
for industrial manufacturing” (EASY) under Grant 01MD22002D.</p>
      <p>[3] M. Grieves, J. Vickers, Digital twin: Mitigating unpredictable, undesirable emergent
behavior in complex systems, Transdisciplinary perspectives on complex systems: New
findings and approaches (2017) 85–113.
[4] W. Kritzinger, M. Karner, G. Traar, J. Henjes, W. Sihn, Digital twin in manufacturing: A
categorical literature review and classification, IFAC-PapersOnLine 51 (2018) 1016–1022.
[5] M. Groshev, C. Guimarães, J. Martín-Pérez, A. de la Oliva, Toward intelligent cyber-physical
systems: Digital twin meets artificial intelligence, IEEE Communications Magazine 59
(2021) 14–20.
[6] M. Grieves, Intelligent digital twins and the development and management of complex
systems, 2022. URL: https://digitaltwin1.org/articles/2-8.
[7] K. Alexopoulos, N. Nikolakis, G. Chryssolouris, Digital twin-driven supervised machine
learning for the development of artificial intelligence applications in manufacturing,
International Journal of Computer Integrated Manufacturing 33 (2020) 429–439.
[8] S. Schelter, F. Biessmann, T. Januschowski, D. Salinas, S. Seufert, G. Szarvas, On challenges
in machine learning model management, IEEE Data Engineering Bulletin (2015).
[9] C. C. Menassa, From bim to digital twins: A systematic review of the evolution of intelligent
building representations in the aec-fm industry, Journal of Information Technology in
Construction (ITcon) 26 (2021) 58–83.
[10] M. Liu, S. Fang, H. Dong, C. Xu, Review of digital twin about concepts, technologies, and
industrial applications, Journal of Manufacturing Systems 58 (2021) 346–361.
[11] Y. Zheng, S. Yang, H. Cheng, An application framework of digital twin and its case study,
Journal of Ambient Intelligence and Humanized Computing 10 (2019) 1141–1153.
[12] F. Tao, H. Zhang, A. Liu, A. Y. Nee, Digital twin in industry: State-of-the-art, IEEE
Transactions on Industrial Informatics 15 (2018) 2405–2415.
[13] F. Tao, B. Xiao, Q. Qi, J. Cheng, P. Ji, Digital twin modeling, Journal of Manufacturing
Systems 64 (2022) 372–389.
[14] Q. Min, Y. Lu, Z. Liu, C. Su, B. Wang, Machine learning based digital twin framework for
production optimization in petrochemical industry, International Journal of Information
Management 49 (2019) 502–519.
[15] T. Ritto, F. Rochinha, Digital twin, physics-based model, and machine learning applied
to damage detection in structures, Mechanical Systems and Signal Processing 155 (2021)
107614.
[16] T. Y. Fujii, V. T. Hayashi, R. Arakaki, W. V. Ruggiero, R. Bulla Jr, F. H. Hayashi, K. A. Khalil,
A digital twin architecture model applied with mlops techniques to improve short-term
energy consumption prediction, Machines 10 (2021) 23.
[17] C. M. MacKenzie, K. Laskey, F. McCabe, P. F. Brown, R. Metz, B. A. Hamilton, Reference
model for service oriented architecture 1.0, OASIS standard 12 (2006) 1–31.
[18] Y. Lu, X. Huang, K. Zhang, S. Maharjan, Y. Zhang, Communication-efficient federated
learning for digital twin edge networks in industrial iot, IEEE Transactions on Industrial
Informatics 17 (2020) 5709–5718.
[19] P. Klein, N. Weingarz, R. Bergmann, Enhancing siamese neural networks through expert
knowledge for predictive maintenance, in: IoT Streams for Data-Driven Predictive
Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning: Second International
Workshop, IoT Streams 2020, and First International Workshop, ITEM 2020, Co-located
discovery and data mining, volume 1, Manchester, 2000, pp. 29–39.
[35] W. L. Martinez, A. R. Martinez, J. Solka, Exploratory data analysis with MATLAB, CRC
Press, 2017.
[36] K. Sahoo, A. K. Samal, J. Pramanik, S. K. Pani, Exploratory data analysis using python,
International Journal of Innovative Technology and Exploring Engineering (IJITEE) 8
(2019) 4727–4735.
[37] I. S. Msiza, F. V. Nelwamondo, T. Marwala, Artificial neural networks and support vector
machines for water demand time series forecasting, in: 2007 IEEE International Conference
on Systems, Man and Cybernetics, IEEE, 2007, pp. 638–643.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Cyber physical systems: Design challenges</article-title>
          , in:
          <source>2008 11th IEEE international symposium on object and component-oriented real-time distributed computing (ISORC)</source>
          , IEEE,
          <year>2008</year>
          , pp.
          <fpage>363</fpage>
          -
          <lpage>369</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bordel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Alcarria</surname>
          </string-name>
          , D. S. de Rivera, T. Robles,
          <article-title>Process execution in cyber-physical systems using cloud and cyber-physical internet services</article-title>
          ,
          <source>The Journal of Supercomputing</source>
          <volume>74</volume>
          (
          <year>2018</year>
          )
          <fpage>4127</fpage>
          -
          <lpage>4169</lpage>
          . with ECML/PKDD 2020, Ghent, Belgium,
          <source>September 14-18</source>
          ,
          <year>2020</year>
          ,
          <source>Revised Selected Papers 2</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>77</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>B.</given-names>
            <surname>Maschler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Braun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jazdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weyrich</surname>
          </string-name>
          ,
          <article-title>Transfer learning as an enabler of the intelligent digital twin</article-title>
          ,
          <source>Procedia CIRP</source>
          <volume>100</volume>
          (
          <year>2021</year>
          )
          <fpage>127</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Parisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kemker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Part</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wermter</surname>
          </string-name>
          ,
          <article-title>Continual lifelong learning with neural networks: A review</article-title>
          ,
          <source>Neural Networks</source>
          <volume>113</volume>
          (
          <year>2019</year>
          )
          <fpage>54</fpage>
          -
          <lpage>71</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>Vinuesa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Azizpour</surname>
          </string-name>
          , I. Leite,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balaam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dignum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Domisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Felländer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Langhans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tegmark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. F.</given-names>
            <surname>Nerini</surname>
          </string-name>
          ,
          <article-title>The role of artificial intelligence in achieving the sustainable development goals</article-title>
          ,
          <source>Nature Communications</source>
          <volume>11</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>D.</given-names>
            <surname>Rolnick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. L.</given-names>
            <surname>Donti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Kaack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kochanski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lacoste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sankaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Milojevic-Dupont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Jaques</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Waldman-Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Luccioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Maharaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Sherwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Mukkavilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Kording</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Gomes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hassabis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Platt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Creutzig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Tackling climate change with machine learning</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>55</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dick</surname>
          </string-name>
          , E. Kern, T. Johann,
          <article-title>The greensoft model: A reference model for green and sustainable software and its engineering</article-title>
          ,
          <source>Sustainable Computing: Informatics and Systems</source>
          <volume>1</volume>
          (
          <year>2011</year>
          )
          <fpage>294</fpage>
          -
          <lpage>304</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>R.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dodge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <article-title>Green AI</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>63</volume>
          (
          <year>2020</year>
          )
          <fpage>54</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>A.</given-names>
            <surname>Guldner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Murach</surname>
          </string-name>
          ,
          <article-title>Measuring and assessing the resource and energy efficiency of artificial intelligence of things devices and algorithms</article-title>
          , in:
          <string-name>
            <given-names>V.</given-names>
            <surname>Wohlgemuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Naumann</surname>
          </string-name>
          , G. Behrens,
          <string-name>
            <given-names>H.-K.</given-names>
            <surname>Arndt</surname>
          </string-name>
          , M. Höb (Eds.),
          <source>Advances and New Trends in Environmental Informatics</source>
          , Springer International Publishing, Cham,
          <year>2022</year>
          , pp.
          <fpage>185</fpage>
          -
          <lpage>199</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Puig</surname>
          </string-name>
          , G. Cembrano,
          <article-title>Real-time control of urban water cycle under cyber-physical systems framework</article-title>
          ,
          <source>Water</source>
          <volume>12</volume>
          (
          <year>2020</year>
          )
          <fpage>406</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>A.</given-names>
            <surname>Niknam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. K.</given-names>
            <surname>Zare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hosseininasab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mostafaeipour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <article-title>A critical review of short-term water demand forecasting tools-what method should i use?</article-title>
          ,
          <source>Sustainability</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>5412</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>P.</given-names>
            <surname>Conejos Fuertes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Martínez Alzamora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Hervás</given-names>
            <surname>Carot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Alonso</given-names>
            <surname>Campos</surname>
          </string-name>
          ,
          <article-title>Building and exploiting a digital twin for the management of drinking water distribution networks</article-title>
          ,
          <source>Urban Water Journal</source>
          <volume>17</volume>
          (
          <year>2020</year>
          )
          <fpage>704</fpage>
          -
          <lpage>713</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Ramos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Morani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Carravetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Fecarrotta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Adeyeye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>López-Jiménez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pérez-Sánchez</surname>
          </string-name>
          ,
          <article-title>New challenges towards smart systems' efficiency by digital twin in water distribution networks</article-title>
          ,
          <source>Water</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>1304</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>E.</given-names>
            <surname>Torfs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nicolaï</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Daneshgar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Copp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Haimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ikumi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. B.</given-names>
            <surname>Plosz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Snowling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Townley</surname>
          </string-name>
          , et al.,
          <article-title>The transition of WRRF models to digital twin applications</article-title>
          ,
          <source>Water Science and Technology</source>
          <volume>85</volume>
          (
          <year>2022</year>
          )
          <fpage>2840</fpage>
          -
          <lpage>2853</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sculley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Holt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Golovin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Davydov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Phillips</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ebner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-F.</given-names>
            <surname>Crespo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dennison</surname>
          </string-name>
          ,
          <article-title>Hidden technical debt in machine learning systems</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>28</volume>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kreuzberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kühl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hirschl</surname>
          </string-name>
          ,
          <article-title>Machine learning operations (MLOps): Overview, definition, and architecture</article-title>
          ,
          <source>arXiv preprint arXiv:2205.02302</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wirth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hipp</surname>
          </string-name>
          ,
          <article-title>CRISP-DM: Towards a standard process model for data mining</article-title>
          ,
          in:
          <source>Proceedings of the 4th international conference on the practical applications of knowledge</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>