CEUR Workshop Proceedings Vol-3762, paper 524 (https://ceur-ws.org/Vol-3762/524.pdf; DBLP: https://dblp.org/rec/conf/ital-ia/BasileCLMQ24)
An MLOps Solution Framework for Transitioning Machine Learning Models into eHealth Systems

Andrea Basile, Fabio Calefato, Filippo Lanubile∗, Giulio Mallardi and Luigi Quaranta

Dept. of Computer Science, University of Bari, Via Edoardo Orabona 4, 70125 Bari BA, Italy

Abstract
Over the past few years, there has been growing experimentation with machine learning (ML)-based technologies in the healthcare domain. However, most related initiatives struggle to progress beyond the prototypical research stage and transition to clinical use. Although this problem affects the adoption of ML across all industries, it is largely exacerbated in the highly regulated medical domain. Lately, MLOps has emerged as a new discipline encompassing practices and tools to streamline the development and maintenance of ML-enabled systems. Rooted in software engineering and inspired by DevOps, it places great emphasis on the automation of ML pipelines and the model lifecycle. In this paper, we present an MLOps-based solution framework designed to streamline the transition of experimental ML models to production-ready components for eHealth systems. Our approach is designed to support the reliable integration and clinical deployment of ML-enabled tools that can assist healthcare professionals. The solution framework is being developed and validated in the context of “DARE – Digital Lifelong Prevention”, an Italian research project aimed at leveraging the potential of data to improve health promotion and prevention throughout the life course.

Keywords
ML pipeline reproducibility, ML model deployment, ML-enabled component, ML for healthcare, health informatics
Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy.
∗ Corresponding author.
Email: andrea.basile@uniba.it (A. Basile); fabio.calefato@uniba.it (F. Calefato); filippo.lanubile@uniba.it (F. Lanubile); giulio.mallardi@uniba.it (G. Mallardi); luigi.quaranta@uniba.it (L. Quaranta)
ORCID: 0009-0007-2381-4575 (A. Basile); 0000-0003-2654-1588 (F. Calefato); 0000-0003-3373-7589 (F. Lanubile); 0000-0001-5847-117X (G. Mallardi); 0000-0002-9221-0739 (L. Quaranta)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction

The integration of data-driven artificial intelligence (AI) into eHealth systems has recently emerged as a promising avenue to enhance healthcare delivery and improve patient outcomes [1]. Consequently, in the last few years, there has been growing experimentation with healthcare solutions based on machine learning (ML) and deep learning (DL). These data-driven AI techniques have already shown remarkable capabilities in key areas of medicine, from diagnostics to treatment [2].

However, most research initiatives struggle to progress beyond the prototypical research stage and transition to clinical use. On the one hand, the primary focus of research teams is typically on optimizing the model building process and advancing the state of the art in terms of model performance. On the other hand, moving beyond experimentation, towards the clinical application of machine learning, poses significant challenges. These include ensuring the end-to-end reproducibility and traceability of ML pipelines, verifying the quality of all involved artifacts, and making ML-enabled components interoperable. As a result – after the dissemination of scientific findings – ML models typically remain confined within laboratories and never find practical application. This precludes a broader societal impact of ML research in the medical domain, representing a significant dispersion of valuable resources and potential.

To address the key challenges hindering the practical application of ML in the healthcare domain, we propose a solution framework that integrates a set of best practices and a selection of software tools for their implementation. The solution framework is based on MLOps (short for ‘Machine Learning Operations’), an emerging discipline in the area of AI engineering. Inspired by DevOps, MLOps places great emphasis on the automation of ML pipelines and the lifecycle of machine learning models.

Our solution framework is designed to support the end-to-end process for building and maintaining ML-enabled components to be integrated into eHealth systems. On the one hand, it aims to improve the practices adopted by data scientists in the laboratory. To this aim, it comprises tools to organize the requirements of an ML project, ensure the reproducibility and traceability of ML experiments, and verify the quality of code, data, and models. On the other hand, it assists the transition of ML models to production environments. To this aim, it supports activities such as model API development, model containerization, deployment, and monitoring. In all phases, it leverages workflow automation tools to make the process reproducible and reduce the margin for human error.

The solution framework described in this paper has been developed as part of “DARE – Digital Lifelong Prevention,” an Italian research project aimed at leveraging the potential of data to improve health promotion and prevention throughout the life course. We are currently in the process of validating the benefits of the solution framework through a few case studies, both within and outside DARE. In the future, we plan to further extend the scope of our proposal by leveraging automation to support further aspects of ML projects. For instance, we envision the automated creation of the documentation and validation reports needed to comply with healthcare regulations and certify the resulting ML-enabled systems as medical devices.

The remainder of this paper is organized as follows. In Section 2, we provide a definition of MLOps and report on existing MLOps experimentation in the healthcare domain. In Section 3, we introduce the DARE research project. In Section 4, we provide details about the practices and tools included in our solution framework. In Section 5, we outline future research directions, and in Section 6 we conclude the paper.

2. Background

2.1. MLOps definition

MLOps is an umbrella term that encompasses a set of practices and tools to streamline the creation and maintenance of ML-enabled systems. It primarily aims to automate ML pipelines and workflows, facilitating the deployment of models into production environments. The ultimate objective of MLOps is to implement the continuous integration and deployment (CI/CD) of models, mirroring and extending the DevOps approach used in conventional software systems.

Kreuzberger et al. provide a comprehensive definition of MLOps, which they define as “a paradigm, including aspects like best practices, sets of concepts, as well as a development culture when it comes to the end-to-end conceptualization, implementation, monitoring, deployment, and scalability of machine learning products” [3].

With its growing popularity, MLOps is emerging as a distinct discipline in the area of AI engineering. This is evidenced by the recent addition of courses on this topic at some universities [4, 5].

2.2. MLOps in healthcare

The application of data-driven AI in healthcare faces several challenges, ranging from regulatory compliance and data privacy concerns to the interoperability of systems and the integration of AI-driven insights into clinical decision-making processes. Additionally, the high-stakes nature of healthcare demands rigorous validation, monitoring, and interpretability of machine learning models to ensure patient safety and trust among healthcare providers.

To address these challenges, an increasing number of researchers are exploring the use of MLOps to integrate ML models into eHealth systems. In [6], Granlund et al. introduce a certified medical software system for the risk assessment of joint replacement interventions, exploring the use of MLOps in a highly regulated context. A similar effort is reported by Stirbu et al., who present an approach that leverages pull requests as design controls and applies it to integrate ML models in certified medical systems [7].

Lombardo et al. leverage digital twin technology to provide Location Based Services (LBS) with intelligent functionalities [8]. In doing so, they leverage MLOps to facilitate model evolution and adaptation to changes in the physical world.

To address a similar problem, Toivakka et al. propose an efficient software delivery model, based on DevOps, which ensures compliance with medical device standards [9]. Specifically, they integrate the medical device software regulatory requirements from the standards IEC 62304 and IEC 82304-1 into the software delivery pipeline.

3. DARE Project

The DARE project is a wide-ranging initiative funded by the Italian Ministry of University and Research. It has fostered the development of a distributed knowledge community dedicated to digital preventive healthcare research. This community encompasses a network of around 250 researchers from universities, hospitals, healthcare companies, and other organizations.

The primary goal of the project is to produce the knowledge and multidisciplinary solutions necessary to establish Italy as a leading country in digital prevention. Specifically, the project aims to promote preventive actions enabled by digital technologies and big data to improve the readiness and accuracy of key public health tasks such as forecasting, surveillance, early diagnosis, and response to acute and chronic diseases, including comorbidities. A peculiarity of the project is the adoption of a ‘life-course’ perspective to address health-related conditions in general.

Ultimately, DARE aims to leverage digital technologies to bridge social and geographic disparities in access to integrated health services, benefiting the most vulnerable segments of the population.

4. MLOps Solution Framework

To support the transition of prototypical ML-based solutions developed within DARE to production-grade eHealth systems, we have proposed a solution framework based on state-of-the-art MLOps practices and tools. Our
framework has a general-purpose design and is meant to support the development and maintenance of a variety of ML-based eHealth software. However, it can be easily customized to support specific research initiatives within DARE and beyond.

In the following paragraphs, we describe the main ideas behind the solution framework. Specifically, we report on the MLOps practices encompassed by the framework, as well as the tools that we recommend for their practical implementation.

Several MLOps tools have been developed so far. Most of them are commercial solutions, typically integrated into end-to-end MLOps or cloud-computing platforms. Open-source options are available as well, and some of the commercial tools – typically provided as Software-as-a-Service (SaaS) – are based on an open-source core which can be independently deployed on-premises. In our solution framework, we recommend adopting open-source software whenever possible. Not only is it typically more cost-effective, but it also offers independence from cloud infrastructures, enabling on-premises deployments. This is particularly important in the healthcare domain, in which hospitals and other research institutions need to comply with stringent patient data management requirements; patient data typically cannot leave the institution’s computing facilities. In such cases, our MLOps solution framework can be fully deployed on-premises.

4.1. Scoping the ML Problem

When planning to build an ML-enabled system or component, the initial challenge is properly defining the underlying machine learning problem, if one exists. Indeed, while machine learning offers optimal solutions for a wide range of problems, it is always crucial to assess whether using it is sensible and feasible for the specific problem at hand, considering factors like the availability of labeled data and computing resources.

Inspired by the Business Model Canvas, the Machine Learning Canvas by Goku Mohandas [10] can serve as a useful template to facilitate this decision-making process. It encourages thinking on both product and system design aspects, clarifying the motivation, key objectives, feasibility, and high-level strategy for building the proposed ML-enabled solution.

4.2. Ensuring the Reproducibility and Traceability of ML Pipelines

Once the basic requirements for the desired product have been specified, data engineers and data scientists can start working together to build the ML models that will power the final product. In doing so, they should take care of defining a reproducible and traceable pipeline.

Reproducibility is a key requirement for ML pipelines. It is essential not only for achieving consistent model performance across production and lab environments but also for enabling the recovery and timely retraining of deployed models. Nonetheless, the inherently nondeterministic nature of most ML and DL techniques, coupled with the complexity of ML pipelines, makes attaining reproducibility in practice a significant challenge.

Similarly, ensuring the full traceability of model building processes is of paramount importance. Healthcare is a safety-critical domain in which decisions can have life-altering consequences. Thus, for models aimed at supporting healthcare professionals in decision-making activities, it is essential to be able to trace back any unexpected behavior to the model training process, enabling root cause analysis. This ensures the transparency and accountability of the overall system. Moreover, traceability helps in meeting healthcare regulations.

As a first step towards ensuring the reproducibility and traceability of ML pipelines, we propose the use of git as a version control system (VCS) for code artifacts and of DVC¹ as a specialized VCS for data and models. By adopting these tools in conjunction, it is always possible to understand which specific version of a dataset and of a training script were used to build a particular version of a machine learning model.

A further step towards ensuring the full traceability of the training process is adopting an experiment tracking solution. In this regard, we recommend using MLflow,² a popular open-source platform featuring a dedicated experiment tracking module (MLflow Tracking). With MLflow, data scientists can track all relevant details of an ML experiment, including the training algorithm, the hyperparameters, the dataset version, and the selected features. Similarly, the metrics selected for model evaluation can be logged into MLflow, together with any experimental output. The outcomes of experimental runs can then be visually compared in a dashboard offered through a web application. Once the best run has been determined, the resulting model can be registered in a model registry within the dedicated MLflow module (MLflow Registry). If used consistently to register models and update their status, the model registry becomes the centralized store of production-grade models and related metadata – i.e., if a deployed model is pulled from a model registry, it is easy to trace back the particular experimental run that produced it.

¹ https://dvc.org
² https://mlflow.org
4.3. Fostering Quality Assurance of ML Artifacts

A major criticism raised by software engineers towards data scientists concerns the poor code quality of experimental ML artifacts, particularly computational notebooks [11, 12]. Integrating data science tools with static analyzers and testing utilities could significantly improve code quality. In this regard, our framework promotes the adoption of pytest³ as a testing framework and ruff⁴ as a static analyzer for Python scripts. Moreover, in projects that include computational notebooks, we recommend the use of Pynblint,⁵ a specialized linting solution for Jupyter Notebook documents.

Nonetheless, the quality of ML-enabled systems extends beyond code and is largely determined by the quality of data and models. It is widely acknowledged that model performance can be substantially impacted by the quality of training data, which often fails to meet ideal standards in real-world scenarios. Addressing data quality issues, such as biases, noise, and scarcity, is crucial to developing reliable and effective ML-enabled systems. To this aim, our solution framework provides for the use of Deepchecks,⁶ a commercial tool with an open-source core. Deepchecks can be used to test training data for outliers and other anomalies; moreover, it reveals issues like the leakage of test data into training datasets.

In addition to assessing performance metrics, the quality of models can be further evaluated using dedicated testing approaches. Where applicable, our solution framework recommends the development of behavioral model tests. Originally proposed by Ribeiro et al. [13], these tests are designed to ensure specific model capabilities. For instance, in the case of an NLP model, data scientists might want to verify that the model can handle negations appropriately. These tests can be implemented using the same testing framework employed for verifying code correctness (in our framework, pytest).

4.4. Developing APIs for ML Components

To enable seamless integration into larger systems, ML models are typically encapsulated within dedicated APIs. Specifically, given the widespread adoption of microservices and serverless architectures, which predominantly rely on the HTTP protocol for inter-component communication, a common pattern is to expose ML models through web APIs, using either REST or RPC approaches. By wrapping models with standardized web APIs, they can be more easily consumed and orchestrated within distributed architectures, facilitating their deployment and scalability.

With respect to this, our solution framework endorses FastAPI,⁷ a specialized Python framework for developing OpenAPI-compliant web APIs. By leveraging FastAPI, data scientists can efficiently build standardized and well-documented APIs for their ML models, benefiting from its high-performance capabilities and first-class support for asynchronous code.

4.5. ML Component Delivery

In addition to exposing API endpoints, models must be packaged in a portable way and automatically deployed to production environments. To accomplish this, our MLOps solution framework embraces Infrastructure as Code (IaC), a well-established DevOps methodology. The typical approach involves packaging ML models, along with their web API components, into software containers leveraging IaC techniques. In our solution framework, we advocate for the use of Docker,⁸ which has established itself as the de facto standard containerization technology during the last decade. Using Docker, ML models can be shipped as immutable and portable software packages that are consistently reproducible across different deployment environments. This containerized approach aligns with modern cloud-native architectures.

Deployed models need to be properly documented. As a standardized format to consistently document the delivered machine learning components, model cards can be adopted to report essential model attributes. Model cards are simple Markdown documents describing the model, its intended uses and potential limitations (including biases and ethical considerations), the training parameters and experimental information, the datasets used for training, and the model evaluation results. This type of documentation was first proposed by Mitchell et al. in [14] and gained popularity through its adoption by Hugging Face,⁹ a prominent AI community and machine learning hub. Beyond model cards, Hugging Face users typically document the datasets used with Dataset Cards, which outline basic details about the data as well as information on how to use the data responsibly (e.g., potential biases within the dataset). Dataset cards help users understand the contents of the dataset and provide context for how it should be used.

A further step towards the end-to-end automation of ML pipelines is adopting CI/CD (Continuous Integration/Continuous Deployment) solutions for the automated deployment of containerized ML components. Automating this step of the workflow offers two key benefits: expediting the deployment of model updates and minimizing

³ https://docs.pytest.org
⁴ https://docs.astral.sh/ruff/
⁵ https://github.com/collab-uniba/pynblint
⁶ https://deepchecks.com
⁷ https://fastapi.tiangolo.com
⁸ https://www.docker.com
⁹ https://huggingface.co
human errors through consistent, rigorous quality assurance checks before deployment. Several CI/CD tools with similar capabilities are currently available. To reduce friction in adopting this practice, our framework recommends leveraging the CI/CD service integrated with the chosen code hosting platform for sharing Git repositories, such as GitHub Actions¹⁰ for GitHub or GitLab CI/CD¹¹ for GitLab. A notable advantage of GitLab CI/CD is the ability to deploy the entire code hosting platform on-premises, which can be beneficial for institutions with policies prohibiting external code hosting.

4.6. ML Component Monitoring

To ensure the continued availability and performance of deployed ML-enabled components, continuous monitoring is essential. A comprehensive monitoring system should track both the resource utilization of ML components and the performance of the underlying ML models themselves, as model performance often degrades over time. Establishing robust monitoring practices maintains a crucial feedback loop, enabling ML engineers to promptly identify and replace underperforming models as needed.

A wide range of solutions can be leveraged for this purpose, ranging from general-purpose monitoring tools like Prometheus,¹² Grafana,¹³ and the ELK stack¹⁴ to specialized software specifically designed for monitoring ML systems, such as the monitoring module of Deepchecks. Within our solution framework, we favor the adoption of the popular open-source stack of Prometheus and Grafana. Their general-purpose nature allows for setting up custom monitoring services that holistically track the overall health of ML components, encompassing resource utilization metrics as well as ML model performance indicators. Moreover, being open-source, both Prometheus and Grafana can be seamlessly deployed on-premises using their official Docker containers, aligning with our framework’s emphasis on open solutions and on-premises deployability for healthcare use cases.

5. Future Work

While the proposed MLOps framework lays a robust foundation for deploying machine learning models in healthcare, there are still additional issues that require careful consideration. In particular, the complexity of security and regulatory requirements in the healthcare sector poses significant challenges that need to be thoroughly addressed.

Accordingly, our future work will prioritize enhancing the security of the MLOps framework. Robust authentication, authorization, and encryption protocols will safeguard patient data and ensure system integrity through API security.

In addition, we aim to explore automated report generation as a means to facilitate compliance with regulatory bodies and enable efficient auditing processes. By leveraging CI/CD workflows to generate comprehensive reports and documentation, we expect to streamline compliance efforts, thereby facilitating the certification of ML models and ML-enabled eHealth systems as medical devices.

6. Conclusion

The approach presented in this paper represents a comprehensive and robust framework for the development, deployment, and monitoring of ML models within eHealth systems. By leveraging industry-standard practices and tools, it addresses the critical aspects of reproducibility, traceability, and quality assurance throughout the entire machine learning lifecycle.

The approach prioritizes a structured foundation for coding, emphasizing clear problem statements, data sources, and evaluation metrics. It integrates version control and experiment tracking for reproducibility and collaboration. Rigorous quality assurance is applied to data and models, ensuring integrity and ethical considerations. MLOps practices streamline packaging and delivery for efficient model deployment. Continuous monitoring detects issues early, fostering reliability and trust in the developed models.

Ultimately, this comprehensive and methodical approach provides a solid foundation for healthcare organizations to harness the full potential of artificial intelligence while upholding responsible AI principles. It empowers stakeholders to develop and deploy machine learning models that are not only accurate and performant but also interpretable, ethical, and maintainable over time, driving innovation and positive impact across various domains and industries.

Acknowledgments

This study has been realized with the co-financing of the Ministry of University and Research in the framework of PNC “DARE – Digital lifelong prevention project” (PNC0000002 – CUP B53C22006450001). The views and opinions expressed are solely those of the authors and do not necessarily reflect those of the European Union, nor can the European Union be held responsible for them.

¹⁰ https://github.com/features/actions
¹¹ https://docs.gitlab.com/ee/ci/
¹² https://prometheus.io
¹³ https://grafana.com
¹⁴ https://www.elastic.co/elastic-stack