=Paper=
{{Paper
|id=Vol-3917/paper60
|storemode=property
|title=Automating machine learning: A meta-synthesis of MLOps tools, frameworks and architectures
|pdfUrl=https://ceur-ws.org/Vol-3917/paper60.pdf
|volume=Vol-3917
|authors=Danylo O. Hanchuk,Serhiy O. Semerikov
|dblpUrl=https://dblp.org/rec/conf/cs-se-sw/HanchukS24
}}
==Automating machine learning: A meta-synthesis of MLOps tools, frameworks and architectures==
Danylo O. Hanchuk¹, Serhiy O. Semerikov¹,²,³,⁴,⁵
¹ Kryvyi Rih State Pedagogical University, 54 Universytetskyi Ave., Kryvyi Rih, 50086, Ukraine
² Institute for Digitalisation of Education of the NAES of Ukraine, 9 M. Berlynskoho Str., Kyiv, 04060, Ukraine
³ Zhytomyr Polytechnic State University, 103 Chudnivsyka Str., Zhytomyr, 10005, Ukraine
⁴ Kryvyi Rih National University, 11 Vitalii Matusevych Str., Kryvyi Rih, 50027, Ukraine
⁵ Academy of Cognitive and Natural Sciences, 54 Universytetskyi Ave., Kryvyi Rih, 50086, Ukraine
Abstract
Automating the end-to-end lifecycle of machine learning models is critical for their effective operationalization.
Various tools, frameworks and architectures have emerged to support Machine Learning Operations (MLOps)
practices. This paper presents a meta-synthesis of existing reviews to provide a comprehensive overview of such
enabling technologies for MLOps. The capabilities and features offered by common commercial and open-source
MLOps platforms are compared. Patterns in the MLOps architecture and design philosophies are identified. The
role of containers, orchestration, configuration management, and infrastructure automation in ML pipelines is
examined. Approaches for model deployment on cloud and edge are also discussed. The synthesis offers insights
for tool selection and usage to automate enterprise-scale machine learning.
Keywords
MLOps, automation, tools, frameworks, architecture, model deployment, ML pipelines, meta-synthesis
1. Introduction
In the modern world, machine learning is becoming an increasingly important technology that finds
application in various fields such as finance, healthcare, industry, retail, etc. [1]. However, despite
significant progress in the development of machine learning algorithms and models, their effective
deployment in production environments remains a challenging task [2, 3]. This is due to a number of
factors, such as the need to ensure scalability, reproducibility, security, and reliability of models, as well
as the complexity of integrating development and operation processes.
To solve these problems, the MLOps (Machine Learning Operations) methodology has emerged, which
aims to apply the principles and practices of DevOps to the development and deployment processes
of machine learning models [4, 5]. MLOps covers a wide range of practices, such as automation of
machine learning pipelines, versioning of data and models, monitoring model performance, experiment
management, etc. [6, 7]. Research shows that the application of MLOps practices can significantly
increase the efficiency and reliability of deploying machine learning models in production environments
[8, 9].
At the same time, despite the significant interest in the topic of MLOps from both scientists and
practitioners, there are still certain gaps and unresolved problems in this area. In particular, there are no
generally accepted standards and best practices for implementing MLOps, issues of integrating MLOps
with other approaches (DataOps, ModelOps, AIOps, etc.) are insufficiently researched, and there is a
need to develop new tools and platforms to automate MLOps processes [5, 10, 11].
This work is aimed at solving the current problem of defining and analysing MLOps practices
necessary for the effective deployment of machine learning models.
CS&SE@SW 2024: 7th Workshop for Young Scientists in Computer Science & Software Engineering, December 27, 2024, Kryvyi Rih, Ukraine
Email: danilhanchuk@gmail.com (D. O. Hanchuk); semerikov@gmail.com (S. O. Semerikov)
Homepage: https://acnsci.org/semerikov (S. O. Semerikov)
ORCID: 0009-0004-6474-3521 (D. O. Hanchuk); 0000-0003-0789-0272 (S. O. Semerikov)
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
The basis for performing the work is the need to systematize and generalize knowledge about
MLOps practices, as well as the need to develop recommendations for their implementation in
organizations to increase the efficiency and reliability of deploying machine learning models in
production environments.
According to the aim, the following main objectives of the study are defined:
1. Perform a meta-synthesis of systematic reviews to generalize knowledge about MLOps practices
necessary for the effective deployment of machine learning models.
2. Analyze the relationships between MLOps principles, processes, and practices.
3. Identify the most effective MLOps practices for deploying machine learning models.
2. Meta-synthesis of MLOps practices
2.1. Main concepts of the study
DevOps (Development & Operations) is becoming increasingly widespread, and companies are
applying its methods in various fields [6]. In this context, MLOps (Machine Learning & Operations)
applies DevOps practices to automate ML workflows such as pipelines, in particular by implementing
continuous integration/continuous deployment (CI/CD) for machine learning projects [12, 6, 8].
According to Calefato et al. [12], key MLOps practices can be realized using GitHub Actions and CML
(Continuous Machine Learning). While some workflows automate ML tasks with GitHub Actions and
CML, production-grade, end-to-end MLOps pipelines appear rare in the analysed open-source GitHub
projects. Practices focus more on reporting and metrics than on retraining or deployment.
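As an illustration of the pattern Calefato et al. [12] describe, such workflows typically reduce to a training script that writes its metrics to a report file, which CML then posts as a pull-request comment from a GitHub Actions job. Below is a minimal, hypothetical sketch of such a step in Python; the dataset, model, and file name are assumptions for illustration, not taken from [12].

# report_metrics.py: hypothetical training-and-reporting step of a CI workflow.
# A GitHub Actions job could run this script and let CML post report.md
# as a pull-request comment.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Write the metrics report that the workflow publishes on the pull request.
with open("report.md", "w") as f:
    f.write("## Model metrics\n")
    f.write(f"- accuracy: {accuracy_score(y_test, pred):.3f}\n")
    f.write(f"- F1 score: {f1_score(y_test, pred):.3f}\n")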
2.2. Research methodology
Systematic reviews can provide syntheses of the state of knowledge in a field, from which future
research priorities can be identified; they can address questions that otherwise could not be answered
by individual studies; they can identify problems in primary research that should be rectified in future
studies; and they can generate or evaluate theories about how or why phenomena occur [13, p. 1]. The
main aim of a systematic review is to facilitate evidence-based decision-making [13, p. 6]. The main
difference between a systematic review and a literature review is the “place of the idea”. A literature
review can be idea-driven; thus, all sources can be selected to confirm some idea. In contrast, a
systematic review is a scientific method: instead of ideas, it operates with research questions and
hypotheses. As a result, a systematic review can produce new evidence-based knowledge.
On 23 February 2024, we ran the following search query against the Scopus database, restricted to article titles:
TITLE ( ( systematic OR review OR survey ) AND mlops )
Five documents were found (table 1), of which three [6, 8, 5] are systematic reviews.
Table 1: Results of searching for existing systematic reviews in Scopus.

1. G. Recupito, F. Pecorelli, G. Catolino, S. Moreschini, D. D. Nucci, F. Palomba, D. A. Tamburri, A Multivocal Literature Review of MLOps Tools and Features, in: 2022 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 2022, pp. 84–91. doi:10.1109/SEAA56994.2022.00021.
Review content: Recupito et al. [6] conducted a “multivocal” literature review – a kind of systematic review that uses both “white” (articles, book chapters, etc.) and “grey” sources (blog posts, technical documents, videos, etc.). The authors’ aim was to identify tools for creating MLOps pipelines and analyze their main characteristics and features. The authors investigated the functionality of 13 MLOps tools and showed that most MLOps tools support the same features but apply different approaches that can provide different advantages depending on user requirements.

2. A. Lima, L. Monteiro, A. P. Furtado, MLOps: Practices, Maturity Models, Roles, Tools, and Challenges – A Systematic Literature Review, in: Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 1: ICEIS, INSTICC, SciTePress, 2022, pp. 308–320. doi:10.5220/0010997300003179.
Review content: Lima et al. [8] conducted a systematic literature review to identify practices, standards, roles, maturity models, challenges, and tools of MLOps. 30 articles were selected for analysis. The study results led to the conclusion that MLOps is still at an initial stage.

3. C. Haertel, D. Staegemann, C. Daase, M. Pohl, A. Nahhas, K. Turowski, MLOps in Data Science Projects: A Review, in: 2023 IEEE International Conference on Big Data (BigData), 2023, pp. 2396–2404. doi:10.1109/BigData59044.2023.10386139.
Review content: Haertel et al. [14] provided an overview of MLOps applications in Data Science projects. The authors showed that when considering contemporary MLOps approaches, the emphasis is placed on model development and deployment, while organizational aspects (business understanding, evaluation) receive insufficient attention. Since Data Science project success does not exclusively depend on technical matters, the authors propose that future research should continue to advance the MLOps field by bridging the gap between business objectives of the organization and how these objectives are represented and modelled using appropriate concepts.

4. R. Cohen, Digital Strategy, Machine Learning, and Industry Survey of MLOps, in: Digital Strategies and Organizational Transformation, 2023, pp. 137–150. URL: https://tinyurl.com/33z6zpd3. doi:10.1142/9789811271984_0008. [15]
Review content: As part of a digital strategy, machine learning (ML) has become a common toolset and capability across many businesses. However, the operational aspects of machine learning (MLOps) are often overlooked for ML projects until they are already installed and being executed in the business environment. This chapter provides a review of MLOps products and vendors to give data scientists the ability to set up the appropriate ML infrastructure in a proactive manner.

5. J. Diaz-de Arcaya, A. I. Torre-Bastida, G. Zárate, R. Miñón, A. Almeida, A Joint Study of the Challenges, Opportunities, and Roadmap of MLOps and AIOps: A Systematic Survey, ACM Comput. Surv. 56 (2023) 84. doi:10.1145/3625289.
Review content: Diaz-de Arcaya et al. [5] analyze the challenges, opportunities, and perspectives of implementing MLOps and AIOps. The authors analyzed the open issues, opportunities, and trends faced by organizations when implementing MLOps and AIOps, the frameworks and architectures, as well as the fields of their use. The systematic review of 93 studies provided an opportunity to identify that: 1) successful implementation of artificial intelligence projects requires a collaborative culture and a combination of software engineering, data science, and DevOps skills; 2) containerization, data and model versioning, FaaS (Function-as-a-Service) and serverless architectures are useful for supporting the MLOps/AIOps lifecycle; 3) monitoring the environment is important for retraining and redeploying components; 4) AIOps is used predominantly in complex environments, such as 5G and 6G technologies, while MLOps is more common in traditional industrial environments.
The papers by Haertel et al. [14] and Cohen [15] are not systematic reviews, but their results were
taken into account when performing the meta-synthesis [16], both for combining (appendix B) and for
the thematic analysis of the results obtained in the systematic reviews.
The meta-synthesis was performed according to Chrastina [17, pp. 123-125]:
1. Defining the subject of research: MLOps practices for the effective deployment of models.
2. Identifying relevant sources: systematic literature reviews [6, 8, 5] and a review of MLOps products
and providers [15].
3. Thorough study of the works to determine their common time period, as well as their similarities
and differences in aim, research questions, sources, inclusion, exclusion, and quality criteria, and
definitions of MLOps and its stages.
4. Defining the relationship between works through identifying and grouping key topics.
5. Mutual translation of results of different works through defining common terminology, explaining
contradictions in the results from different works, and generalizing results from different works.
6. Synthesis of results.
7. Publication of the meta-synthesis.
2.3. Thorough study and defining the relationship between works
2.3.1. Distribution of reviews by year
Two works [6, 8] date from 2022 and one [5] from 2023. At the same time, the sources analyzed in [6]
extend to 2020, in [8] to 2021, and in [5] to 2023. In addition, the work [5] cites the work [8] as
prior work.
2.3.2. Review objectives
The aim of the meta-synthesis of the review objectives [6, 8, 5] was to identify common and different
aspects regarding the general focus and tasks of these studies.
Common aspects of the objectives of the considered reviews are:
1. All reviews [6, 8, 5] are aimed at researching and generalizing knowledge about the MLOps
methodology, its practices, tools, and challenges.
2. The reviews [6, 8] aim to identify and analyze MLOps tools used to automate the development
and deployment processes of machine learning models.
3. The reviews [5, 8] seek to provide an understanding of the general state of MLOps practices
implementation in industry and academia.
Distinct aspects of the review objectives are:
1. The review [6] focuses more on identifying and analyzing the functional capabilities of MLOps
tools for creating machine learning pipelines.
2. The review [8] pays attention to a wider range of MLOps aspects, such as practices, roles, maturity
models, challenges, in addition to tools.
3. The review [5], in addition to the MLOps methodology, also considers the related concept of
AIOps. More attention is paid to highlighting the opportunities, challenges, and future trends in
both areas.
Thus, despite some differences in focus and breadth of coverage, all the considered reviews are
united by a common goal – to investigate and generalize knowledge about the MLOps methodology, its
practical application, tools, challenges, and state of implementation to promote further development of
this area.
2.3.3. Review research questions
The aim of the meta-synthesis of the research questions of the reviews [6, 8, 5] was to identify com-
mon and different aspects regarding the main directions of research within the study of the MLOps
methodology.
Common aspects of the research questions of the considered reviews are:
1. All reviews [6, 8, 5] contain questions about tools and platforms used to implement MLOps
practices, automate development processes, deployment, and monitoring of machine learning
models.
2. The reviews [8, 5] include questions regarding challenges and open problems faced by organiza-
tions when implementing MLOps.
3. The reviews [8, 5] consider questions about opportunities and future trends in the field of MLOps.
Distinct aspects of the research questions of the reviews are:
1. The review [6] contains more specific questions about the functional capabilities and features of
MLOps tools for creating machine learning pipelines.
2. The review [8] includes questions regarding the roles and responsibilities of specialists involved
in MLOps implementation, as well as maturity models for assessing the level of automation of
model deployment processes.
3. The review [5], in addition to MLOps, also considers questions specific to the AIOps methodology
and pays attention to current and future areas of application of these approaches.
Thus, the research questions of the considered reviews cover a wide range of MLOps aspects, from
tools and platforms to challenges, opportunities, and areas of application. Despite some differences in
the focus of the questions, all reviews seek to explore the key components and factors that influence
the implementation and development of MLOps practices in organizations.
2.3.4. Review information sources
The aim of the meta-synthesis of the information sources of the reviews [6, 8, 5] was to identify common
and different aspects regarding the databases, search engines, and types of literature used to search for
relevant studies.
Common aspects of information sources of the considered reviews are:
1. All reviews [6, 8, 5] used electronic databases of scientific publications to search for relevant
studies.
2. The reviews [6, 5] included both academic (peer-reviewed) and non-academic (“grey”) literature
sources such as blogs, websites, videos, code repositories, etc. in the search.
Distinct aspects of information sources of the reviews are:
1. The review [6] used Google Scholar to search for scientific publications and regular Google search
for “grey” literature.
2. The review [8] limited the search to only academic databases such as ACM Digital Library, IEEE
Xplore, ScienceDirect, and SpringerLink.
3. The review [5] used several databases (Scopus, arXiv, Springer, IEEE), but the main source was
the Scopus database from Elsevier.
Thus, the considered reviews demonstrate different approaches to the selection of information
sources. Some studies [6, 5] include both academic and non-academic sources to obtain a more complete
picture of the practical application of MLOps. Others [8] focus exclusively on peer-reviewed scientific
publications. The choice of sources can influence the coverage and type of studies found, and hence the
results and conclusions of the reviews.
2.3.5. Criteria for including information sources in reviews
The aim of the meta-synthesis of the criteria for including information sources in the reviews [6, 8, 5]
was to identify common and different aspects regarding the requirements that studies must meet for
inclusion in the analysis.
Common aspects of the inclusion criteria in the considered reviews are:
1. All reviews [6, 8, 5] included studies that directly relate to the topic of MLOps, its practices, tools,
and application.
2. The reviews [6, 8] considered studies that describe the experience, practices, architecture, or
implementation of MLOps tools and processes.
Distinct aspects of the inclusion criteria in the reviews are:
1. The review [6] included studies that describe the components of the minimal MLOps lifecycle or
present the experience and opinions of experts regarding MLOps.
2. The review [8] additionally included studies that assess the maturity of MLOps processes, consider
roles and responsibilities in the ML model development lifecycle, and identify challenges in
developing and implementing MLOps solutions.
3. The review [5] had more general inclusion criteria, considering studies published from 2018 to
2023 that contain new ideas and are closely related to the topic of MLOps and AIOps.
Thus, despite some differences, the inclusion criteria in the considered reviews mainly focus on
studies that directly relate to MLOps, describe practical experience, tools, and processes, and consider
various aspects of the ML model development lifecycle. The reviews [8, 5] have somewhat broader
criteria, also including studies related to maturity assessment, roles, and MLOps challenges.
2.3.6. Criteria for excluding information sources from reviews
The aim of the meta-synthesis of the criteria for excluding information sources from the reviews [6, 8, 5]
was to identify common and different aspects regarding the characteristics of studies that lead to their
exclusion from the analysis.
Common aspects of the exclusion criteria in the considered reviews are:
1. The reviews [6, 8] excluded studies that do not provide sufficient details about the architecture,
implementation, or application of MLOps tools and processes.
2. The reviews [8, 5] excluded studies published in languages other than English.
Distinct aspects of the exclusion criteria for information sources from the reviews are:
1. The review [6] excluded studies that promote commercial MLOps platforms without providing
details on their implementation or use.
2. The review [8] excluded studies that relate only to the application of ML models without con-
sidering MLOps aspects, as well as short articles, posters, and studies without access to the full
text.
3. The review [5] excluded studies with an insufficient number of citations (depending on the year
of publication), as well as materials with limited access (by subscription) and articles published in
insufficiently reliable sources.
Thus, the considered reviews apply different exclusion criteria to filter out studies that do not meet
their requirements. The common aspect is the exclusion of studies with insufficient descriptions of
MLOps processes and tools, as well as non-English publications. Differences lie in additional criteria,
such as the exclusion of commercial platforms without technical details [6], short articles and posters
[8], and materials with limited access and a low number of citations [5].
2.3.7. Quality criteria for information sources in reviews
The aim of the meta-synthesis of the quality criteria for the reviews [6, 8, 5] was to identify common
and different aspects regarding the requirements for the quality and reliability of studies included in
the analysis.
Common aspects of the quality criteria in the considered reviews are:
1. The reviews [8, 5] evaluated the quality of studies based on the completeness of the description
of the methodology, context, and results.
2. The reviews [8, 5] considered the presence of substantiated evidence and arguments to support
the study conclusions.
Distinct aspects of the quality criteria in the reviews are:
1. The review [6] used quantitative indicators of popularity (number of stars on GitHub, views on
YouTube) to assess the quality and relevance of “grey” literature.
2. The review [8] evaluated whether the study presents empirical results, not just expert opinions,
and whether the results are properly validated.
3. The review [5] used an extended set of criteria, including the presence of a comprehensive
literature review, verification of results on use cases, the number of research questions addressed,
the availability of open access, publication in high-impact journals, and the number of citations.
Thus, the considered reviews apply different approaches to assessing the quality of studies. The
common aspect is the desire to include studies with a complete description of the methodology and
substantiated results. However, the specific quality criteria differ: from the use of popularity indicators
for “grey” literature [6], to the assessment of the empirical nature of the results [8] and consideration of
bibliometric indicators such as journal impact factor and number of citations [5].
2.4. Mutual translation of results from different works and synthesis of results
For the mutual translation of the results from different works, in addition to systematic reviews [6, 8, 5],
a review of MLOps products and providers [15] was involved.
2.4.1. Definition of MLOps
The aim of the meta-synthesis of MLOps definitions in the reviews [6, 8, 5, 15] was to identify common
and different aspects in the understanding and interpretation of this concept.
Common aspects of MLOps definitions in the considered reviews are:
1. All reviews [6, 8, 5, 15] consider MLOps as a set of practices, principles, and processes for
automating and managing the lifecycle of machine learning models.
2. The reviews [6, 5] emphasize the use of approaches and practices from DevOps in MLOps, such
as continuous integration, delivery, and monitoring.
3. The reviews [8, 15] emphasize the role of MLOps in operationalizing machine learning solutions
and transferring them to industrial operation.
Distinct aspects of MLOps definitions in the reviews are:
1. The review [6] focuses more on the technical aspects of MLOps, such as model lifecycle
management, pipeline automation, and performance monitoring.
2. The review [8] considers MLOps as a set of practices specifically for operationalizing data science
solutions.
3. The review [5] emphasizes the use of software engineering and machine learning principles in
MLOps to create model-based products.
4. The review [15] considers MLOps as a separate area that focuses on automating the machine
learning model lifecycle as part of companies’ digital strategies.
Thus, despite some differences in emphasis and wording, all the considered reviews define MLOps
as an approach for managing, automating, and operationalizing the processes of developing,
deploying, and supporting machine learning models based on practices from software engineering
and DevOps. MLOps is a key component for the successful implementation of machine learning
solutions in an industrial environment.
2.4.2. Stages of the MLOps workflow
The aim of the meta-synthesis of the stages of the MLOps workflow in the reviews [6, 8, 5, 15] was
to identify the most common steps in the lifecycle of developing and implementing machine learning
models.
Common stages of the MLOps workflow in the considered reviews are:
1. All reviews [6, 8, 5, 15] include the stages of data collection and processing, model development
and training, and model deployment in the working environment.
2. The reviews [6, 5, 15] highlight the stage of monitoring the performance and degradation of
deployed models as an important part of the MLOps workflow.
3. The reviews [6, 15] include the stage of retraining models based on new data or on a schedule as
part of the MLOps lifecycle.
Distinct aspects of the MLOps workflow stages in the reviews are:
1. The review [6] provides a detailed breakdown of the workflow stages, including the steps of data
extraction, analysis, cleaning, and transformation, as well as model validation.
2. The review [8] focuses less on detailed stages and more on general MLOps functions, such as
data collection, transformation, model training, and implementation.
3. The review [5] groups stages into broader categories, such as data management, distributed
training, deployment, and monitoring.
4. The review [15] additionally highlights the stages of generating predictions and managing models
and data as part of the MLOps workflow.
Thus, despite different levels of detail and grouping, the considered reviews demonstrate general
consistency regarding the main stages of the MLOps workflow. These stages cover the entire lifecycle
of machine learning models, from data collection and processing to deployment, monitoring, and
retraining of models. Differences in the presentation of stages reflect different approaches to structuring
and describing the MLOps workflow.
2.4.3. Frameworks and architectures that facilitate MLOps implementation
The aim of the meta-synthesis of frameworks and architectures that facilitate MLOps implementation
in the reviews [6, 8, 5, 15] was to identify the most common and effective approaches and technologies
in this area.
Common frameworks and architectures that facilitate MLOps implementation, according to the consid-
ered reviews, are:
1. The reviews [6, 8, 5] highlight open-source platforms and frameworks such as MLflow, Kubeflow,
and TensorFlow Extended (TFX) as key components of the MLOps ecosystem.
2. The reviews [6, 5, 15] emphasize the importance of using cloud computing platforms and services
such as AWS, Google Cloud, and Azure to deploy and scale MLOps solutions.
3. The reviews [8, 5] note that architectures based on containerization (e.g., using Docker) and
container orchestration (e.g., using Kubernetes) are key to ensuring portability and scalability of
MLOps solutions.
Distinct aspects of the considered MLOps frameworks and architectures in the reviews are:
1. The review [6] additionally highlights MLOps pipeline orchestration platforms such as Apache
Airflow, Jenkins, and Polyaxon.
2. The review [8] additionally mentions tools such as Kubeflow, Polyaxon, Comet.ml, Kafka-ML,
MLModelCI for managing pipelines and deploying models.
3. The review [5] considers a broader range of architectural approaches, including the use of edge
computing, serverless computing, and event-driven architectures.
4. The review [15] focuses primarily on proprietary platforms and solutions from commercial
providers such as Iguazio, Domino Data Lab, Comet, and Valohai.
Thus, there are many frameworks and architectural approaches that facilitate MLOps implementation,
from open platforms and libraries to commercial solutions and cloud services. Key factors are support
for automation, scalability, portability, and integration with existing systems and tools. The choice of
appropriate frameworks and architectures depends on the specific requirements and constraints of the
organization, as well as the level of maturity of its MLOps processes.
2.4.4. MLOps tools for creating machine learning pipelines and operationalizing models
The aim of the meta-synthesis of MLOps tools for creating machine learning pipelines and
operationalizing models in the reviews [6, 8, 5, 15] was to identify the most popular and functional
tools in this area.
Common MLOps tools mentioned in the considered reviews are:
1. The reviews [6, 8] highlight MLflow as a popular open-source platform for managing the lifecycle
of machine learning models, experiments, and deployment (see the sketch after this list).
2. The reviews [6, 15] mention cloud platforms from major providers such as AWS SageMaker,
Google Cloud AI Platform, Azure Machine Learning, as tools for operationalizing models.
3. The reviews [8, 5] note that containerization tools such as Docker and orchestration tools such
as Kubernetes are often used to deploy models.
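To make the common tooling concrete, the following is a minimal sketch of experiment tracking with MLflow, the platform highlighted in [6, 8]; the experiment name, model, and logged values are illustrative assumptions rather than a setup prescribed by the reviews.

# Minimal MLflow experiment-tracking sketch (illustrative values).
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

mlflow.set_experiment("demo-mlops")  # hypothetical experiment name
with mlflow.start_run():
    params = {"C": 1.0, "max_iter": 200}
    model = LogisticRegression(**params).fit(X, y)
    mlflow.log_params(params)  # track hyperparameters
    mlflow.log_metric("train_accuracy", model.score(X, y))  # track a metric
    mlflow.sklearn.log_model(model, "model")  # store a versioned model artifact

Each run recorded this way can later be compared in the MLflow UI and promoted to the model registry, which is how the lifecycle-management capability mentioned above is typically exercised.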
Distinct aspects of the considered MLOps tools in the reviews are:
1. The review [6] provides a detailed list of tools for different stages of the MLOps pipeline, including
orchestration platforms (Apache Airflow, Jenkins, Kubeflow, Polyaxon, Seldon Core, etc.) and
deployment (TensorFlow Extended).
2. The review [8] additionally mentions tools such as Kubeflow, Polyaxon, Comet.ml, Kafka-ML,
MLModelCI for managing pipelines and deploying models.
3. The review [5] focuses more on general categories of tools, such as experiment management
systems, data and model versioning, and infrastructure automation.
Table 2
Popular MLOps platforms and products, and associated providers (based on [15, p. 141]).

Platform/product | Provider | URL
MLflow | MLflow | https://mlflow.org/
Google Cloud AI | Google | https://cloud.google.com/products/ai
Kaggle | Kaggle | https://www.kaggle.com/
SageMaker | Amazon | https://aws.amazon.com/sagemaker/
Cloud-Native Toolkit | IBM | https://develop.cloudnativetoolkit.dev/resources/workshop/ai/
Iguazio MLOps Platform | Iguazio | https://www.iguazio.com/
Azure Machine Learning | Microsoft | https://azure.microsoft.com/en-us/products/machine-learning
Huawei Cloud ModelArts | Huawei | https://www.huaweicloud.com/intl/en-us/product/modelarts.html
SparkCognition Generative AI Suite | SparkCognition | https://www.sparkcognition.com/products/sparkcognition-generative-ai-suite
Comet | Comet | https://www.comet.com/site/
Grid.AI | Grid.AI | https://www.grid.ai/
Modzy ModelOps Platform | Modzy | https://github.com/modzy
Valohai MLOps Platform | Valohai | https://valohai.com/
HPE Ezmeral ML Ops | Hewlett Packard Enterprise | https://www.hpe.com/us/en/software/ezmeral-ml-ops.html
Domino Enterprise MLOps Platform | Domino | https://domino.ai/
4. The review [15] details the functionality of popular commercial MLOps platforms such as Iguazio,
Domino Data Lab, Comet, Valohai, etc. (table 2).
Thus, there is a wide range of MLOps tools for creating machine learning pipelines and
operationalizing models, from open platforms such as MLflow to commercial solutions from cloud providers and
specialized companies. The choice of specific tools depends on the needs and scale of the organization,
as well as compatibility with the existing technology stack.
2.4.5. Main features offered by MLOps tools
The aim of the meta-synthesis of the main features offered by MLOps tools in the reviews [6, 8, 5, 15]
was to identify the key capabilities and components of these tools.
Common features of MLOps tools, highlighted in the considered reviews, are:
1. The reviews [6, 8, 5] note that MLOps tools usually provide capabilities for tracking experiments,
versioning models and data.
2. The reviews [6, 8, 15] emphasize the importance of automation and orchestration features of
MLOps workflows, such as model training and deployment pipelines.
3. The reviews [6, 5, 15] indicate the presence in MLOps tools of components for monitoring the
performance and degradation of deployed models.
Distinct aspects of MLOps tool features, considered in the reviews, are:
1. The review [6] provides a detailed classification of features into three categories:
a) general features related to all stages of the MLOps pipeline:
• open source support;
• scalability and elasticity;
• extensibility;
• cloud-agnostic or cloud environment support;
• metadata management;
• continuous integration and delivery (CI/CD);
• user interfaces: graphical (GUI), command line (CLI), application programming interface
(API);
b) data management features:
• real-time data streaming;
• data storage;
• data analysis, cleaning, and transformation;
• data monitoring;
• metadata management;
• providing data access via API;
c) model management features:
• support for various machine learning libraries and frameworks;
• experiment tracking and model versioning;
• model registry;
• automatic hyperparameter optimization;
• model testing (A/B testing);
• anomaly and model drift detection;
• model performance monitoring;
• model metadata management;
• model deployment via API.
2. The review [8] additionally highlights features such as automatic hyperparameter optimization
of models (see the sketch after this list) and mobility support for deployment in different environments.
3. The review [5] notes the importance of integrating MLOps tools with existing systems and
supporting collaborative work of teams.
4. The review [15] details the features of commercial MLOps platforms:
• model development: environment for data analysis, feature development, training, and model
experiments;
• operationalization of model training: creating reproducible pipelines for model training and
testing;
• continuous model training: automatic support for the frequency of model retraining based
on schedule, events, or ad-hoc requests;
• model deployment: packaging, testing, and deploying trained models in the production
environment;
• generating predictions: providing predictions or classifications in real-time or batch processing mode;
• monitoring model performance: tracking the efficiency and degradation of models, warning
about the need for retraining;
• data and feature management: support for storing, processing, and accessing data and
generated features.
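To make one of the listed capabilities concrete: automatic hyperparameter optimization, mentioned in both [6] and [8], can be as simple as an exhaustive grid search over candidate values. The sketch below uses scikit-learn's GridSearchCV with an illustrative parameter grid; real MLOps tools wrap the same idea with distributed execution and experiment tracking.

# Sketch of automatic hyperparameter optimization (illustrative grid).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)  # 5-fold cross-validation
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV score:", round(search.best_score_, 3))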
Thus, MLOps tools provide a wide range of features to support the lifecycle of machine learning mod-
els, with a focus on automation, experiment tracking, versioning, monitoring, and model deployment.
Some tools offer more specialized features, such as hyperparameter optimization or data management.
The choice of a tool with an appropriate set of features depends on the specific needs and goals of the
organization in the field of MLOps.
2.4.6. Ways of deploying machine learning models in production environments
The aim of the meta-synthesis of ways of deploying machine learning models in production envi-
ronments in the reviews [6, 8, 5, 15] was to identify the most common and critical practices in this
area.
Common ways of deploying machine learning models in production environments, according to the
considered reviews, are:
1. The reviews [6, 5, 15] note that models are often deployed using container technologies such as
Docker, which provides model mobility and isolation.
2. The reviews [6, 8, 15] indicate the prevalence of deploying models in cloud environments using
platforms and services from major providers such as AWS, Google Cloud, and Azure.
3. The reviews [6, 5] note that models are often deployed as web services using REST API or other
protocols to provide access to predictions in real-time.
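A common minimal realization of the web-service approach in item 3 is to wrap a trained model in a small HTTP application; in practice such a service is then packaged into a Docker image (item 1) and deployed on a cloud platform (item 2). The sketch below uses Flask, and the model.joblib file is a hypothetical serialized model.

# Minimal model-serving sketch: a REST endpoint for real-time predictions.
# Assumes a hypothetical trained model serialized to model.joblib.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # load the model once at startup

@app.route("/predict", methods=["POST"])
def predict():
    # Expected request body, e.g.: {"features": [[5.1, 3.5, 1.4, 0.2]]}
    features = request.get_json()["features"]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)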
Distinct aspects of the considered ways of deploying models in the reviews are:
1. The review [6] additionally describes deploying models using orchestration platforms (Apache
Airflow, Jenkins, Kubeflow, MLflow, Polyaxon, Seldon Core, Valohai) to provide automatic scaling
and container management.
2. The review [8] notes that some MLOps tools, such as MLflow, Kubeflow, and Kafka-ML, have
built-in capabilities to facilitate model deployment in different environments.
3. The review [5] considers deploying models not only in the cloud but also on edge devices using
specialized frameworks such as TensorFlow Lite and Core ML.
4. The review [15] provides a detailed description of the main stages and features of the machine
learning model deployment process on commercial MLOps platforms, including CI/CD pipelines
and support for different environments:
a) creating a CI/CD pipeline for models: MLOps platforms such as SageMaker, Azure ML, and
Databricks allow creating CI/CD pipelines to automate the model deployment process,
which includes the stages of building, testing, and deploying models, as well as tracking
artifacts and version management;
b) supporting different deployment environments: MLOps platforms typically support multiple
deployment environments, such as development, testing, and production environments;
models can be deployed in different environments using appropriate configurations and
access policies;
c) model deployment process:
• trained models are packaged in a standardized format (e.g., Docker container) along
with necessary dependencies;
• the model goes through testing and validation stages to ensure its correctness and
compliance with requirements;
• after successfully passing the tests, the model is deployed in the target environment
(when deploying in the production environment, additional security and monitoring
measures may be applied);
d) automation and orchestration of deployment: MLOps platforms use automation tools such
as Jenkins or GitLab CI/CD to ensure continuous integration and deployment of models,
which can be configured to automatically trigger on certain events, such as updating the
model code or the appearance of new data;
e) monitoring and management of deployed models: MLOps platforms provide tools for monitoring
the performance and metrics of deployed models in real time; in case of problems or model
degradation, the platform can automatically initiate retraining or a rollback to the previous
version of the model.
Thus, the most common ways to deploy machine learning models in production environments are
the use of container technologies, cloud platforms and services, and the deployment of models as web
services. The choice of a specific approach depends on the requirements for latency, scalability, and
availability of models, as well as the existing infrastructure and ecosystem of tools in the organization.
2.4.7. Maturity models for assessing the level of automation in deploying machine learning
models
Lima et al. [8] refer to several maturity models for assessing the level of improvement in the development
process of machine learning solutions:
1. The maturity model proposed by Amershi et al. [18], mentioned in both [8] and [5].
This model, based on the Capability Maturity Model (CMM) and Six Sigma methodology, checks
whether the activity: (1) has defined goals, (2) is consistently implemented, (3) is documented, (4)
is automated, (5) is measured and tracked, and (6) is continuously improved.
2. According to Dhanorkar et al. [19], organizations can be classified into three levels of maturity for
developing machine learning solutions: (1) data-oriented, (2) model-oriented, (3) pipeline-oriented.
3. Lwakatare et al. [20] describe five stages of improvement in development practices: (1) manual
process led by data science, (2) standardized process of experimental-operational symmetry,
(3) automated ML workflow process, (4) integrated software development and ML workflow
processes, and (5) automated and fully integrated CD and ML workflow process.
4. Akkiraju et al. [21] proposed an adaptation of the CMM model with the definition of five levels
of maturity for each assessed capability: (1) initial, (2) repeatable, (3) defined, (4) managed, and
(5) optimizing.
All systematic reviews [6, 8, 5] indicate that the level of automation of MLOps processes is one
of the key factors in assessing the maturity of an organization in this area. Despite the fact that the
considered reviews do not provide an exhaustive description of MLOps maturity models, they emphasize
the importance of assessing the level of automation of model development, testing, and deployment
processes as a key factor in the maturity of an organization in this area. The adaptation of existing
software development maturity models to the specifics of MLOps can be an effective approach to
assessing and improving machine learning processes in an organization.
2.4.8. Roles and responsibilities identified in the activities of operationalizing machine
learning models
The aim of the meta-synthesis of roles and responsibilities identified in the activities of operationalizing
machine learning models in the reviews [6, 8, 5, 15] was to identify the key participants in the MLOps
process and their functions.
Common roles and responsibilities identified in the considered reviews:
1. All reviews [6, 8, 5, 15] mention the involvement of data scientists / data science researchers who
are responsible for developing, training, and experimenting with machine learning models.
2. The reviews [6, 8, 5] highlight the role of data engineers / data providers who are involved in
extracting, processing, transforming, and ensuring the quality of data for model training.
3. The reviews [6, 5, 15] note the importance of DevOps engineers, ML/MLOps engineers, and
software engineers in operationalizing models, automating deployment processes, creating pipelines,
and managing environments.
4. The reviews [6, 5] emphasize the role of managers, leadership, and business stakeholders in
defining model requirements for deployment, decision-making, and supporting the MLOps
strategy.
Distinct aspects of roles and responsibilities, considered in the reviews:
1. The review [8] additionally highlights the roles:
• domain specialist has deep knowledge of the subject area and plays an important role in obtaining
data and validating results;
• computational scientist/engineer has high technical skills to prepare the environment for the
operation of machine learning models;
• ML scientist/engineer is responsible for designing new machine learning models, has in-depth
knowledge of statistics and ML algorithms;
• provenance specialist manages the supply of data in the lifecycle of developing machine
learning solutions, has knowledge of both the subject area and machine learning;
• manager assesses models before their publication;
• application developer develops applications in which the created models will operate;
• deployment lead assesses aspects related to infrastructure components when deploying ML
models to production.
2. The review [5] mentions the role of subject matter experts in labeling data in specific domains.
Thus, despite some differences in the detail of roles, the considered reviews recognize the need to
involve specialists from different areas (software development, data engineering, machine learning,
the subject domain, and management) for the successful operationalization of machine learning
models. Close collaboration and communication between these roles is critical for implementing MLOps
practices in organizations (figure 1).
2.4.9. Challenges encountered in deploying machine learning models in production
environments
The aim of the meta-synthesis of challenges encountered in deploying machine learning models in
production environments in the reviews [6, 8, 5, 15] was to identify the most common and critical
problems in this area. In [6] and [15], specific challenges are not explicitly listed, but they can be
determined indirectly based on the discussion of MLOps and automation of machine learning pipelines
in [6] and the description of the various stages of MLOps and the need for appropriate tools in [15].
Common challenges identified in the considered reviews:
1. The reviews [6, 8, 5, 15] note the complexity of managing the machine learning model lifecycle,
including versioning, tracking, and reproducibility of models and data, as well as the problem of
ensuring scalability and performance of models in real-world conditions with large amounts of
data and requests.
2. The reviews [8, 5, 15] point to challenges associated with monitoring and maintaining models in
the production environment, including detecting data drift and model performance degradation.
3. The reviews [6, 8] consider challenges related to integrating software development with the
machine learning pipeline.
4. The reviews [6, 5] consider challenges related to ensuring data security and privacy when
deploying machine learning models.
5. The reviews [5, 15] indicate that the quality, availability, preparation, labeling, and integration of
data from different sources is a significant challenge that requires a lot of time and resources, and
highlight the problem of interpreting and explaining the results of model operation to end-users
and business stakeholders.

[Figure 1: Intersection of roles and responsibilities (according to [2, p. 5]). The figure shows six overlapping roles: Data Scientist (ML model development); Backend Engineer (ML infrastructure management); ML Engineer / MLOps Engineer (cross-functional management of the ML environment and assets: ML infrastructure, ML models, ML workflow pipelines, data ingestion, monitoring); Data Engineer (data management, data pipeline management); DevOps Engineer (software engineer with DevOps skills: ML workflow pipeline orchestration, CI/CD pipeline management, monitoring); Software Engineer (applies design patterns and coding guidelines).]

Distinct challenges, considered in the reviews:

1. The review [6] emphasizes the need to automate all stages of the MLOps pipeline and integration
with existing software development systems and processes.
2. The review [8] notes the problem of selecting and managing infrastructure for deploying models,
including the choice between cloud and on-premises environments.
3. The review [5] notes the following problems: a) the gap between software engineering and machine
learning skills – data scientists often do not understand the requirements of certain production
environments, and software developers do not have sufficient machine learning skills; b) effective
distribution, parallelization, and orchestration of data and ML tasks; c) the diversity of computing
infrastructure.
The considered reviews show that deploying machine learning models in production environments
is associated with a number of challenges, such as managing the model lifecycle, ensuring scalability
and performance, and monitoring and maintaining models in real-world conditions. Addressing these
challenges requires an integrated approach that includes automation of MLOps processes, selection
of appropriate infrastructure, ensuring data security and privacy, and effective communication with
business stakeholders.
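One recurring element of the monitoring challenge above, detecting data drift, can be illustrated with a simple two-sample statistical test on a single feature. The sketch below uses the Kolmogorov-Smirnov test from SciPy; the synthetic arrays and the 0.05 threshold are illustrative assumptions standing in for real training and production data.

# Sketch of data-drift detection on one feature via a Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)  # training distribution
prod_feature = rng.normal(loc=0.4, scale=1.0, size=1000)   # shifted production data

statistic, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.05:  # illustrative significance threshold
    print(f"Drift detected (KS statistic={statistic:.3f}, p={p_value:.4f}); consider retraining.")
else:
    print("No significant drift detected.")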
2.4.10. Open issues, challenges, and peculiarities of MLOps
The aim of the meta-synthesis of open issues, challenges, and peculiarities of MLOps in the reviews
[6, 8, 5, 15] was to identify the most relevant and promising areas of research and development in this
field. In [6] and [15], open problems, challenges, and peculiarities of MLOps are not directly discussed,
but they can be identified based on the analysis of MLOps tools and their capabilities in [6] and the
description of the components and functions of MLOps platforms [15].
Common open issues and challenges of MLOps, identified in the considered reviews:
1. The reviews [6, 8, 5, 15] indicate the need to develop methods and tools to ensure the
interpretability, reproducibility, and responsible use of machine learning models in the context of
MLOps.
2. The reviews [6, 5, 15] emphasize the importance of developing approaches to data management
in MLOps, including ensuring data quality, privacy, and security.
3. The reviews [6, 8] note the need to develop and implement MLOps standards and best practices
to ensure consistency and compatibility between different tools and platforms.
4. The reviews [8, 5] emphasize the importance of the human factor in MLOps, including the need
to ensure effective communication and collaboration between different roles and teams and the
training of qualified personnel with cross-functional skills in programming, data processing, and
operational activities.
Peculiarities of MLOps, identified in the considered reviews:
1. The review [6] notes that MLOps should take into account the specifics of the machine learning
model development process, which differs from traditional software development.
2. The review [15] considers MLOps in the context of the overall digital strategy of the organization
and emphasizes the need to align MLOps practices with business goals and needs.
Thus, the considered reviews identify a number of open issues and challenges in MLOps, such as the
need to develop standards and best practices, ensure interpretability and responsible use of models,
and effectively manage data. Peculiarities of MLOps, such as the difference from traditional software
development, the importance of the human factor, and the need to integrate knowledge from different
fields, require consideration when implementing MLOps practices in organizations.
2.4.11. Opportunities, future trends, and areas of application of MLOps
The aim of the meta-synthesis of opportunities, future trends, and areas of application of MLOps in
the reviews [6, 8, 5] was to identify promising directions of development and potential areas where
MLOps practices can bring significant benefits. In [6], they are not directly discussed, but based on
the presented MLOps tools and their capabilities it is possible to outline some potential directions
and trends.
Opportunities and future trends of MLOps, identified in the considered reviews:
1. The reviews [6, 8, 5] note the potential for developing standardized MLOps platforms and tools
that will simplify and accelerate the implementation of machine learning models in production.
2. The reviews [8, 5] note the prospects for integrating MLOps with other approaches, such as
DataOps, ModelOps, and DevSecOps, to provide comprehensive management of the machine
learning model lifecycle.
3. The review [8] points to: a) significant opportunities for further academic research and development,
since MLOps is still at an early stage; b) the expected increase in demand for MLOps tools and
platforms with the spread of artificial intelligence solutions; c) the emergence of new roles and
competencies related to MLOps as the industry develops.
4. The review [5] points to: a) opportunities to apply MLOps practices in the context of distributed
and federated model training, which will allow efficient use of decentralized data; b) involving
business units and training leadership in MLOps principles; c) using hardware platforms such as
FPGA and IoT to improve performance and privacy.
Current and future areas of MLOps application, identified in the considered reviews:
1. The reviews [6, 8, 5] note that MLOps is already actively used in industries such as finance,
healthcare, commerce, marketing, and manufacturing, where machine learning models are used
to solve real business problems.
2. The review [5] points to the potential of applying MLOps in the field of IoT and edge computing,
where machine learning models can be deployed on resource-constrained devices, 5G and 6G
technologies, educational and scientific activities.
3. The review [8] notes the prospects for using MLOps in transportation and logistics.
Thus, the considered reviews outline a number of opportunities and development trends for MLOps,
such as creating standardized platforms, application in the context of distributed learning, and integration
with other approaches to managing the lifecycle of data and models. Current and future areas of MLOps
application include a wide range of industries, from finance and healthcare to IoT and natural language
processing, which indicates a significant potential impact of this approach.
3. Analysis of MLOps practices
3.1. Relationship between MLOps principles, processes, and practices
MLOps is based on a set of principles [2, p. 3] and processes that ensure the effective development,
deployment, and support of machine learning models; these are realized through MLOps practices.
MLOps principles define the fundamental foundations of designing machine learning pipelines:
• automation: maximum automation of all stages of the machine learning model lifecycle to reduce
manual interventions and improve efficiency;
• reproducibility: ensuring the ability to reproduce the results of experiments and model deployment
processes;
• collaboration: establishing effective collaboration and communication between different teams
involved in model development and implementation;
• continuous learning and improvement: regular updating of models based on new data and feedback,
continuous improvement of MLOps processes;
• data governance: ensuring data quality, security, and confidentiality throughout the model lifecycle.
MLOps processes define the sequence of actions for designing and implementing machine learning
pipelines:
1. Defining business goals and requirements: aligning the goals of developing machine learning
models with the business strategy of the organization.
2. Data collection and preparation: collecting, cleaning, transforming, and enriching data for model
training.
3. Model development and training: selecting algorithms, developing model architecture, training
and validating models.
4. Model evaluation and testing: evaluating model performance on test data, conducting tests for
reliability, security, and compliance with requirements.
5. Model deployment: packaging models with necessary dependencies, deploying in target environ-
ments.
6. Model monitoring and maintenance: tracking model performance, identifying and resolving issues,
updating models as needed.
7. Model lifecycle management: coordinating all stages of model development, deployment, and
support, ensuring compliance with regulatory requirements.
MLOps practices define the most effective methods and technologies for implementing machine
learning pipelines:
• basic MLOps practices include:
– continuous integration and delivery (CI/CD): automation of the processes of building, testing,
and deploying machine learning models;
– model and data versioning: tracking changes in models and datasets, ensuring reproducibility
of results;
– ML pipeline automation: creating automated workflows for data collection, processing,
model training, and evaluation;
– model performance monitoring: tracking model quality metrics in the production environ-
ment, detecting performance degradation;
– experiment management: organizing, tracking, and comparing different experiments with
models and hyperparameters;
– model deployment: packaging models with necessary dependencies, deploying in different
environments (cloud, edge, etc.);
– model lifecycle management: coordinating the processes of model development, testing,
deployment, and monitoring to best ensure compliance with requirements;
• additional MLOps practices include:
– data security and privacy: ensuring the protection of data used for model training, compliance
with regulatory requirements;
– model explainability and interpretability: using methods and tools to understand and explain
model behavior, especially in regulated industries;
– data quality management: monitoring and ensuring the quality of data used for model
training and evaluation, detecting and handling anomalies;
– configuration management: versioning and managing configurations of environments where
models are deployed, ensuring consistency across different environments;
– model deployment strategies: selecting and implementing appropriate deployment strategies;
– infrastructure automation: using Infrastructure as Code to automate provisioning and man-
agement of infrastructure for model training and deployment;
– collaboration and communication: establishing effective collaboration between data science,
development, operations, and business units;
– risk management and compliance: identifying and mitigating risks associated with the use
of machine learning models, ensuring compliance with regulatory requirements.
Figure 2 illustrates the relationships between key MLOps principles (light blue rectangles), main
processes (green rectangles), and common practices (orange rectangles). Arrows show how principles
influence processes, and processes, in turn, are implemented through specific practices. For example,
the principle of automation influences all MLOps processes, from goal definition to model lifecycle
management. The model development process is associated with practices such as versioning, pipeline
automation, experiment management, and model interpretability.
3.2. CI/CD
CI/CD (Continuous Integration/Continuous Delivery) is a key DevOps practice for automatically testing and deploying code, data, and models in a production environment (figure 3) [7, p. 7]. In MLOps, it is extended to automate the process of developing and deploying ML models, including the stages of building, testing, delivery, and deployment [2, pp. 3-4].
The CI/CD process in MLOps includes the stages of build, test, delivery, and deploy [2, p. 4]. However,
unlike traditional CI/CD, MLOps may also have additional stages, such as model retraining.
CI/CD in MLOps is part of the MLOps system architecture and provides fast feedback to developers
on the success or failure of certain stages, increasing overall productivity [2, pp. 3-4].
Typical triggers for starting the CI/CD process in MLOps on GitHub are git push and pull_request
events [12, p. 4]. The events issue_comment, release, and schedule (scheduled runs) can also be used.
In the article by Steidl et al. [7], potential main triggers were also investigated, such as feedback systems
and alerts, scheduled orchestration service, traditional repository updates, and manual triggers. These
triggers start the execution of the pipeline, which consists of four stages: (1) data processing, (2) model
training, (3) software development, and (4) system commissioning. The data processing stage consists of a
repetitive end-to-end lifecycle of data-related tasks, such as preprocessing, quality assurance, versioning,
and documentation. The model training stage uses the results of data processing and illustrates model
development tasks such as model design, training, quality assurance, collecting metadata for model
improvement, version management, and documentation. After the pipeline completes model training,
the software development stage prepares the model for deployment through packaging, quality assurance
at the software level, and system versioning. In the final system commissioning stage, the model is
deployed in a specific environment using different deployment strategies and the system is monitored
[7, p. 21].
A feature of the CI/CD process in MLOps is the need to version not only code, but also data and
models. This allows for reproducibility and the ability to roll back to previous versions [2, p. 4].
Among the popular tools for implementing CI/CD in MLOps are Jenkins [2, p. 3] and GitHub Actions
[2, 12] and tools from cloud providers such as AWS CodePipeline, Azure DevOps Pipelines, etc.
Figure 2: Diagram of relationships between MLOps principles, processes, and practices.
Figure 4 from the article by Kreuzberger et al. [2] presents an end-to-end MLOps architecture and
workflow with functional components and roles involved at each stage. Let’s consider in more detail
each of the zones and stages depicted in the figure:
1. A. MLOps Project Initiation. At this stage, the business stakeholder (BS) analyzes the problem
and defines the goal. The data scientist (DS) formulates the ML problem based on the business
goal. The necessary data is also determined and an initial analysis is performed by the data
engineer (DE) and data scientist (DS).
2. Data Engineering Zone. This zone includes two sub-stages:
• B1. Requirements for feature engineering pipeline. Rules for transformation, cleaning,
and calculation of new features are defined.
• B2. Feature Engineering Pipeline. A data processing pipeline is implemented that
receives data from various sources, applies transformations, and loads it into a feature store
system.
3. C. Experimentation. At this stage, the data scientist (DS) performs data analysis, preparation,
and validation, as well as model training and validation. The best model is saved in the Model
Registry.
Figure 3: Continuous lifecycle pipeline for AI applications [7, p. 10].
Figure 4: End-to-end MLOps architecture and workflow with functional components and roles [2, p. 6].
4. ML Production Zone. This zone includes an automated pipeline (D. Automated ML Workflow Pipeline), which ensures data preparation, training, validation, and registration of the model in production mode. The Model Serving component deploys the model, and the Monitoring component ensures its continuous monitoring. In case of problems, information is transmitted through the Feedback Loop to initiate model retraining.
In addition to functional components, the figure also shows different roles and their areas of respon-
sibility: Business Stakeholder (BS), Data Scientist (DS), Data Engineer (DE), DevOps Engineer (DO), ML
Engineer (ML), Software Engineer (SE), and IT Solution Architect (SA).
3.3. Model and data versioning
Versioning is one of the key MLOps practices that ensures reproducibility and traceability of machine
learning models [2, p. 3]. Versioning covers data, code, and the models themselves [7, p. 9]. Model
versioning not only captures model artifact versions but also model dependencies for tracking or
reproducing different model versions that rapidly change over time [7, p. 13].
The purpose of data versioning is to guarantee the reproducibility of models and compliance with
regulatory requirements. Data versioning can be implemented either by storing data snapshots or by
referencing the original dataset. Since traditional version control systems cannot handle large amounts
of data, specialized tools such as Data Version Control (DVC) are used [7, p. 11].
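As an illustration, the snippet below is a minimal sketch of reading a versioned dataset through DVC's Python API; the repository URL, file path, and revision tag are hypothetical and are not taken from the reviews.

    # A minimal sketch of reading a DVC-versioned dataset.
    import dvc.api

    with dvc.api.open(
        "data/train.csv",                            # path tracked by DVC in the repo
        repo="https://github.com/org/ml-project",    # hypothetical repository
        rev="v1.2",                                  # Git tag pinning the data version
    ) as f:
        train_rows = f.readlines()                   # data exactly as of revision v1.2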
In the MLOps workflow, model versioning occurs at the model training stage. Its purpose is to save
different versions of models along with their metadata for the ability to roll back to previous versions
and reproduce results. The conditions for using model versioning are: (1) the constant evolution of
models over time; (2) the need to track dependencies between models, data, and code [7, p. 12].
A feature of model versioning, as opposed to traditional code versioning, is the need to track a larger
number of artifacts and metadata, as well as the need for larger amounts of memory due to the constant
development of models [7, p. 13].
In addition to the data itself, versioning is subject to dependencies, data processing steps, and extracted
features [7, p. 11]. For the latter, specialized feature stores are often used.
Model dependencies capture the relationship with related elements such as the corresponding dataset,
source code, and configuration files. In addition, model versions store associated log files and model
evaluation results. This allows checking whether model versions are constantly improving throughout
the continuous lifecycle. Since versioning of artificial intelligence models is more complex and requires
more memory due to continuous development, standard version control systems such as Git cannot be
used as model repositories. Potential alternatives, such as MLflow, H2O, and DataRobot, are container registries where image versions are stored, or model repositories that store model versions, including code, metadata, test results, and dependencies [7, p. 13].
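For instance, a model version can be recorded in the MLflow model registry (one of the alternatives named above) roughly as follows; the registered model name is illustrative, and each repeated call under the same name produces a new model version.

    # A minimal sketch of model versioning with the MLflow model registry.
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=200).fit(X, y)

    with mlflow.start_run():
        # Log the model artifact and register it under a named entry;
        # MLflow stores the artifact, its dependencies, and run metadata.
        mlflow.sklearn.log_model(model, "model",
                                 registered_model_name="churn-classifier")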
3.4. ML pipeline automation
ML pipeline automation is a key MLOps practice that allows simplifying and accelerating the develop-
ment, testing, and deployment of ML models in a working environment. This practice encompasses
the automation of various stages of the ML pipeline, including data collection, preprocessing, model
development, training, testing, validation, and deployment [7, p. 2].
The algorithm of an automated machine learning pipeline (figure 5) consists of the following stages:
1. Process initiation.
2. Extraction of versioned data from the storage.
3. Automated data preparation and validation.
4. Automated model training on new data (iteratively).
5. Model evaluation and hyperparameter tuning.
6. Model export.
7. Storing the model in the model registry.
8. Model deployment.
9. Model serving to obtain predictions.
10. Monitoring model performance.
The feedback loop ensures continuous improvement of the model by returning to the training
stage. If necessary, monitoring can initiate a model retraining trigger, which starts the process from
the beginning. This automated pipeline provides the ability to effectively manage the lifecycle of a
Figure 5: Algorithm of an automated machine learning pipeline (based on Kreuzberger et al. [2]).
machine learning model, from data preparation to deployment and monitoring, ensuring continuous
improvement of the model. Features of using ML pipeline automation include the need to take into
account the heterogeneity of models, frameworks, and execution environments. Therefore, each step of
the ML pipeline should be as isolated as possible and have clear interfaces, for example, through the
use of containerization [22, p. 6]. It is also critically important to ensure the ability to reproduce results
and track artifacts [23, p. 1706].
Methods of ML pipeline automation include the use of pipeline management systems such as Kubeflow,
TFX [7, p. 19] or Apache Airflow [6, p. 4], as well as the development of custom automation scripts using
tools such as Jenkins [24] or GitLab CI/CD [2, p. 2]. In this case, the pipeline is divided into separate
steps, each of which is implemented as code or configuration, and then these steps are orchestrated and
executed automatically [23, p. 1706].
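As a sketch of this practice, the following Apache Airflow (2.x) DAG wires three stubbed pipeline steps into an automatically scheduled workflow; the DAG name and task bodies are placeholders, not taken from the reviews.

    # A minimal sketch of an ML pipeline as an Airflow DAG.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_data():
        """Pull versioned data from storage (stub)."""


    def train_model():
        """Fit and export the model (stub)."""


    def evaluate_model():
        """Compute metrics and gate deployment (stub)."""


    with DAG(dag_id="ml_pipeline", start_date=datetime(2024, 1, 1),
             schedule="@daily", catchup=False) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_data)
        train = PythonOperator(task_id="train", python_callable=train_model)
        evaluate = PythonOperator(task_id="evaluate", python_callable=evaluate_model)

        extract >> train >> evaluate  # each step is isolated with a clear interface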
3.5. Model performance monitoring
Monitoring the performance of machine learning models is an important MLOps practice that allows
tracking the operation of models in a production environment, identifying problems, and taking
measures to maintain their quality. This practice covers the collection of metrics regarding the operation
of the model, real-time monitoring of these metrics, and alerts in case of detecting deviations from the
norm [25, p. 128].
Model performance monitoring occurs at the model operation stage, after its deployment in the
production environment. It is part of the continuous MLOps cycle and is performed in parallel with
other practices, such as development, testing, and deployment [25, p. 127].
The conditions for using monitoring are the presence of a deployed ML model that processes real
data and generates predictions. In addition, an infrastructure for collecting and storing monitoring data,
for example, a database and visualization tools, should be set up [25, p. 134].
The monitoring process includes several steps (figure 6). First, key model performance metrics are
defined, such as accuracy, error, latency, etc. Then, tools for collecting these metrics from the system
where the model is deployed are set up. Next, the data is aggregated and visualized on dashboards for
convenient analysis. Finally, alerts are set up that are triggered when metrics go outside of acceptable
limits [25, p. 129].
Figure 6: Monitoring process in MLOps [25, p. 129].
Figure 7 depicts the analogy proposed by Bodor et al. [25] between monitoring a machine learning
system and an iceberg. This analogy emphasizes that the health of an ML system depends not only on
visible elements, such as provided services, but also on hidden features that are difficult to track, such
as data and the model itself:
Figure 7: Hidden and visible characteristics associated with monitoring [25, p. 130].
• The top level of the iceberg represents ecosystem health, which includes aspects such as prediction
drift and business key performance indicators (KPIs). The goal is to assess the performance,
reliability, and stability of this top level.
• The service level includes metrics such as latency, cost, and overall system performance.
• The data level contains data quality characteristics, outlier values, and data drift.
• The lowest level of the iceberg is the model, which is characterized by accuracy, concept drift,
and model bias.
A feature of monitoring in the context of MLOps is the need to track not only traditional software
performance metrics (for example, CPU and memory usage), but also metrics specific to ML, such as
model accuracy on new data (figure 8). In addition, MLOps involves automating the monitoring process
and integrating it into the overall model development and deployment pipeline [25, p. 128].
There are a number of tools for setting up model performance monitoring in ML. They include
open-source platforms such as Prometheus, Grafana, ELK stack, as well as commercial solutions from
cloud providers, for example, AWS CloudWatch, Google Stackdriver, Azure Monitor [25, p. 135].
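A minimal sketch of exporting ML-specific metrics with the official Prometheus Python client is shown below; the metric names, port, and the simulated inference are illustrative.

    # A minimal sketch of model performance monitoring with prometheus_client.
    import random
    import time

    from prometheus_client import Counter, Histogram, start_http_server

    PREDICTIONS = Counter("model_predictions_total",
                          "Number of predictions served")
    LATENCY = Histogram("model_prediction_latency_seconds",
                        "Prediction latency in seconds")

    @LATENCY.time()              # records how long each call takes
    def predict(features):
        PREDICTIONS.inc()        # count every served prediction
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for real inference
        return 0

    if __name__ == "__main__":
        start_http_server(8000)  # Prometheus scrapes http://host:8000/metrics
        while True:
            predict([1.0, 2.0])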
3.6. Experiment management
According to Singh [26], experiment tracking involves systematically storing metadata of machine
learning experiments [26, p. 153].
Czakon and Kluge [27] note that experiment metadata can include arbitrary scripts to run the
experiment, environment configuration files, information about training and evaluation data, model pa-
rameter and training configurations, ML evaluation metrics, model weights, performance visualizations
(e.g., confusion matrix or ROC curve), etc.
Experiment management (also known as experiment tracking) is part of MLOps focused on supporting
iterative model development – the part of the ML project lifecycle where, in particular, hyperparameter
tuning is performed to achieve the required level of model performance. Experiment management is
closely intertwined with other aspects of MLOps, such as data and model versioning.
Figure 8: Elements for monitoring [25, p. 131].
Figure 9: Experiment management in the MLOps lifecycle (adapted from Czakon and Kluge [27]).
The main condition for applying experiment management is the iterative nature of the model
development and training process, when many experiments are conducted with different sets of hy-
perparameters, model architectures and training data (figure 9) [27]. This is typical for research and
applied machine learning projects.
The most popular tools for experiment management are MLflow, Neptune.ai, Weights & Biases
(wandb), Guild.ai, Comet.ml and TensorBoard [2]. They allow storing information about each experiment
in a repository – data used and its version, model parameters and architecture, performance metrics,
model artifacts, etc.
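A minimal experiment-tracking sketch with MLflow (one of the tools listed above) might look as follows; the experiment name, parameters, and metric values are placeholders.

    # A minimal sketch of experiment tracking with MLflow.
    import mlflow

    mlflow.set_experiment("churn-model-tuning")   # hypothetical experiment name

    for lr in (0.01, 0.1):
        with mlflow.start_run():
            mlflow.log_param("learning_rate", lr)    # hyperparameters of this run
            mlflow.log_param("n_estimators", 100)
            # ... train and evaluate the model here ...
            mlflow.log_metric("val_accuracy", 0.9)   # placeholder metric value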
3.7. Model deployment
Model deployment is an important MLOps practice that occurs at the operationalization stage in the
workflow of developing and deploying ML models. This practice involves directly placing a trained and
tested machine learning model into a production environment where it can be used to obtain predictions
on real data [11, p. 1].
According to Kolltveit and Li [11], deployment, i.e. the transition of a packaged and integrated model
into a service state, can occur in several different ways. Models packaged in containers are simply
run directly as standalone services. However, models can be deployed in a target environment that is
different from where they were packaged, and in this case, model transfer must occur using a push or
pull pattern [11, p. 4]:
• in a pull-pattern deployment, the target environment (host application running, e.g., on a server or edge device) periodically polls for model updates and downloads them when available (a sketch of this pattern follows the list);
• in a push-pattern deployment, the target environment is notified of the availability of a new
model by the master server (e.g., the server where the model was trained) through a messaging
service, where the message contains metadata including the location of the updated model, or by
initiating model transfer to the target environment through a specific receiving interface.
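The pull pattern can be sketched as a polling loop against a hypothetical registry endpoint that reports the latest model version; the URL, paths, and polling interval are illustrative.

    # A minimal sketch of pull-pattern model deployment.
    import time
    import urllib.request

    REGISTRY = "https://models.example.com/churn-classifier"  # hypothetical
    current_version = None

    while True:
        with urllib.request.urlopen(f"{REGISTRY}/latest-version") as r:
            latest = r.read().decode().strip()
        if latest != current_version:
            # Download the updated model artifact and swap it in.
            urllib.request.urlretrieve(f"{REGISTRY}/{latest}/model.pkl",
                                       "model.pkl")
            current_version = latest
        time.sleep(300)   # poll every five minutes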
ML model deployment often occurs by packaging models in containers, for example, using Docker.
This allows standardizing the deployment process and ensuring that the model will run in the same
environment as during development and testing [11, p. 5].
There are different ways to deploy ML models depending on the requirements and architecture of
the system [28, p. 68]:
• the model can be integrated directly into the application code;
• the model can be deployed as a separate service (microservice) with a REST API (see the sketch after this list);
• the model can be loaded into a specialized environment for deploying and scaling ML models (for
example, AWS SageMaker).
ML model deployment is associated with a number of problems, including ensuring low latency and
high throughput for the prediction service [11, p. 5]. Various methods are used to solve them, such as
adaptive queues with timeout for batch prediction, caching, dynamic switching between models of
different accuracy, etc.
In terms of tools, container orchestration systems (Kubernetes), cloud provider services (AWS Sage-
Maker, Azure ML), end-to-end MLOps platforms (MLflow, Kubeflow) are used for ML model deployment
[28, 3].
ML model deployment is a critically important MLOps practice that allows transitioning developed
models into a working state and using them for prediction. It occurs at the operationalization stage
and requires considering a number of factors – from the way the model is packaged to ensuring the
necessary performance of the prediction service. Deployment relies on modern tools for containerization,
orchestration and infrastructure automation.
Based on the generalization [11, 28], a general scheme of machine learning model deployment was
constructed (figure 10):
1. ML Model – a trained and tested machine learning model.
2. Packaging – the model is packaged in an appropriate format (e.g., a Docker container).
3. Model registry – the packaged model is placed in a model registry.
4. Deployment – the packaged model is deployed in the target environment (cloud, edge devices).
5. Serving – the model serves requests and generates predictions.
6. Monitoring – the performance of the model and environment is monitored. If necessary, model
retraining is initiated.
Figure 10: ML model deployment scheme.
3.8. Lifecycle management
According to Steidl et al. [7], continuous (end-to-end) lifecycle management is continuous pipeline management that describes the (automatic) execution of the specific tasks needed to manage the lifecycle of an artificial intelligence model [7, p. 7]; this lifecycle starts with data collection and ends with the deployment and monitoring of the model [7, p. 2].
The goal of lifecycle management is to unify and standardize processes, which contributes to increas-
ing the productivity of ML model development and their reliability in the industrial environment.
Key features of lifecycle management in MLOps:
• covers all stages: data collection and preparation, model development, training, validation and
deployment [25, p. 127] (figure 11);
• is applied both at the early stages of model development and for their continuous support after
deployment in the production environment [5, p. 9];
• involves versioning of data, code and models themselves to track changes and ensure reproducibil-
ity;
• involves monitoring model performance;
• automates processes using pipelines [7, p. 8].
Figure 11 shows the ML project development process (pipeline), which consists of a number of steps
that are both linear and iterative (MLOps Pipeline block). The pipeline starts with data extraction,
validation and preparation, then model training, evaluation and validation (experiment management).
After that, the model is deployed in the production environment [25, p. 128].
An important aspect of the lifecycle is versioning of models and data, which occurs both in the
experiment management pipeline and the production pipeline: the latter includes all stages of model
Figure 11: Machine learning project lifecycle in MLOps [25, p. 127].
creation, not just the final result of the experiment management pipeline. The iterative nature of MLOps
provides the opportunity to obtain and maintain the best model, and the combination of monitoring
based on key performance indicators and alert functions enables proactive intervention, ensuring the
quality and reliability of deployed models throughout the lifecycle [25, p. 128].
The main tools used for lifecycle management in MLOps [25, p. 126]:
• platforms for organizing ML workloads and pipelines, such as MLflow, Kubeflow (an MLflow example follows this list);
• data versioning systems, for example DVC (Data Version Control);
• tools for monitoring model performance in the production environment, such as Neptune.ai.
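As an example of one lifecycle step, the MLflow client can promote a registered model version between stages; the model name and version number below are hypothetical.

    # A minimal sketch of a lifecycle transition with the MLflow client.
    from mlflow.tracking import MlflowClient

    client = MlflowClient()

    # Move version 3 of the model from "Staging" to "Production" and archive
    # whatever version was serving before.
    client.transition_model_version_stage(
        name="churn-classifier",
        version=3,
        stage="Production",
        archive_existing_versions=True,
    )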
A comprehensive approach to lifecycle management in MLOps is implemented in Ease.ML (https:
//ease.ml/) – a lifecycle management system designed to simplify the entire development process [29].
The main goal of Ease.ML is to provide systematic recommendations and automation at all stages of the
ML lifecycle, minimizing user effort.
Key features of Ease.ML:
1. Human-in-the-loop process – Ease.ML incorporates user interaction in a structured way, providing
the ability for users to input data and make decisions at critical stages.
2. Probabilistic data model – the system uses a probabilistic database that handles uncertainty in
data, which can arise from incorrect data, weak supervision, or other sources.
3. Interactive environment – Ease.ML uses Jupyter notebooks, allowing users to perform data manip-
ulations, run ML operations, and visualize results in an integrated environment.
4. Lineage graphs – user interactions and operations are tracked in lineage graphs, which represent
the entire ML workflow and ensure reproducibility.
5. Automatic quality tuning and recommendations – the system provides recommendations for model
improvements based on errors detected in production, guiding users through data cleaning and
collection tasks efficiently.
The Ease.ML process is divided into three sub-processes and consists of eight stages that cover the
entire ML lifecycle:
• Day 0: Pre-ML Subprocess
1. Problem formulation – clearly define the problem and ML goals.
2. Feasibility study – assess the feasibility of an ML solution with the available data and
resources.
• Day 1: AutoML Subprocess
3. Data preparation – clean, preprocess and augment data to make it suitable for training ML
models.
4. Model training, using prepared data, with the use of AutoML to automate model selection
and tuning.
5. Evaluation of trained model performance on validation datasets to determine compliance
with criteria.
6. Selection of the best model and its deployment in the production environment.
• Day 2: Post-ML Subprocess
7. Continuous integration and delivery – integration of the model into the production environ-
ment and setup of CI/CD pipelines to manage model updates and performance monitoring.
8. Model maintenance through continuous monitoring of the model in the production environ-
ment and necessary updates or retraining of the model to adapt to new data or changing
conditions.
3.9. Data security and privacy
Data security and privacy is the practice of protecting information and data used in machine learning
processes from unauthorized access, misuse, and leakage [30, p. 1]. It aims to ensure the integrity,
availability, and confidentiality of data at all stages of the MLOps lifecycle.
Data security and privacy should be considered at all stages of MLOps, from problem definition
to monitoring of the deployed system. Ensuring security at the stages of data management, model
development and deployment is especially critical, as this is where data is most vulnerable [31, p. 8].
The practice of data security and privacy is mandatory in cases where the ML system operates
with sensitive data (personal, financial, medical, etc.) or is deployed in mission-critical environments
(healthcare, automotive industry, manufacturing). However, even for less sensitive applications, an
appropriate level of security must be ensured in accordance with regulatory requirements and user
expectations.
Unlike traditional software development, in the context of MLOps, new attack vectors and vulnerabil-
ities specific to machine learning emerge (figure 12). In particular, ML models are vulnerable to attacks
such as data poisoning, model inversion, and adversarial example attacks [30, pp. 8-15]. This requires
the use of specialized defense strategies.
To ensure data security and privacy in MLOps, both standard security tools (encryption, authenti-
cation, logging, etc.) and specialized ML-oriented solutions are used. Examples of the latter include
libraries for secure federated learning aggregation, tools for testing models for vulnerabilities, frame-
works for privacy-preserving machine learning based on homomorphic encryption or secure enclaves
[31, pp. 10-11].
Figure 12: Classification of attacks on MLOps systems [30, p. 9].
Data security and privacy is ensured by implementing various control and protection mechanisms at each stage of MLOps. This includes, in particular, data encryption and anonymization, authentication and access control, integrity checking, monitoring, incident response, as well as regular security audits and penetration testing [30, pp. 4-5].
Figure 13 summarizes the main security components and practices in MLOps presented in [30, pp. 22-25]. Essential components of MLOps:
• problem definition includes understanding the problem and system requirements;
• data management covers data collection, labeling and verification;
• model construction and deployment includes model selection, building, optimization and evaluation, as well as its deployment in the target environment;
• system maintenance involves monitoring the functioning of the deployed system.
Security practices in MLOps:
1. At the problem definition stage:
   • risk assessment;
   • threat modeling.
2. At the data management stage:
   • data flow diagram;
   • STRIDE methodology for threat classification;
   • conceptual modeling;
   • data validation;
   • data linter;
   • data verification.
Figure 13: Main security components and practices in MLOps [30, p. 20].
3. At the model construction and deployment stage:
   • identification of test adequacy criteria;
   • testing against adversarial input;
   • fuzzing techniques.
4. At the system maintenance stage:
   • ML model monitoring tools (MLDEMON, SelfChecker);
   • static analysis;
   • exploratory attacks;
   • evasion attacks;
   • data poisoning attacks;
   • manual testing;
   • dynamic analysis.
Thus, security practices in MLOps are integrated into all main stages of the development lifecycle and include both general methods of security assessment and testing (threat modeling, static/dynamic analysis) and specialized approaches focused on the peculiarities of machine learning systems (testing against adversarial examples, detection of data poisoning).
3.10. Model explainability and interpretability
Explainability/interpretability of models is an important MLOps practice for ensuring transparency and trust in machine learning models. Explainability refers to the ability to explain and understand the decision-making processes of a machine learning model [10, p. 66]. This is especially important when the model is used to make decisions that have significant consequences, such as in military operations or law enforcement activities [9].
Explainability as the basis for trust in the ML project allows users to trust the prediction, which
increases transparency. The user can verify which factors contributed to certain predictions, introducing
an additional layer of accountability. The terms “explainability” and “interpretability” are often used
interchangeably, but for MLOps, explainability goes beyond interpretability in terms of the importance, completeness, and fidelity of predictions or classifications (figure 14) [4, p. 63608].
Figure 14: Relationship between explainability and interpretability in MLOps.
Explainable Artificial Intelligence (XAI) is a research direction that promotes explainable decision
making [4, p. 63609]. Explainability can be defined as the degree to which an observer can understand
the cause of a decision [32, p. 8]. An ML system is explainable when it is easier to identify causal
relationships between the inputs and outputs of the system. The more explainable a model is, the better
practitioners understand the internal business procedures that occur during model decision making. An
explainable model does not necessarily translate into a model that a human can understand (internal
logic or processes underlying it), but the explainability of the model allows the user to strengthen trust
in the predictions made by the deployed system [4, pp. 63614-63615].
Various methods and tools can be used to achieve model explainability, such as:
• attribution-based methods (integrated gradients, saliency maps) or perturbation-based methods (SHAP) [33, p. 3], which explain the model’s decision by assigning high scores to the most influential input features (a SHAP sketch follows this list);
• incorporating an attention mechanism into models, which allows focusing on the most relevant
network states and inputs [33, p. 2];
• using a combined reward signal during training, which includes not only target metrics but also
interpretability metrics [33, p. 4].
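The following is a minimal SHAP sketch on a stand-in scikit-learn model; the dataset and model are illustrative and are not taken from the cited works.

    # A minimal sketch of a perturbation-based explanation with SHAP.
    import shap
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor

    X, y = load_diabetes(return_X_y=True, as_frame=True)
    model = RandomForestRegressor(n_estimators=50).fit(X, y)

    explainer = shap.Explainer(model, X)     # picks a model-appropriate explainer
    shap_values = explainer(X.iloc[:100])    # per-feature contribution scores
    shap.plots.bar(shap_values)              # global feature-importance summary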
3.11. Data quality management
Data quality management is an important practice in the MLOps workflow. According to Steidl et al.
[7], the data processing stage includes the full lifecycle of working with data, including pre-processing,
quality assurance, versioning and documentation [7, p. 21].
In Haller’s book [9, pp. 77-84], data quality management is considered in the context of monitoring
and checking data in a production environment to ensure that the data used corresponds to reality.
In the data-centric MLOps lifecycle (figure 15), data quality management is performed at the following
stages:
Figure 15: Data-centric MLOps lifecycle [26, p. 145].
1. Data collection – at this stage, data is created and obtained from various sources: data must be
relevant, complete, consistent and reliable.
2. Data quality assessment and cleaning – collected data undergoes thorough quality checks using
various metrics such as accuracy, completeness, consistency, timeliness, etc.; quality issues
are identified and resolved – incorrect values, missing values, duplicates, noise; data cleaning
techniques are applied to improve data quality.
3. Data augmentation and labeling – cleaned data is enriched with additional information and
labeled according to the target task; at this stage, the quality of data labeling is also controlled to
avoid errors and inaccuracies.
4. Data quality analysis – labeled data is analyzed for quality, its representativeness, class balance,
presence of outliers and anomalies are checked; adjustments are made as needed to improve data
quality.
5. Model training with quality control – based on quality data, ML model training takes place;
during experiments, model and data quality metrics are tracked to ensure stability and reliability
of results.
6. Model deployment with quality assurance – the model is deployed in the production environment
only after thorough testing on quality test data; measures are taken to maintain data quality in
the production environment.
7. Data and model quality monitoring – the deployed model and incoming data are constantly
checked for quality: quality metrics, presence of anomalies in data, distribution shift are tracked;
the model is retrained on new quality data as needed.
To guarantee data quality throughout the entire lifecycle of their use in MLOps, some approaches to
data validation and verification need to be applied:
• data quality assessment by determining how suitable this data is for achieving business goals
(completeness, uniqueness, integrity, validity, accuracy, timeliness) [26, pp. 2-3];
• data validation in single and cross batches by comparing data characteristics with the expected
schema, as well as checking for data drift [7, p. 11];
• automated modular data tests based on a schema to identify errors in data and prevent them from
entering the model training stage [7, p. 10];
• versioning of data and related artifacts (processing procedures, metadata, etc.) for traceability,
reproducibility of results and compliance with regulatory requirements [7, p. 11];
• documenting data to an extent sufficient to ensure model verifiability [7, p. 12].
Timely detection and elimination of defects in data helps prevent wasting computational resources
on low-quality data [7, p. 11].
In their work, Singh [26] presents techniques for assessing the quality of “big data”, which help identify
datasets that may cause problems and unnecessary costs.
To implement data quality management practices, frameworks such as TensorFlow Extended (TFX)
can be used, which have data validation components such as SchemaGen and ExampleValidator [7,
p. 19].
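A minimal sketch of schema-based validation with TensorFlow Data Validation (TFDV), the library underlying TFX components such as SchemaGen and ExampleValidator, is shown below; the CSV files are hypothetical.

    # A minimal sketch of data validation with TFDV.
    import pandas as pd
    import tensorflow_data_validation as tfdv

    train_df = pd.read_csv("train.csv")        # reference batch
    serving_df = pd.read_csv("serving.csv")    # new batch to validate

    train_stats = tfdv.generate_statistics_from_dataframe(train_df)
    schema = tfdv.infer_schema(train_stats)    # expected types, domains, ranges

    serving_stats = tfdv.generate_statistics_from_dataframe(serving_df)
    anomalies = tfdv.validate_statistics(serving_stats, schema=schema)
    tfdv.display_anomalies(anomalies)          # report schema violations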
3.12. Configuration management
Configuration files help create more robust software by moving all hardcoded variables into dedicated
locations that can be split up or organized at the developer’s discretion [34, p. 3].
Godwin and Melvin [34] propose a template that supports two types of configuration files – a
config.py file that contains the template configuration and can include additional resources, such as
databases and spreadsheets, and JSON files for storing specific program variables, such as thresholds or
parameters. The config.py file is accessible through import statements, while JSON files are called from
disk at runtime [34, pp. 3-4].
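The two-level pattern can be sketched as follows; all names and values are illustrative, not taken from the template in [34].

    # config.py – shared settings, available through import statements.
    DATABASE_URI = "postgresql://mlops:****@db.example.com/features"
    MODEL_DIR = "/var/models"

    # main.py – JSON parameters are read from disk at runtime.
    import json

    import config   # imported settings, visible to the whole code base

    with open("params.json") as f:
        params = json.load(f)

    threshold = params["decision_threshold"]   # e.g. 0.75
    print(config.MODEL_DIR, threshold)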
Yongqiang et al. [35] consider the use of a unified data model based on the YANG language for
unifying the description of configuration data and simplifying their management. Neptune Labs [36]
indicate that configuration management tools, such as Ansible (https://www.ansible.com/), Puppet
(https://www.puppet.com/) and Chef (https://www.chef.io/), can be used to automate configuration and
provisioning of MLOps platforms.
3.13. Model deployment strategies
Model deployment strategies:
• are implemented at the final stage of the MLOps process;
• require standardization, automation and encapsulation of models;
• are used for gradual transition of a new model to the production environment;
• rely on containerization.
Peltonen and Dias [28] highlight the following benefits of using containers for model deployment
[28, p. 68]:
• model abstraction and process isolation by running multiple models in individual containers
representing their runtime dependencies;
• ability to create model-specific containers that meet their packaging requirements;
• assignment to different processors (CPU, Mobile GPU, etc.);
• ability to allocate a separate container to facilitate post-processing functions;
• a model repository and container storage can be used for fast deployment;
• provides a means of standardized deployment;
• almost all existing continuous development pipelines facilitate model deployment in the form of
containers;
• containers can be tailored to specific architectures of edge devices;
• Docker containers are a well-established industry standard;
• ease of rollback in case of failures.
Gunny et al. [37] describe in detail the development cycle and model deployment strategy using
the DeepClean application as an example. A new version of the model is first deployed as a developer
version, undergoes validation in conditions similar to production, and only then replaces the current
model in the production environment. This uses a service architecture with NVIDIA Triton Inference
Server, which supports simultaneous placement of the developer version and the production version of
the model.
The authors identify two main approaches to model deployment [37, pp. 11-12].
In the traditional scenario, each user manages their own resources and model versions. Inconsistencies in libraries and dependencies, as well as model versions, lead to inconsistent results. Reduced computational requirements for inference lead to underutilization of hardware resources, depicted by green rectangles on each node (figure 16). More complex deployment scenarios require the use of multiple networks, exacerbating existing issues.
Figure 16: Traditional distributed deployment scenario [37, p. 12].
Figure 17: Deployment using the Inference-as-a-Service scenario [37, p. 12].
In the Inference-as-a-Service approach, a centralized service orchestrates models and provides unified interfaces for invoking them (figure 17). A centralized model store synchronizes all users and keeps them up to date. Pipelines send gRPC inference requests to the service using standardized APIs that abstract away the details of the inference execution itself. Inference is performed by a containerized service that can efficiently schedule asynchronous model execution, maximizing hardware compute capabilities in a portable and scalable manner. In this approach, containerization allows creating portable and isolated model execution environments.
Choosing the right model deployment strategy allows minimizing operational costs, ensuring consistent model operation for all users, and facilitating model version monitoring and control.
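The gRPC request flow described above can be sketched with NVIDIA Triton's Python client; the server address, model name, and tensor names are illustrative and depend on the deployed model's configuration.

    # A minimal sketch of a gRPC inference request to a Triton server.
    import numpy as np
    import tritonclient.grpc as grpcclient

    client = grpcclient.InferenceServerClient(url="localhost:8001")

    # Describe the input tensor and fill it with (placeholder) features.
    inputs = [grpcclient.InferInput("INPUT__0", [1, 4], "FP32")]
    inputs[0].set_data_from_numpy(np.random.rand(1, 4).astype(np.float32))
    outputs = [grpcclient.InferRequestedOutput("OUTPUT__0")]

    # The standardized API hides how and where inference actually runs.
    result = client.infer(model_name="my_model", inputs=inputs, outputs=outputs)
    print(result.as_numpy("OUTPUT__0"))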
3.14. Infrastructure automation
Infrastructure as code is an important MLOps practice in which infrastructure is treated as code for its reliable and efficient deployment and management [38, p. 2]. Infrastructure automation relates to the deployment and monitoring stage in the MLOps process (figure 18). It ensures reliable creation of the necessary environment for deploying machine learning models and automates infrastructure tasks [38, p. 3].
Infrastructure as code involves describing infrastructure configuration in a declarative way in special files (for example, in YAML or JSON formats) or using special languages or tools (for example, Terraform). These files describe the desired state of the infrastructure [38, p. 4].
Figure 18: Infrastructure automation in the MLOps lifecycle.
Describing infrastructure settings as code allows automating the process of its creation, modification and management [38, p. 3]. This makes it possible to fully reproduce environments, quickly deploy resources and avoid manual setup errors [38, p. 4]. This approach is appropriate to use for creating complex dynamic infrastructures, when high repeatability and consistency of environments is needed,
to increase reliability and reduce time costs for manual configuration [38, pp. 2, 4].
The automated approach allows testing infrastructure as code, applying software development
practices to it (code review, versioning, etc.). This contributes to improving stability and security, allows
quickly tracking and resolving infrastructure issues [38, p. 4].
To implement Infrastructure as code in Azure DevOps, tools such as Azure Resource Manager (ARM)
Templates, Terraform, Ansible, Chef are used [38, pp. 3-4]. They allow describing infrastructure as code
and automatically creating or modifying it.
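The same desired-state idea can be expressed directly in Python with Pulumi; Pulumi is not among the tools named in [38] and is used here only as a Python-based illustration, with hypothetical resource names.

    # A minimal declarative IaC sketch with Pulumi's Python SDK.
    import pulumi
    import pulumi_aws as aws

    # Desired state: a bucket for model artifacts and a tagged training node.
    artifacts = aws.s3.Bucket("ml-artifacts")

    trainer = aws.ec2.Instance(
        "training-node",
        ami="ami-0123456789abcdef0",   # placeholder image id
        instance_type="g4dn.xlarge",
        tags={"role": "model-training"},
    )

    pulumi.export("artifact_bucket", artifacts.id)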
3.15. Collaboration and communication
Collaboration and communication between various stakeholders is a key MLOps practice for the
successful implementation of machine learning and artificial intelligence projects in organizations.
MLOps emphasizes how cross-functional teams, such as data analysts, system operators, as well as data
and software engineers, collaborate through a harmonized process [7, p. 7].
The essence of the collaboration and communication practice lies in establishing effective interaction
and information exchange between the various teams involved in the process of developing and
implementing ML models – the data science team, development team, operations team, and business
units. This practice is an important component of the development and implementation stage of the
MLOps workflow.
As Kreuzberger et al. [2] point out, MLOps involves close collaboration between data science (machine
learning) teams, which are engaged in data preparation and model development, software development
engineers, who are responsible for integrating models into the production environment, and operations
teams, which ensure the deployment and support of models [2, p. 2]. Effective communication is
necessary at all stages of the ML model lifecycle, from defining business goals to monitoring models in
the production environment. Without well-established collaboration, the development of ML solutions
can be delayed, conflicts of interest and misunderstandings may arise between participants [39, p. 3].
The diagram (figure 19) shows the main participants in the MLOps process and the directions of their
interaction:
1. The Data Science team prepares data and develops machine learning models.
2. The Development team integrates the developed models into software products.
3. The Operations team deploys models in the production environment and provides their support.
4. The Business team defines goals and requirements for ML solutions, and also receives results
from the Data Science and Operations teams.
Figure 19: Diagram of collaboration and communication in MLOps.
To ensure effective communication in MLOps practice, the following approaches and tools are used
[2, p. 4]:
• use of collaboration and knowledge sharing tools, such as Slack, Trello, GitLab wiki;
• regular meetings between teams to discuss status, problems and plans;
• clear definition of roles and areas of responsibility of process participants;
• use of version control systems (Git) for collaboration on code and models;
• automation of CI/CD processes to ensure transparency and reproducibility of development.
3.16. Risk management and compliance
Since ML models often make important decisions that affect people, risk management and compliance
is a critically important MLOps practice, the essence of which is to ensure compliance of developed ML
models and systems with regulatory requirements and standards (compliance) and manage potential
risks associated with their development and operation.
This practice has a cross-cutting nature and manifests itself at different stages of the ML model
lifecycle. Risk management in the context of MLOps involves identifying and mitigating potential risks
associated with ML models, such as data bias, privacy breaches, model accuracy deterioration over
time, etc. Compliance means ensuring that models comply with regulatory requirements, for example,
regarding data protection [7, pp. 18-19]. The main condition for using this practice is the presence of
regulatory requirements or industry standards that the ML system must comply with. Examples can be
GDPR requirements for personal data protection or certifications in the healthcare industry [7, p. 5].
Steidl et al. [7] indicate that compliance aspects should be considered during data preparation, model
training and validation, as well as deployment and monitoring [7, p. 18].
Methods of using this practice include regular data quality checks, testing models for bias and discrimination (see the sketch after the list below), implementing access control and data encryption, documenting the model architecture and development process, and monitoring model performance after deployment [7, pp. 11-12]. Thus, compliance
is achieved by [7, p. 18]:
• quality control and origin of data used for model training to avoid violation of regulatory require-
ments;
• documenting and versioning models and data to ensure reproducibility of results and auditing;
• verifying models integrated into the production environment for compliance with requirements;
• continuous monitoring of deployed models for timely detection of potential violations or incorrect
behavior.
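For example, testing a model for bias, one of the measures listed above, can start from a simple demographic parity check. The sketch below is our illustration on synthetic data, not a prescribed compliance procedure; the 0.2 threshold is an arbitrary placeholder for a policy-defined value:

# Illustrative demographic parity check: compare positive-prediction rates
# between groups and flag the model if the gap exceeds a chosen threshold.
def demographic_parity_gap(predictions, groups):
    by_group = {}
    for pred, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(pred)
    positive_rate = {g: sum(p) / len(p) for g, p in by_group.items()}
    gap = max(positive_rate.values()) - min(positive_rate.values())
    return round(gap, 3), positive_rate

# Synthetic example: binary predictions for two demographic groups.
preds = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
gap, rates = demographic_parity_gap(preds, groups)
print(rates, "gap =", gap)
if gap > 0.2:  # threshold is a placeholder for a policy-defined value
    print("Potential bias: the model requires review before deployment")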
The leading tools for ensuring risk management and compliance practice are version control systems
for tracking changes in data and model code, testing tools for identifying problems in models, monitoring
systems for tracking model accuracy in real time [7, pp. 12, 14]. At the same time, interviews with
practitioners conducted by Steidl et al. [7] revealed that ensuring compliance of ML systems in various
domains (for example, in healthcare) is a serious challenge due to the lack of established methodologies
and software tools. Nevertheless, achieving compliance is a mandatory condition for obtaining the
necessary certifications and permits from regulators.
The diagram in figure 20 shows three main MLOps stages: working with training data, model training,
and system deployment (operationalization). At each stage there are blocks that indicate potential risks
and compliance measures. This diagram illustrates the cross-cutting nature of risk management and
compliance practice in MLOps, showing its presence and interconnections at each stage of the machine
learning model lifecycle.
Figure 21 shows the relationships between the key principles, deployment process, and main MLOps
practices that are applied at the stage of machine learning model deployment. The principles of
automation and reproducibility influence the model deployment process, which is associated with
the following MLOps practices: CI/CD, model deployment, data security and privacy, configuration
management, model deployment strategies, and infrastructure automation.
4. Conclusions
A meta-synthesis of systematic reviews [6, 8, 5] and a review of products and providers [15] were
performed in order to generalize knowledge about the implementation of MLOps practices for the
effective deployment of machine learning models. The main conclusions obtained as a result of the
meta-synthesis are as follows:
1. MLOps is an approach for managing, automating, and operationalizing the processes of developing,
deploying, and supporting machine learning models based on practices from software engineering
and DevOps. MLOps is based on a set of principles, processes, and practices that ensure effective
development, deployment, and support of machine learning models.
2. The main stages of the MLOps lifecycle include the following processes: data collection and
processing, model development and training, deployment, monitoring, and retraining of models.
3. Various frameworks and architectures are used to implement MLOps, such as open source
platforms (MLflow, Kubeflow, TensorFlow Extended), cloud computing platforms (AWS, Google
Cloud, Azure), containerization (Docker), and container orchestration (Kubernetes).
Figure 20: Diagram of risk management and compliance practice in MLOps.
Figure 21: Diagram of relationships between principles, processes and MLOps practices for model deployment.
4. MLOps tools provide a wide range of features to support the machine learning model lifecycle,
with a focus on automation, experiment tracking, versioning, monitoring, and model deployment.
5. The most common ways to deploy machine learning models in production environments are the
use of container technologies, cloud platforms and services, and the deployment of models as
web services.
6. Adapted software development maturity models, such as CMM, can be used to assess the maturity
level of MLOps processes in organizations.
7. Successful MLOps implementation requires the involvement of specialists from different areas:
software development, data engineering, machine learning, subject matter experts, and
management.
8. The main challenges when deploying machine learning models in production environments are
managing the model lifecycle, ensuring scalability and performance, monitoring and maintaining
models in real-world conditions.
9. Open issues and challenges in MLOps are the need to develop standards and best practices, ensure
interpretability and responsible use of models, effectively manage data, integrate knowledge from
different fields.
10. The main opportunities and development trends of MLOps are the creation of standardized
platforms, application in the context of distributed learning, and integration with other approaches
to managing the lifecycle of data and models. Current and future areas of MLOps application
include a wide range of industries, from finance and healthcare to IoT and natural language
processing.
The conducted meta-synthesis showed that MLOps is a promising approach for the effective deploy-
ment of machine learning models in production environments, which requires further research and
development to address existing challenges and realize potential opportunities.
Next, the key MLOps practices necessary for effective deployment of machine learning models were
analyzed. The main conclusions obtained as a result of the analysis are as follows:
1. MLOps is based on a set of principles, processes and practices that ensure effective development,
deployment and maintenance of machine learning models. The key principles of MLOps are
automation, reproducibility, collaboration, continuous learning and data governance.
2. The main MLOps practices include: continuous integration and delivery (CI/CD), model and data
versioning, ML pipeline automation, model performance monitoring, experiment management,
model deployment and lifecycle management.
3. Additional MLOps practices, such as data security and privacy, model explainability and inter-
pretability, data quality management, configuration management, model deployment strategies,
infrastructure automation, collaboration and communication, risk management and compliance,
are important for ensuring reliability, compliance with requirements and efficiency of MLOps
processes.
4. The application of MLOps practices allows automating and standardizing the processes of devel-
opment, deployment and maintenance of machine learning models, which increases the efficiency
and reliability of ML solutions in the production environment.
5. Successful implementation of MLOps practices requires the use of appropriate tools and platforms,
such as experiment management systems, data and model versioning, infrastructure automation
and monitoring tools, as well as establishing effective collaboration between the different roles
and teams involved in the process of developing and implementing machine learning models.
The analysis showed that the application of MLOps practices is critically important for the successful
deployment of machine learning models in production environments. The implementation of these
practices allows increasing the efficiency, reliability and reproducibility of the processes of development
and operation of ML solutions, which is a necessary condition for their successful use in real business
problems.
As a result of the comprehensive study, the MLOps practices necessary for the effective deployment
of machine learning models were identified and analyzed. The main conclusions obtained during the
study are as follows:
1. A meta-synthesis of systematic reviews was performed to generalize knowledge about MLOps
practices. The conducted meta-synthesis showed that MLOps is a promising approach for the
effective deployment of machine learning models in production environments, which requires
further research and development to solve existing challenges and realize potential opportunities.
2. A diagram of relationships between MLOps principles, processes and practices is proposed.
This diagram illustrates the interconnections between the key principles, stages of the machine
learning model development and implementation process, as well as the main MLOps practices
that are applied at each stage.
3. The most effective MLOps practices for model deployment have been identified, which include:
continuous integration and delivery (CI/CD), model and data versioning, ML pipeline automa-
tion, model performance monitoring, experiment management, model deployment and lifecycle
management, data security and privacy, model explainability and interpretability, data quality
management, configuration management, model deployment strategies, infrastructure automa-
tion, collaboration and communication, risk management and compliance.
The obtained results have both theoretical and practical significance. The theoretical significance
lies in the generalization and systematization of knowledge about MLOps practices necessary for the
effective deployment of machine learning models. The practical significance of the obtained results
lies in the possibility of their use by organizations for the implementation or improvement of MLOps
processes in order to increase the efficiency and reliability of machine learning model deployment in
production environments.
Further research may be aimed at developing detailed recommendations for the implementation of
individual MLOps practices in organizations, creating new tools and platforms for automating and
managing the lifecycle of machine learning models, as well as studying the effectiveness of applying
MLOps practices in various industries and areas of machine learning model application.
Declaration on Generative AI: During the preparation of this work, the authors used Claude 3 Opus in order to: text
translation, abstract drafting. After using this service, the authors reviewed and edited the content as needed and take full
responsibility for the publication's content.
References
[1] P. V. Zahorodko, S. O. Semerikov, V. N. Soloviev, A. M. Striuk, M. I. Striuk, H. M. Shalatska, Com-
parisons of performance between quantum-enhanced and classical machine learning algorithms
on the IBM Quantum Experience, Journal of Physics: Conference Series 1840 (2021) 012021.
doi:10.1088/1742-6596/1840/1/012021.
[2] D. Kreuzberger, N. Kühl, S. Hirschl, Machine Learning Operations (MLOps): Overview, Definition,
and Architecture, IEEE Access 11 (2023) 31866–31879. doi:10.1109/ACCESS.2023.3262138.
[3] G. Symeonidis, E. Nerantzis, A. Kazakis, G. A. Papakostas, MLOps - Definitions, Tools and
Challenges, in: 2022 IEEE 12th Annual Computing and Communication Workshop and Conference
(CCWC), 2022, pp. 0453–0460. doi:10.1109/CCWC54503.2022.9720902.
[4] M. Testi, M. Ballabio, E. Frontoni, G. Iannello, S. Moccia, P. Soda, G. Vessio, MLOps: A Taxonomy
and a Methodology, IEEE Access 10 (2022) 63606–63618. doi:10.1109/ACCESS.2022.3181730.
[5] J. Diaz-de Arcaya, A. I. Torre-Bastida, G. Zárate, R. Miñón, A. Almeida, A Joint Study of the
Challenges, Opportunities, and Roadmap of MLOps and AIOps: A Systematic Survey, ACM
Comput. Surv. 56 (2023) 84. doi:10.1145/3625289.
[6] G. Recupito, F. Pecorelli, G. Catolino, S. Moreschini, D. D. Nucci, F. Palomba, D. A. Tamburri, A
Multivocal Literature Review of MLOps Tools and Features, in: 2022 48th Euromicro Conference
on Software Engineering and Advanced Applications (SEAA), 2022, pp. 84–91. doi:10.1109/
SEAA56994.2022.00021.
[7] M. Steidl, M. Felderer, R. Ramler, The pipeline for the continuous development of artificial
intelligence models—current state of research and practice, Journal of Systems and Software 199
(2023) 111615. doi:10.1016/j.jss.2023.111615.
[8] A. Lima, L. Monteiro, A. P. Furtado, MLOps: Practices, Maturity Models, Roles, Tools, and
Challenges – A Systematic Literature Review, in: Proceedings of the 24th International Conference
on Enterprise Information Systems - Volume 1: ICEIS, INSTICC, SciTePress, 2022, pp. 308–320.
doi:10.5220/0010997300003179.
[9] K. Haller, Managing AI in the enterprise: Succeeding with AI projects and MLOps to build
sustainable AI organizations, Apress Berkeley, CA, 2022. doi:10.1007/978-1-4842-7824-6.
[10] E. e Oliveira, M. Rodrigues, J. P. Pereira, A. M. Lopes, I. I. Mestric, S. Bjelogrlic, Unlabeled learning
algorithms and operations: overview and future trends in defense sector, Artificial Intelligence
Review 57 (2024) 66. doi:10.1007/s10462-023-10692-0.
[11] A. B. Kolltveit, J. Li, Operationalizing machine learning models: a systematic literature review,
in: Proceedings of the 1st Workshop on Software Engineering for Responsible AI, SE4RAI ’22,
Association for Computing Machinery, New York, NY, USA, 2023, p. 1–8. doi:10.1145/3526073.
3527584.
[12] F. Calefato, F. Lanubile, L. Quaranta, A Preliminary Investigation of MLOps Practices in GitHub, in:
Proceedings of the 16th ACM / IEEE International Symposium on Empirical Software Engineering
and Measurement, ESEM ’22, Association for Computing Machinery, New York, NY, USA, 2022, p.
283–288. doi:10.1145/3544902.3546636.
[13] M. J. Page, J. E. McKenzie, P. M. Bossuyt, I. Boutron, T. C. Hoffmann, C. D. Mulrow, L. Shamseer,
J. M. Tetzlaff, E. A. Akl, S. E. Brennan, R. Chou, J. Glanville, J. M. Grimshaw, A. Hróbjartsson, M. M.
Lalu, T. Li, E. W. Loder, E. Mayo-Wilson, S. McDonald, L. A. McGuinness, L. A. Stewart, J. Thomas,
A. C. Tricco, V. A. Welch, P. Whiting, D. Moher, The PRISMA 2020 statement: an updated guideline
for reporting systematic reviews, BMJ 372 (2021) n71. doi:10.1136/bmj.n71.
[14] C. Haertel, D. Staegemann, C. Daase, M. Pohl, A. Nahhas, K. Turowski, MLOps in Data Science
Projects: A Review, in: 2023 IEEE International Conference on Big Data (BigData), 2023, pp.
2396–2404. doi:10.1109/BigData59044.2023.10386139.
[15] R. Cohen, Digital Strategy, Machine Learning, and Industry Survey of MLOps, in: Digital Strategies
and Organizational Transformation, 2023, pp. 137–150. URL: https://tinyurl.com/33z6zpd3. doi:10.
1142/9789811271984_0008.
[16] T. A. Sipe, W. L. Curlette, A meta-synthesis of factors related to educational achievement: a
methodological approach to summarizing and synthesizing meta-analyses, International Journal
of Educational Research 25 (1996) 583–698. doi:10.1016/S0883-0355(96)80001-2.
[17] J. Chrastina, Meta-synthesis of qualitative studies: background, methodology and applications, in:
NORDSCI Conference proceedings, volume 1 of NORDSCI Conference, Saima Consult Ltd, 2018.
doi:10.32008/nordsci2018/b1/v1/13.
[18] S. Amershi, A. Begel, C. Bird, R. DeLine, H. Gall, E. Kamar, N. Nagappan, B. Nushi, T. Zimmermann,
Software Engineering for Machine Learning: A Case Study, in: 2019 IEEE/ACM 41st International
Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 2019, pp.
291–300. doi:10.1109/ICSE-SEIP.2019.00042.
[19] S. Dhanorkar, C. T. Wolf, K. Qian, A. Xu, L. Popa, Y. Li, Who needs to know what, when?:
Broadening the Explainable AI (XAI) Design Space by Looking at Explanations Across the AI
Lifecycle, in: Proceedings of the 2021 ACM Designing Interactive Systems Conference, DIS ’21,
Association for Computing Machinery, New York, NY, USA, 2021, p. 1591–1602. doi:10.1145/
3461778.3462131.
[20] L. E. Lwakatare, I. Crnkovic, J. Bosch, DevOps for AI – Challenges in Development of AI-enabled
Applications, in: 2020 International Conference on Software, Telecommunications and Computer
Networks (SoftCOM), 2020, pp. 1–6. doi:10.23919/SoftCOM50211.2020.9238323.
[21] R. Akkiraju, V. Sinha, A. Xu, J. Mahmud, P. Gundecha, Z. Liu, X. Liu, J. Schumacher, Characterizing
Machine Learning Processes: A Maturity Framework, in: D. Fahland, C. Ghidini, J. Becker,
M. Dumas (Eds.), Business Process Management, volume 12168 of Lecture Notes in Computer Science,
Springer International Publishing, Cham, 2020, pp. 17–31. doi:10.1007/978-3-030-58666-9_
2.
[22] C. Min, A. Mathur, U. G. Acer, A. Montanari, F. Kawsar, SensiX++: Bringing MLOps and Multi-
tenant Model Serving to Sensory Edge Devices, ACM Trans. Embed. Comput. Syst. 22 (2023) 98.
URL: https://doi.org/10.1145/3617507. doi:10.1145/3617507.
[23] F. Bachinger, J. Zenisek, M. Affenzeller, Automated Machine Learning for Industrial Applications
– Challenges and Opportunities, Procedia Computer Science 232 (2024) 1701–1710. doi:10.1016/
j.procs.2024.01.168.
[24] K. Filippou, G. Aifantis, G. A. Papakostas, G. E. Tsekouras, Structure Learning and Hyperparameter
Optimization Using an Automated Machine Learning (AutoML) Pipeline, Information 14 (2023)
232. doi:10.3390/info14040232.
[25] A. Bodor, M. Hnida, N. Daoudi, Machine Learning Models Monitoring in MLOps Context: Metrics
and Tools, International Journal of Interactive Mobile Technologies (iJIM) 17 (2023) pp. 125–139.
doi:10.3991/ijim.v17i23.43479.
[26] P. Singh, Systematic review of data-centric approaches in artificial intelligence and machine
learning, Data Science and Management 6 (2023) 144–157. doi:10.1016/j.dsm.2023.06.001.
[27] J. Czakon, K. Kluge, ML Experiment Tracking: What It Is, Why It Matters, and How to Implement
It, 2024. URL: https://neptune.ai/blog/ml-experiment-tracking.
[28] E. Peltonen, S. Dias, LinkEdge: Open-sourced MLOps Integration with IoT Edge, in: Proceedings of
the 3rd Eclipse Security, AI, Architecture and Modelling Conference on Cloud to Edge Continuum,
ESAAM ’23, Association for Computing Machinery, New York, NY, USA, 2023, p. 67–76. doi:10.
1145/3624486.3624496.
[29] L. A. Melgar, D. Dao, S. Gan, N. M. Gürel, N. Hollenstein, J. Jiang, B. Karlas, T. Lemmin, T. Li,
Y. Li, S. X. Rao, J. Rausch, C. Renggli, L. Rimanic, M. Weber, S. Zhang, Z. Zhao, K. Schawinski,
W. Wu, C. Zhang, Ease.ML: A Lifecycle Management System for MLDev and MLOps, in: 11th
Conference on Innovative Data Systems Research, CIDR 2021, Virtual Event, January 11-15, 2021,
Online Proceedings, 2021. URL: https://www.cidrdb.org/cidr2021/papers/cidr2021_paper26.pdf.
[30] H. Chen, M. A. Babar, Security for Machine Learning-based Software Systems: A Survey of Threats,
Practices, and Challenges, ACM Comput. Surv. 56 (2024) 151. doi:10.1145/3638531.
[31] N. K. Gopalakrishna, D. Anandayuvaraj, A. Detti, F. L. Bland, S. Rahaman, J. C. Davis, “If security
is required”: engineering and security practices for machine learning-based IoT devices, in:
Proceedings of the 4th International Workshop on Software Engineering Research and Practice
for the IoT, SERP4IoT ’22, Association for Computing Machinery, New York, NY, USA, 2023, p.
1–8. doi:10.1145/3528227.3528565.
[32] T. Miller, Explanation in artificial intelligence: Insights from the social sciences, Artificial
Intelligence 267 (2019) 1–38. doi:10.1016/j.artint.2018.07.007.
[33] F. Rezazadeh, H. Chergui, L. Alonso, C. Verikoukis, SliceOps: Explainable MLOps for Streamlined
Automation-Native 6G Networks, IEEE Wireless Communications 31 (2024) 224–230. doi:10.
1109/MWC.007.2300144.
[34] R. C. Godwin, R. L. Melvin, Toward efficient data science: A comprehensive MLOps template for
collaborative code development and automation, SoftwareX 26 (2024) 101723. doi:10.1016/j.
softx.2024.101723.
[35] D. Yongqiang, W. Xin, L. Yongbo, Y. Wang, Building Network Domain Knowledge Graph from
Heterogeneous YANG Models, Journal of Computer Research and Development 57 (2020) 699–708.
doi:10.7544/issn1000-1239.2020.20190882.
[36] Neptune Labs, MLOps Landscape in 2024: Top Tools and Platforms, 2024. URL: https://neptune.ai/
blog/mlops-tools-platforms-landscape.
[37] A. Gunny, D. Rankin, P. Harris, E. Katsavounidis, E. Marx, M. Saleem, M. Coughlin, W. Benoit, A
Software Ecosystem for Deploying Deep Learning in Gravitational Wave Physics, in: Proceedings
of the 12th Workshop on AI and Scientific Computing at Scale Using Flexible Computing Infras-
tructures, FlexScience ’22, Association for Computing Machinery, New York, NY, USA, 2022, p.
9–17. doi:10.1145/3526058.3535454.
[38] C. Vuppalapati, A. Ilapakurti, K. Chillara, S. Kedari, V. Mamidi, Automating Tiny ML Intelligent
Sensors DevOPS Using Microsoft Azure, in: 2020 IEEE International Conference on Big Data (Big
Data), 2020, pp. 2375–2384. doi:10.1109/BigData50022.2020.9377755.
[39] R. Sothilingam, V. Pant, E. S. K. Yu, Using i* to Analyze Collaboration Challenges in MLOps Project
Teams, in: A. Maté, T. Li, E. J. T. Gonçalves (Eds.), Proceedings of the 15th International iStar
Workshop (iStar 2022) co-located with 41th International Conference on Conceptual Modeling
(ER 2022), Virtual Event, Hyderabad, India, October 17, 2022, volume 3231 of CEUR Workshop
Proceedings, CEUR-WS.org, 2022, pp. 1–6. URL: https://ceur-ws.org/Vol-3231/iStar22_paper_1.pdf.
A. Using the large language model Claude 3 Sonnet for analyzing
systematic reviews
The queries were created on 10 May 2024. A PDF file with the article text was attached to each query.
The chatbot was expected to produce a report following the submitted plan. Each query consisted of a
universal part and a variable part.
Universal part of the query:
Using the added review article file as a data source,
write a detailed report on it according to the following plan.
Variable parts of the query:
1.  1. Year of publication.
    2. Research objective (paper aim).
    3. Research questions.
2.  4. Information sources (databases, etc.)
    5. Inclusion criteria.
    6. Exclusion criteria.
    7. Quality criteria.
3.  8. MLOps definition (if any).
    9. MLOps workflow stages (if any).
4.  10. What frameworks and architectures facilitate MLOps?
5.  11. What MLOps tools can be used to build ML pipelines for Continuous Deployment? What tools
    are used in the activities for operationalizing machine learning models?
6.  12. What are the main features offered by MLOps tools?
7.  13. How are machine learning models deployed in production environments?
8.  14. What maturity models are used to assess the level of automation in deploying machine
    learning models?
9.  15. What roles and responsibilities are identified in the activities of operationalization of machine
    learning models?
10. 16. What challenges are encountered with regard to deploying machine learning models in
    production environments?
11. 17. What are the open issues, challenges and particularities in MLOps?
    18. What are the opportunities and future trends in MLOps? What are the current and future
    fields in which MLOps is thriving?
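The way each query was assembled from the universal and variable parts can be expressed programmatically. The following Python sketch is only an illustration of the composition scheme described above; the constant names and the grouping shown are hypothetical:

# Sketch of how each query was assembled from the universal part and one
# group of plan items (the constant names below are hypothetical).
UNIVERSAL = (
    "Using the added review article file as a data source,\n"
    "write a detailed report on it according to the following plan."
)

VARIABLE_PARTS = [
    ["1. Year of publication.",
     "2. Research objective (paper aim).",
     "3. Research questions."],
    ["4. Information sources (databases, etc.)",
     "5. Inclusion criteria.",
     "6. Exclusion criteria.",
     "7. Quality criteria."],
    # ... the remaining plan items, grouped as in the list above
]

def build_queries(universal, variable_parts):
    return [universal + "\n" + "\n".join(items) for items in variable_parts]

for number, query in enumerate(build_queries(UNIVERSAL, VARIABLE_PARTS), start=1):
    print(f"--- Query {number} ---\n{query}\n")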
B. Results of systematic reviews analysis
Table B.1
Results of systematic review analysis [6, 8, 5].

Year of publication
• Recupito et al. [6]: 2022.
• Lima et al. [8]: 2022.
• Diaz-de Arcaya et al. [5]: 2023.

Research objective
• Recupito et al. [6]: To identify tools that allow creating MLOps pipelines for continuous deployment; to analyze the main characteristics and functions of these MLOps tools to provide a comprehensive overview of their value.
• Lima et al. [8]: Review of existing literature to identify practices, standards, roles, maturity models, challenges and tools used to automate activities of operationalizing machine learning models into industrial operation (MLOps).
• Diaz-de Arcaya et al. [5]: To provide an understanding of the implementation of MLOps and AIOps methodologies in both industry and academia; the authors seek to highlight the challenges, opportunities and future trends in these areas.

Research questions
• Recupito et al. [6]: The main research question – What tools and capabilities enable developers to create ML-enabled software systems? – was detailed into two sub-questions. RQ1: What MLOps tools can be used to create machine learning pipelines for continuous deployment? RQ2: What are the main features offered by MLOps tools?
• Lima et al. [8]: RQ1: How are machine learning models deployed in production environments? RQ2: What maturity models are used to assess the level of automation in deploying machine learning models? RQ3: What roles and responsibilities are defined in the activities of operationalizing machine learning models? RQ4: What tools are used in the activities of operationalizing machine learning models? RQ5: What challenges are encountered when deploying machine learning models in production environments?
• Diaz-de Arcaya et al. [5]: Q1: What are the open issues, challenges and particularities in MLOps and AIOps? Q2: What are the opportunities and future trends in MLOps? Q3: What are the opportunities and future trends in AIOps? Q4: What platforms and architectures facilitate MLOps and AIOps? Q5: What are the current and future fields in which MLOps and AIOps are thriving?

Information sources
• Recupito et al. [6]: Google Scholar – for searching scientific literature, such as journals, books and dissertations; Google Search – for searching so-called "gray" literature, such as blog posts, developer sites, webinars, GitHub repositories and YouTube videos. Using both academic (white literature) and professional (gray literature) sources allowed the authors to explore MLOps from different perspectives – theoretical and practical.
• Lima et al. [8]: The automated search was conducted in the following electronic research databases: ACM Digital Library, IEEE Xplore, ScienceDirect and SpringerLink.
• Diaz-de Arcaya et al. [5]: Several databases and repositories, including arXiv, Springer, IEEE; however, the main source was the Scopus database from Elsevier, as it contains metadata and abstracts of many publications.

Inclusion criteria
• Recupito et al. [6]: The study discusses components of a minimal end-to-end MLOps workflow; the study discusses MLOps practice or machine learning-based applications; the study relates to the implementation of MLOps tool(s); the study describes experiences, opinions, or practices regarding MLOps pipelines.
• Lima et al. [8]: Studies related to machine learning operations (MLOps) in general; studies evaluating the lifecycle of machine learning solutions; studies related to maturity models of the machine learning process; studies analyzing roles and responsibilities involved in the development and deployment of machine learning solutions; studies covering tools for deploying machine learning solutions; studies identifying challenges for the development and deployment of machine learning models.
• Diaz-de Arcaya et al. [5]: Articles published between 2018 and 2023, identified by search queries, containing new ideas and closely related to the topic of MLOps and AIOps.

Exclusion criteria
• Recupito et al. [6]: The study does not provide details on the design or implementation of MLOps tool(s); the study only proposes the design of a certain component of machine learning pipelines; the study does not provide or reference details on machine learning automation; the study refers to commercial platforms that offer MLOps applications to promote their development and deployment services.
• Lima et al. [8]: Studies not published in English; studies related to the application of machine learning models; short papers or posters; studies not related to machine learning operations; studies whose content is not accessible; articles not relevant to the research questions.
• Diaz-de Arcaya et al. [5]: Publications not in English, retracted publications, publications from irrelevant publishers, subscription materials without access, and articles with insufficient citations (depending on the year of publication).

Quality criteria
• Recupito et al. [6]: The repository must have at least 100 stars; the YouTube video must have been viewed at least 1000 times.
• Lima et al. [8]: Does the study report unambiguous findings based on evidence and arguments? Does the study represent a research project and not an expert opinion? Is the context being analyzed fully described in the study? Are the research objectives clearly defined? Are the research results properly validated?
• Diaz-de Arcaya et al. [5]: Comprehensive literature review and gap identification; verification of results on a use case; number of research questions addressed; publication under an open license; type of publication (journal/other); publication in a high-impact journal; number of citations.

MLOps definition (if available)
• Recupito et al. [6]: MLOps is a practice that helps model, develop and operate the machine learning lifecycle, drawing on DevOps principles and practices.
• Lima et al. [8]: A set of practices and principles for operationalizing data science solutions, used to automate the deployment of machine learning models into an operational environment.
• Diaz-de Arcaya et al. [5]: MLOps uses machine learning, DevOps and data engineering to bring machine learning systems into production, facilitating the creation of machine learning products.

MLOps workflow stages (if available)
• Recupito et al. [6]: Data extraction for integration; data analysis; data cleaning, transformation and feature engineering to split data; model training; model validation to assess the quality; model deployment in target environments; model monitoring.
• Lima et al. [8]: Data collection; data transformation; continuous training; continuous model deployment; presentation of results; monitoring of machine learning solutions.
• Diaz-de Arcaya et al. [5]: Data management; distributed training; deployment; monitoring; retraining. The need to manage the lifecycle and key components of AI applications using specialized platforms and tools is emphasized.

Frameworks and architectures that facilitate MLOps implementation
• Recupito et al. [6]: Continuous training pipelines deployed via CI/CD; orchestration platforms; TensorFlow Extended (TFX); machine learning cloud platforms.
• Lima et al. [8]: MLflow; Kubeflow; Polyaxon; Comet.ml; Kafka-ML; MLModelCI.
• Diaz-de Arcaya et al. [5]: HPC; cloud computing; edge/IoT platforms; serverless architectures; frameworks for integration; automatic data labeling methods; proactive incident management; platforms for orchestration; semantically enhanced pipelines; architectures for distributed training and deployment; frameworks for monitoring; programming languages; containerized solutions; AutoML software; use of APIs; deep learning and neural networks.

MLOps tools for creating machine learning pipelines and operationalizing models
• Recupito et al. [6]: Machine learning cloud platforms (AWS SageMaker, AzureML, Google AI Platform); orchestration platforms (Apache Airflow, Jenkins, Kubeflow, MLflow, Polyaxon, Seldon Core, Valohai); configuration frameworks (TensorFlow Extended, Gitlab).
• Lima et al. [8]: MLflow – an open platform with components such as MLflow Projects and MLflow Model Registry; Kubeflow; Kafka-ML; MLModelCI.
• Diaz-de Arcaya et al. [5]: Containerized solutions (e.g., Docker); serverless computing (AWS Lambda, Azure Functions, etc.); continuous integration/continuous deployment (CI/CD) tools; monitoring of processes and events; MLOps automation using AutoML; containerization for packaging model dependencies; API technologies for deployment as a web service.

Main features offered by MLOps tools
• Recupito et al. [6]: Common features related to all phases of machine learning pipelines; data management features; model management features; continuous integration and delivery; model monitoring; hyperparameter tuning; portability.
• Lima et al. [8]: Experiment tracking; model packaging and versioning; ML project management; model registry.
• Diaz-de Arcaya et al. [5]: Data, model, and code versioning; workflow orchestration and automation; monitoring of processes and events; integration with cloud and edge infrastructure; containerization; MLOps automation using AutoML.

Methods of deploying machine learning models in production environments
• Recupito et al. [6]: Machine learning cloud platforms (AWS SageMaker, AzureML, DotScience, Google AI Platform); orchestration platforms (Apache Airflow, Jenkins, Kubeflow, MLflow, Polyaxon, Seldon Core, Valohai); TensorFlow Extended (TFX).
• Lima et al. [8]: MLOps is considered as a set of practices and principles for operationalizing; some MLOps tools, such as MLflow, Kubeflow, and Kafka-ML; the role of "Deployment Lead".
• Diaz-de Arcaya et al. [5]: Deployment in the cloud – using cloud computing resources, providing isolation, hybrid approaches; deployment on edge/IoT devices – deployment directly on IoT devices and mobile devices, TensorFlow Lite and Core ML, overcoming limitations; containerized deployments – packaging models, Docker; serverless architectures – deploying ML functions as a service, reducing costs; deployment via APIs.

Maturity models for assessing the level of automation in deploying machine learning models
• Recupito et al. [6]: Support for continuous integration and continuous deployment (CI/CD); the ability to automatically tune; full automation of model management processes.
• Lima et al. [8]: Maturity model proposed by Amershi et al. [18]; Dhanorkar et al. [19] classify organizations into three levels of maturity; Lwakatare et al. [20] describe five stages of improvement in development practices; Akkiraju et al. [21] propose an adaptation of the Capability Maturity Model (CMM).
• Diaz-de Arcaya et al. [5]: The article does not define specific maturity models for assessing the automation of machine learning model deployment, but emphasizes the importance of adapting software development practices to the field of machine learning.

Roles and responsibilities identified in the activities of operationalizing machine learning models
• Recupito et al. [6]: Data scientists – responsible for developing and training; data engineers – responsible for extracting, processing, transforming, and ensuring the quality of data; DevOps engineers – responsible for automating deployment processes and managing operational environments; product managers and business stakeholders – provide requirements for models and participate in decision-making.
• Lima et al. [8]: Domain specialist – has deep knowledge of the subject area; computational scientist/engineer – has high technical skills; ML scientist/engineer – responsible for designing; provenance specialist – manages the supply of data; manager – evaluates models; application developer – develops applications; deployment lead – assesses aspects.
• Diaz-de Arcaya et al. [5]: Software developers; data specialists/data scientists; operations engineers; domain experts; management/stakeholders.

Challenges encountered when deploying machine learning models in production environments
• Recupito et al. [6]: Complexity of managing; ensuring consistency; integration of various tools; automation of all stages; monitoring model performance; scaling infrastructure; ensuring security.
• Lima et al. [8]: Integration of software development; implementation of MLOps/AIOps practices; machine learning models need monitoring; identifying infrastructure components; deploying and versioning.
• Diaz-de Arcaya et al. [5]: Managing the ML lifecycle; gap between software engineering; data management; distributed and parallel execution; diversity of computing infrastructure; monitoring; explainability.

Open issues, challenges and particularities in MLOps
• Recupito et al. [6]: Lack of standardization; ensuring portability; configuration and integration; full automation; understandability; careful monitoring; determining infrastructure; addressing scalability; need for versioning; automation; managing the lifecycle; integration with DevOps.
• Lima et al. [8]: Integration of software development processes; implementation of MLOps practices; it is necessary to go beyond analyzing model prototypes.
• Diaz-de Arcaya et al. [5]: Lack of skilled personnel; data management issues; complexity of orchestration; heterogeneity of hardware; need for continuous monitoring; lack of explainability; scaling and performance issues.

Opportunities, future trends, and areas of application of MLOps
• Recupito et al. [6]: Standardization of MLOps practices and tools; improved support environments; advances in automation and integration of MLOps with DevOps and DevSecOps; increased attention to data management. Areas of application: financial services and banking; healthcare and biotechnology; manufacturing and Internet of Things; retail and e-commerce; telecommunications; transportation and logistics.
• Lima et al. [8]: Demand for MLOps tools and platforms is expected to grow; industries where MLOps is actively developing – finance, healthcare, industry, retail, transportation, and logistics; possible development of specialized technologies; integration of MLOps with DevSecOps, MLSecOps concepts; emergence of new roles and competencies.
• Diaz-de Arcaya et al. [5]: Involvement of business units; greater attention to the ML lifecycle; better data management practices; use of new hardware platforms; use of containers, serverless; development of versioning tools. Industries where MLOps is thriving: traditional industries; innovative industries; academic disciplines; communication and networking technologies; healthcare; scientific activity.