=Paper= {{Paper |id=Vol-3231/iStar22_paper_1 |storemode=property |title=Using i* to Analyze Collaboration Challenges in MLOps Project Teams |pdfUrl=https://ceur-ws.org/Vol-3231/iStar22_paper_1.pdf |volume=Vol-3231 |authors=Rohith Sothilingam,Vik Pant,Eric Yu |dblpUrl=https://dblp.org/rec/conf/istar/SothilingamPY22 }} ==Using i* to Analyze Collaboration Challenges in MLOps Project Teams== https://ceur-ws.org/Vol-3231/iStar22_paper_1.pdf
Using i* to Analyze Collaboration Challenges in
MLOps Project Teams
Rohith Sothilingam (1,2), Vik Pant (1) and Eric Yu (1,2)
(1) Faculty of Information, University of Toronto, Toronto, Canada
(2) Department of Computer Science, University of Toronto, Canada


Abstract
The rapidly growing interest in applying continuous software engineering practices from DevOps
in Machine Learning (ML) software projects has led to the relatively new area of Machine Learning
Operations (MLOps). The need for software engineers to collaborate with data scientists, ML Engineers,
DevOps Engineers, and other specialists has contributed to the emergence of MLOps. MLOps introduces
unique challenges involving the intersection of infrastructure engineering with the exploratory model
development process of ML. The collaboration of team members with diverse skills and knowledge is
required due to the need for continuous evolution and monitoring of both ML systems and the underlying
infrastructure. The capabilities of i* modeling are potentially a good fit for the unique characteristics of
MLOps with respect to the analysis of nuances in strategic interests between actors of diverse disciplinary
backgrounds that commonly face challenges during collaboration. In this work, we use i* Strategic
Rationale modeling of actor relationships to analyze and resolve common conflicts and challenges faced
in MLOps project teams during collaboration in the development of production ML systems. Examples
from common key MLOps challenges are used to illustrate.

Keywords
Conceptual Modeling, Requirements Engineering, MLOps




1. Introduction
Continuous software engineering practices such as DevOps have contributed greatly to business
value in software development teams. The recent growth in interest toward applying the
practices of DevOps towards Machine Learning (ML) has led to the emergence of MLOps.
MLOps is an intersection between ML and DevOps practices. Tamburri [1] defines MLOps
as the distribution of a set of software components and middleware encompassing five ML
pipeline functions: data ingestion, data transformation, continuous ML model (re-)training,
(re-)deployment, and output presentation.
   MLOps projects and team organization are often at the early stages of the maturity curve,
compared to other types of software projects [2]. The software lifecycle for MLOps is similar
to the traditional practice of DevOps in other software engineering areas, which involves
collaboration among people from diverse backgrounds and skills [3]. While DevOps aims
to shorten the development lifecycle of software systems, MLOps aims to borrow DevOps
principles to automate and operationalize ML applications and workflows.

iStar’22: The 15th International i* Workshop, October 17th, 2022, Hyderabad, India
rohith.sothilingam@mail.utoronto.ca (R. Sothilingam); vik.pant@utoronto.ca (V. Pant); eric.yu@utoronto.ca (E. Yu)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

    Unlike more mature areas of software development, MLOps practices are still evolving and
ill-defined, which makes collaboration between roles with diverse backgrounds and technical
skills difficult. Conflicts are particularly apparent within ML teams because of the
interdisciplinary nature of MLOps and the lack of established MLOps practices. Other studies
have observed that differing expectations, perspectives, and interests in a software system can
lead to conflicts during collaboration due to methodological differences [4]. To achieve
successful interdisciplinary collaboration in MLOps, it is important to account for and analyze
the professional roles, strategic interests, skills, and attributes of the team members involved.
    In this paper we use i* Strategic Rationale modeling to examine key collaboration challenges
that ML teams face before adopting MLOps practices, and to demonstrate why and how MLOps
practices are able to overcome those challenges. This paper contributes to i* with respect to
identifying nuanced challenges of MLOps that are rooted in collaboration. The main contribution
of this paper is its demonstration of the ability of i* to convey how enabling MLOps practices
affects collaboration patterns. The analysis demonstrated in this paper contributes to our
ongoing work of examining whether i* Strategic Rationale models can sufficiently address
intricate issues of MLOps collaboration.


2. Using i* Strategic Actor Modeling to Identify Challenges in
   MLOps Project Team Collaboration
Recent empirical studies have found that while some organizations have adopted MLOps
practices more successfully than others [3], many struggle to set up structures, processes,
and tooling for effective collaboration among team members with different backgrounds when
developing ML-enabled systems.
   Much of software code in ML systems is dedicated to pipelines which contribute to continuous
learning, such as the training pipeline and data pipeline [5]. The processes used to develop
such pipelines are often subject to technical debt. In this section, we use i* Strategic Rationale
modeling to demonstrate an example of a design pattern, identified by Sculley et al. [5], which
can lead to technical debt. Specifically, we demonstrate how i* can be used to identify the
underlying goals and soft-goals which lead to the issue of Process Management Debt.
   In more mature ML project teams, numerous ML models may be running simultaneously [6]
which raises the important challenge of managing changes and versioning of models safely and
automatically. It is challenging to respect different, often conflicting business priorities among
models and to subsequently detect any blockages or issues in the related pipelines.
   To understand such points of conflict, we pose the following questions. What tasks and
goals must be achieved for MLOps project teams to maintain continuous model performance?
What tasks and goals must be achieved to continually monitor and train models simultaneously
across different releases? In the following i* modeling, we will identify what modeling elements
are not satisfied and the significance of those elements not being satisfied. Our intention is to
demonstrate collaboration challenges known before the construction of the i* models. The target
audience of the i* models is ML model designers in practical settings who would consume and
analyze such i* models. The i* model in Fig. 1 is an example of one of many common recurrent
challenges faced with MLOps regarding team collaboration. Though the model deals with a
specific problem domain, it can be customized to specific MLOps project challenges.

Figure 1: i* Strategic Rationale Model showing collaboration challenges as a result of Process Management Debt (as mentioned in [5])

   The model in Fig. 1 conveys the attempted collaboration between three important actors
of any MLOps project team [7]: the Data Scientist, the Operational Engineer, and the Software
Engineer. The Data Scientist is typically responsible for the design of the model, including
feature engineering, model training, and model fit in accordance with the business requirements.
The Operational Engineer is responsible for constantly monitoring the performance of the model;
a dip in performance may indicate that the entire process needs to be repeated to
update the model so that it reflects new trends. The Software Engineer is responsible for deploying
and integrating the models into the application through model pipelines. In the following
paragraphs, we show which i* modeling elements in Fig. 1 are not satisfied as a result of a
lack of established collaborative practices between the actors. These elements are highlighted
using red circles in Fig. 1.
   The Operational Engineer Actor cannot achieve the Goal of Model performance be maintained
as the task dependum Scale infrastructure needs between the Operational Engineer and Software
Engineer Actors is not satisfied. Scalability is a crucial contribution of the DevOps aspect of
MLOps. MLOps adopts the DevOps principles of Continuous Integration (CI) and Continuous
Delivery (CD) [8]. From the DevOps perspective of CD, processed datasets and trained models
are automatically and continuously delivered by Data Scientists to Operational Engineers. From
the perspective of Continuous Training, introduction of new data and avoidance of model
performance degradation require a trigger to retrain the model or improve model performance
through online methods. Without scalable infrastructure, the Operational Engineer cannot
maintain continuous model performance.

Figure 2: i* Strategic Rationale Model showing the effects of MLOps practices to overcome collaboration challenges faced by MLOps project teams

  As a result of the lack of continuous processes in place, the Data Scientist Actor cannot
achieve the Goal of Model be ready for deployment because the underlying Goal Model version
control in place and Resource Continuous monitoring across releases for drift are not satisfied.
The Data Scientist Actor cannot achieve the Goal Model version control in place, leading to the
inability of the Software Engineer Actor to satisfy the Goal Ability to work in parallel on the same
application, to train and deploy multiple models continuously.
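
The chain of unsatisfied elements described above can be traced mechanically. The following is a minimal sketch, not the authors' tooling: it assumes a simplified encoding in which every link is treated as an AND-style requirement, and the names `REQUIRES`, `UNSATISFIED_LEAVES`, and `propagate_unsatisfied` are hypothetical. Only the element names are taken from the text.

```python
# Hypothetical encoding of a fragment of Fig. 1. Element names come from the
# text; the dictionary layout and the AND-only rule are assumptions.
REQUIRES = {
    # Operational Engineer
    "Model performance be maintained": ["Scale infrastructure needs"],
    # Data Scientist
    "Model be ready for deployment": [
        "Model version control in place",
        "Continuous monitoring across releases for drift",
    ],
    # Software Engineer
    "Ability to work in parallel on the same application": [
        "Model version control in place",
    ],
}

# Elements the text reports as not satisfied in the first place.
UNSATISFIED_LEAVES = {
    "Scale infrastructure needs",
    "Model version control in place",
    "Continuous monitoring across releases for drift",
}


def propagate_unsatisfied(requires, unsatisfied_leaves):
    """Return every element that is unsatisfied, directly or transitively.

    An element becomes unsatisfied when at least one element it requires is
    unsatisfied. The loop repeats until no new element is added.
    """
    unsatisfied = set(unsatisfied_leaves)
    changed = True
    while changed:
        changed = False
        for element, needs in requires.items():
            if element not in unsatisfied and any(n in unsatisfied for n in needs):
                unsatisfied.add(element)
                changed = True
    return unsatisfied


affected = propagate_unsatisfied(REQUIRES, UNSATISFIED_LEAVES)
for name in sorted(affected - UNSATISFIED_LEAVES):
    print("blocked goal:", name)
```

A full i* analysis distinguishes contribution types and degrees of satisfaction; this sketch collapses them into a single boolean to keep the propagation visible.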


3. Overcoming Challenges in ML Project Team Collaboration
   and Relationships with MLOps Practices
The i* Strategic Rationale model in Fig. 2 conveys how the adoption of appropriate MLOps
practices can help overcome such challenges. In this section, we present the i* Strategic
Rationale model of Fig. 1 with MLOps practices added (highlighted as yellow elements). Adding
the i* intentional elements that convey MLOps practices and their effects on goals, soft-goals,
and dependum relationships makes it possible to understand why and how MLOps practices
can alleviate collaboration challenges among relationships in MLOps project teams. Modeling
elements conveying MLOps practices are drawn from recent research studies on MLOps practices
[7][9][10].
   To address Process Management Debt, it is crucial to develop tooling to aid recovery from
production incidents and manage conflicting priorities [6]. In Fig. 2, we added dependency
relationships with MLOps practices to address the collaboration relationships which were
unsatisfied in Fig. 1.
   The Task dependum Conduct code quality check allows the Software Engineer Actor to now
satisfy the task Assess code quality while collaborating with the Data Scientist Actor to ensure
the Business-approved model meets the expectations of the code quality check.
   The Goal dependum Feedback loop in place for model drift detection [10] allows the Data
Scientist Actor to satisfy the Resource element Continuous monitoring across releases for drift.
This is done with improved collaboration between the Data Scientist Actor and Software Engineer
Actor with the newly added task Build continuous monitoring capability across releases [9] which
ensures CD principles are applied to continuous model training and monitoring.
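
As an illustration of what a feedback loop for drift detection across releases might check, the sketch below flags releases whose recent scores have fallen below a baseline by more than a tolerance. It is an assumption-laden example, not a method from the cited sources: the function name `drifting_releases`, the use of a mean-score drop as the drift signal, and the default tolerance are all hypothetical.

```python
from statistics import mean


def drifting_releases(baseline_scores, recent_scores_by_release, tolerance=0.05):
    """Return the releases whose mean recent score dropped by more than `tolerance`.

    baseline_scores: scores (for example accuracy) recorded when the model was approved.
    recent_scores_by_release: mapping from release name to recently observed scores.
    The mean-drop rule is a deliberately simple stand-in for a real drift test.
    """
    if not baseline_scores:
        raise ValueError("baseline_scores must not be empty")
    baseline = mean(baseline_scores)
    flagged = []
    for release, scores in recent_scores_by_release.items():
        if not scores:
            continue  # nothing observed yet for this release
        if baseline - mean(scores) > tolerance:
            flagged.append(release)
    return sorted(flagged)


# Hypothetical monitoring data for two releases running side by side.
to_retrain = drifting_releases(
    baseline_scores=[0.90, 0.92],
    recent_scores_by_release={"v1": [0.90, 0.91], "v2": [0.80, 0.82]},
)
print(to_retrain)
```

In this reading, the returned list is the signal passed back to the Data Scientist, which is the point where the loop closes.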
   The Resource dependum Model benchmarks and metrics allows the Operational Engineer to
satisfy the task Scale infrastructure needs as the MLOps practices of Assess resource utilization [7]
and Monitor predictive service performance are added as task elements to provide the Operational
Engineer with the resource and practices required to ensure the compute resources of the ML
model can be optimized and ultimately satisfy the goal Model performance be maintained.
   The problems identified in Fig. 1 are resolved in the model with the addition of these i* elements.
The collaborative relationships between the Data Scientist, Software Engineer, and Operational
Engineer Actors are improved as a result. Though i* can address some aspects of MLOps
challenges, there are important limitations. Firstly, though i* is able to implicitly express the
existence of a feedback loop, it cannot explicitly express the feedback loop and its continuity
over time. This is an important limitation of i* as each stage of the MLOps lifecycle is
continuous and cyclical. Key examples include continuous training, deployment, monitoring, and
simultaneous training of multiple models. Secondly, the relationship between the
skills, knowledge, and training of the Data Scientist and their ability to achieve a satisfactory
level of code quality is not clear. Further work is required to better analyze how such knowledge can evolve
and be incorporated at different points in collaborative relationships regarding MLOps projects.


4. Ongoing Work
In this work, we used i* Strategic Rationale Modeling to demonstrate the effects of MLOps
practices on overcoming collaboration challenges in MLOps project teams. Modeling the
relationships at an intentional level, compared to conventional process models, offers a higher
level of abstraction for analysis. In ongoing work, we are applying the i* concepts of Agents,
Roles, and Positions for modeling complex organizational relationships with respect to MLOps
project teams [11][12]. As mentioned in Section 3, one important example of a challenge that
requires deeper analysis is the relationship between the skills of a person and the expectations or
goals of the job role they are expected to occupy. Challenges can occur when an
Agent lacks a skill required for a Role, or when Roles with conflicting goals fall under one
Position. By identifying where such issues occur, we will use i* Agents, Roles, and Positions to
explore how organizations can diagnose challenges in team design in greater detail, through
early detection of the problem. This ongoing work is part of a larger set of PhD thesis objectives, which
include the following: (1) a requirements-driven framework which deals with conflicting
goals at design decision points throughout MLOps; (2) compilation and codification of design
knowledge from pertinent literature on MLOps and responsible AI, to be available during design
decisions in the form of knowledge catalogs; (3) tool support for the proposed framework.
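
The two team-design issues named above (an Agent lacking a skill required for a Role, and Roles with conflicting goals under one Position) can be pictured with a small data model. This is only an illustrative sketch under stated assumptions, not the framework proposed in the ongoing work: the classes, the functions `skill_gaps` and `conflicting_roles`, and all example skills and goals are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass
class Role:
    name: str
    required_skills: set[str]
    goals: set[str]


@dataclass
class Position:
    name: str
    roles: list[Role]


@dataclass
class Agent:
    name: str
    skills: set[str]


def skill_gaps(agent: Agent, position: Position) -> dict[str, set[str]]:
    """Map each Role in the Position to the skills the Agent is missing for it."""
    gaps = {}
    for role in position.roles:
        missing = role.required_skills - agent.skills
        if missing:
            gaps[role.name] = missing
    return gaps


def conflicting_roles(position: Position, conflicts: set[frozenset]) -> list[tuple[str, str]]:
    """List pairs of Roles in one Position that hold goals declared as conflicting."""
    pairs = []
    for a, b in combinations(position.roles, 2):
        if any(frozenset((ga, gb)) in conflicts for ga in a.goals for gb in b.goals):
            pairs.append((a.name, b.name))
    return pairs


# Hypothetical example data, invented for illustration only.
developer = Role("Model developer", {"feature engineering", "model training"}, {"Fast experimentation"})
reviewer = Role("Code reviewer", {"code quality review"}, {"Stable release"})
position = Position("Data Scientist", [developer, reviewer])
agent = Agent("Agent A", {"feature engineering", "model training"})
declared_conflicts = {frozenset(("Fast experimentation", "Stable release"))}

print(skill_gaps(agent, position))
print(conflicting_roles(position, declared_conflicts))
```

Which goals conflict is supplied by the modeler here; deriving such conflicts from the i* model itself is outside the scope of this sketch.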


References
 [1] D. A. Tamburri, Sustainable mlops: Trends and challenges, in: 2020 22nd International
     Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC),
     IEEE, 2020, pp. 17–23.
 [2] M. M. John, H. H. Olsson, J. Bosch, Towards mlops: A framework and maturity model, in:
     2021 47th Euromicro Conference on Software Engineering and Advanced Applications
     (SEAA), IEEE, 2021, pp. 1–8.
 [3] N. Nahar, S. Zhou, G. Lewis, C. Kästner, Collaboration challenges in building ml-enabled
     systems: Communication, documentation, engineering, and process, Organization 1 (2022)
     3.
 [4] S. Passi, P. Sengers, Making data science systems work, Big Data & Society 7 (2020)
     2053951720939605.
 [5] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young,
     J.-F. Crespo, D. Dennison, Hidden technical debt in machine learning systems, Advances
     in neural information processing systems 28 (2015).
 [6] D. Sculley, M. E. Otey, M. Pohl, B. Spitznagel, J. Hainsworth, Y. Zhou, Detecting adversarial
     advertisements in the wild, in: Proceedings of the 17th ACM SIGKDD international
     conference on Knowledge discovery and data mining, 2011, pp. 274–282.
 [7] P. Ruf, M. Madan, C. Reich, D. Ould-Abdeslam, Demystifying mlops and presenting a
     recipe for the selection of open-source tools, Applied Sciences 11 (2021) 8861.
 [8] S. Mäkinen, H. Skogström, E. Laaksonen, T. Mikkonen, Who needs mlops: What data
     scientists seek to accomplish and how can mlops help?, in: 2021 IEEE/ACM 1st Workshop
     on AI Engineering-Software Engineering for AI (WAIN), IEEE, 2021, pp. 109–112.
 [9] H. Baniecki, W. Kretowicz, P. Piatyszek, J. Wisniewski, P. Biecek, dalex: Responsible
     machine learning with interactive explainability and fairness in python, arXiv preprint
     arXiv:2012.14406 (2020).
[10] M. Treveil, N. Omont, C. Stenac, K. Lefevre, D. Phan, J. Zentici, A. Lavoillotte, M. Miyazaki,
     L. Heidmann, Introducing MLOps, O’Reilly Media, 2020.
[11] R. Sothilingam, S. Eric, Modeling agents, roles, and positions in machine learning project
     organizations., in: iStar, 2020, pp. 61–66.
[12] R. Sothilingam, Analyzing Organizational Processes in Machine Learning Projects: Exploring
     Modeling Approaches, Ph.D. thesis, University of Toronto (Canada), 2020.