=Paper=
{{Paper
|id=Vol-3609/paper21
|storemode=property
|title=MLOps Approach for Automatic Segmentation of Biomedical Images
|pdfUrl=https://ceur-ws.org/Vol-3609/short5.pdf
|volume=Vol-3609
|authors=Oleh Berezsky,Oleh Pitsun,Grygoriy Melnyk,Yuriy Batko,Petro Liashchynskyi,Mykola Berezkyi
|dblpUrl=https://dblp.org/rec/conf/iddm/BerezskyPMBLB23
}}
==MLOps Approach for Automatic Segmentation of Biomedical Images==
Oleh Berezsky, Oleh Pitsun, Grygoriy Melnyk, Yuriy Batko, Petro Liashchynskyi, Mykola Berezkyi

West Ukrainian National University, 11 Lvivska st., Ternopil, 46001, Ukraine

===Abstract===
Artificial intelligence systems for processing medical images require a large number of software libraries, large amounts of data, and cloud computing resources. Implementing deep learning elements in CAD systems is a complex process, and applying DevOps practices can speed it up. The implementation of DevOps approaches in the field of machine learning differs from that for conventional software; therefore, developing MLOps approaches to implementing deep learning elements for the analysis of biomedical images is a relevant task. The developed pipeline allows scientists and specialists to launch machine learning projects and to focus on model development rather than on setting up the environment. This paper provides examples of improved MLOps pipelines that can be used for automatic image segmentation and for evaluating the quantitative characteristics of microobjects.

Keywords: machine learning, MLOps, biomedical images, programming.

===1. Introduction===
Every year, software systems make increasing use of machine learning elements. Despite the great demand for neural networks, there is still a need for programmers with specialized knowledge, including knowledge of software development and system administration. MLOps approaches are applied to speed up the software development process and to increase the reliability and maintainability of software. The purpose of this work is to develop MLOps approaches for automatic segmentation of histological and immunohistochemical images and for evaluating the quantitative characteristics of cell nuclei.
MLOps approaches are developed to deploy infrastructure for running machine learning elements efficiently and reliably, and to provide convenient continuous delivery and deployment of program code on cloud systems. This is a relatively new field that requires solutions tailored to specific subject areas; in this paper, we consider the processing of biomedical images.

Machine learning models are usually developed locally, which makes it difficult to run the code quickly on another computer system for data processing. In most cases, such developments are used only within specialized laboratories and do not become widely used. However, modern hardware and cloud computing make it possible to use local developments on an industrial scale. Dedicated pipelines automate the deployment of software code and make this process more efficient.

Applying MLOps approaches to software development offers the following advantages:
- less time for preparing and launching machine learning models;
- scalability;
- fewer errors and elimination of contradictory situations;
- program automation;
- reduction of possible risks.

IDDM’2023: 6th International Conference on Informatics & Data-Driven Medicine, November 17-19, 2023, Bratislava, Slovakia

EMAIL: ob@wunu.edu.ua (A. 1); o.pitsun@wunu.edu.ua (A. 2); mgm@wunu.edu.ua (A. 3); bum@wunu.edu.ua (A. 4); p.liashchynskyi@st.wunu.edu.ua (A. 5); mykolaberezkyy@gmail.com (A. 6)

ORCID: 0000-0001-9931-4154 (A. 1); 0000-0003-0280-8786 (A. 2); 0000-0003-0646-7448 (A. 3); 0000-0002-6732-4865 (A. 4); 0000-0002-3920-6239 (A. 5); 0000-0001-6507-9117 (A. 6)

© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
Currently, a large number of tools, such as Terraform, already make it possible to deploy infrastructure. Mechanisms for continuous code delivery and deployment are also available. However, most of these mechanisms are used in DevOps tasks. The scientific novelty of this work lies in the development of an MLOps workflow for automatic segmentation of biomedical images using elements of deep learning. The purpose of our research is to improve the existing mechanisms for machine learning tasks. The object of our research is the process of automatically creating infrastructure for microscopic image processing. The subject of the research is DevOps practices for the creation of CI/CD pipelines.

===2. Literature review===
In [1], the authors emphasized the lack of regulatory documents for MLOps and offered their own analysis and classification of the existing documents. Based on this analysis, they proposed a 10-step pipeline. In [2], Sajid Nazir et al. conducted a detailed analysis of artificial intelligence tools for biomedical image processing using deep learning; in particular, they carried out an in-depth analysis of artificial intelligence tools for investigating breast cancer. In [3], unsolved problems of machine learning in healthcare were highlighted; the authors paid considerable attention to the problem of generating datasets for the machine learning process. In [4], the authors analyzed the problem of organizing interaction between IT specialists to solve problems based on machine learning. Therefore, the development of unified pipelines for software deployment is currently one of the most relevant problems in the field of machine learning. In [5], Adrien Bennetot et al. considered artificial intelligence tools applied to biomedical use cases. The authors analyzed both standard neural network models and more recent ones, such as transformers.
The analysis of trends in machine learning makes it possible to identify modern diagnostic tools. However, there is a need to reduce the complexity of software configuration and to set up the interaction between different technologies. In [6-9], approaches to implementing DevOps in tools for processing biomedical images were highlighted, which made it possible to create the main elements of the pipeline. In [10], the authors presented a pipeline for the classification of biomedical images and the structure of a convolutional neural network for the classification of immunohistochemical images. In [11], an approach for evaluating the quantitative characteristics of microobjects for diagnosis was proposed. The structure of the U-net neural network for automatic segmentation of biomedical images was presented in [12]. The analysis of the above publications shows that developing a pipeline that can be used with deep learning tools and algorithms for processing biomedical images is an urgent task.

===3. Problem statement===
To develop a unified approach for automatic segmentation and evaluation of quantitative characteristics of microobjects, it is necessary to:
- analyze the existing tools for implementing an MLOps pipeline;
- select the main components of the pipeline;
- develop a pipeline for processing images with elements of artificial intelligence.

===4. Analysis of MLOps tools and platforms===
Table 1 shows the results of the analysis of MLOps tools in computer vision. The main criteria for evaluating the existing tools are:
- image classification;
- image segmentation;
- image tagging.

Table 1: Results of the analysis of MLOps tools
{| class="wikitable"
! Software system !! Image classification !! Image tagging !! Image segmentation
|-
| Nyckel [13] || + || + || -
|-
| Ximilar [14] || + || + || -
|-
| Hasty [15] || - || + || +
|-
| Levity [16] || + || + || -
|}

Ximilar focuses on developing systems with computer vision elements. This software can be considered an API developer for business.
The main emphasis of Ximilar is on image recognition and detection of elements in the image. In addition, this program has a convenient visual tool for setting up the components needed to run neural networks.

The Nyckel software system processes images and text. Nyckel can train models in a short time and needs only a small amount of input data to get started. Its API allows integration with third-party services, which makes the program more flexible.

The Roboflow platform specializes in computer vision. Roboflow uses cloud technologies to develop workflow management systems for image processing [17]. In [18], a multi-domain object detection benchmark was proposed using the Roboflow platform.

The Levity software system is implemented using the "no-code" approach, which makes it possible to develop the necessary pipelines and systems for solving problems in many fields, in both business and science.

In [19], the authors analyzed modern software systems for developing programs with machine learning elements and highlighted the "validation" stage for optimizing the operation of neural network models.

Hasty is used to annotate objects in an image and generate datasets. The main functions of this software product include:
- classification;
- tagging;
- object detection;
- instance segmentation;
- panoptic segmentation;
- attribute prediction.

The main requirements for systems using MLOps approaches were given in [20-22]. Thus, modern tools for building a pipeline have a standard set of components, but not all systems have the necessary functionality for image segmentation.

===5. MLOps workflow for biomedical image segmentation===
This section presents a pipeline for automatic segmentation of images using U-net. The key stage in this process is the generation and preparation of data (images), which is one of the key differences from analogous pipelines.
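The data preparation stage of this workflow can be sketched in Python. The directory names "original" and "masks" and the "_mask" suffix follow the examples given in this section; the "train" and "test" split names, the function names, and the file names in the comments are hypothetical and serve only as an illustration.

```python
from pathlib import Path

MASK_SUFFIX = "_mask"  # naming rule: the mask file carries a "_mask" suffix


def create_dataset_dirs(root: Path, splits=("train", "test")) -> None:
    """Create <root>/<split>/original and <root>/<split>/masks."""
    for split in splits:
        for kind in ("original", "masks"):
            (root / split / kind).mkdir(parents=True, exist_ok=True)


def mask_name(image_name: str) -> str:
    """Return the mask file name for an image (cell01.png -> cell01_mask.png)."""
    path = Path(image_name)
    return f"{path.stem}{MASK_SUFFIX}{path.suffix}"


def pair_images_with_masks(split_dir: Path) -> list[tuple[Path, Path]]:
    """Pair every image in 'original' with its mask; fail if a mask is missing."""
    pairs = []
    for image in sorted((split_dir / "original").iterdir()):
        mask = split_dir / "masks" / mask_name(image.name)
        if not mask.exists():
            raise FileNotFoundError(f"no mask found for {image.name}")
        pairs.append((image, mask))
    return pairs
```

Checking the image and mask pairing before training is one possible design choice: it turns a labelling mistake into an early, explicit error instead of a silent mismatch during training.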
The MLOps workflow consists of three main components:
1. Build.
2. Deploy.
3. Monitor.

The data preparation stage includes the following steps:
- creating the directory structure for training and test samples (for example, "original", "masks");
- creating the internal directory structure for storing image masks;
- data labelling, which includes the rules for creating file names (for example, the suffix "_mask" is added to the mask file name);
- changing file size and other parameters.

The MLOps workflow for biomedical image segmentation is shown in Figure 1.

Figure 1: MLOps workflow for biomedical image segmentation.

Image processing is an important stage because we are developing an image processing pipeline. This stage also includes histogram equalization and changes to image parameters depending on the settings.

U-net is used for image segmentation. This is a modern approach based on deep learning. It requires creating the U-net architecture and selecting its hyperparameters; after the architecture is created, the model must be trained and validated.

The monitoring stage is one of the key stages in DevOps approaches and is aimed at analyzing system performance. Deployment is necessary for releasing the software and using it in real conditions.

===6. CI/CD pipeline for evaluating the quantitative characteristics of microobjects===
The module for evaluating the quantitative characteristics of microobjects is an important component of the software system. Unlike other modules, the code of this module can change frequently because parameters must be set for different types of images. To automate the transfer of parameter settings, it is proposed to use a separate deploy.yml file (Figure 2).

Figure 2: Configuration file deploy.yml.
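Since the figure itself is not reproduced in this text, the following Python sketch only illustrates the kind of information such a deployment configuration may carry: connection entries for the cloud server, the path to the repository, and parameter settings for one image type. All field names, the example address, the repository URL, and the .env file are assumptions made for illustration and are not the contents of the authors' deploy.yml.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class DeployConfig:
    """Hypothetical deployment settings; names are not taken from deploy.yml."""
    host: str                 # address of the cloud server
    user: str                 # account used for the SSH connection
    repo_url: str             # path to the repository with the module code
    branch: str = "main"
    target_dir: str = "app"
    params: dict = field(default_factory=dict)  # settings for one image type


def ssh_target(cfg: DeployConfig) -> str:
    """Return the user@host string passed to ssh."""
    return f"{cfg.user}@{cfg.host}"


def build_remote_commands(cfg: DeployConfig) -> list[str]:
    """Return shell commands that fetch the code and store the parameters."""
    q = shlex.quote  # quote every value before it reaches a shell
    commands = [
        f"git clone --depth 1 --branch {q(cfg.branch)} "
        f"{q(cfg.repo_url)} {q(cfg.target_dir)}"
    ]
    env_file = q(f"{cfg.target_dir}/.env")
    for key, value in sorted(cfg.params.items()):
        entry = q(f"{key}={value}")
        commands.append(f"echo {entry} >> {env_file}")
    return commands


cfg = DeployConfig(
    host="203.0.113.10",
    user="deploy",
    repo_url="https://example.com/lab/quantify.git",
    params={"threshold": 128},
)
for command in build_remote_commands(cfg):
    print(command)
```

Keeping the parameters in a configuration object rather than in the module code means that a change of settings for a new image type does not require editing the module itself.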
In addition to the entries required for connecting to the cloud server, the deploy.yml file shows the path to the repository with the software code for launching the project that evaluates the quantitative characteristics of microobjects.

===7. Peculiarities of using the Infrastructure as Code approach===
Infrastructure as Code is a modern approach to developing and deploying software that makes it possible to describe all the necessary elements of the server environment as code. This is especially convenient for problems involving deep learning. Terraform is chosen as the tool; it supports a large number of providers, so the project can be deployed on various cloud services, such as AWS, DigitalOcean, and Azure.

An example of biomedical images is shown in Figure 3.

Figure 3: Example of biomedical images.

The structure of the configuration file for infrastructure deployment used in the project is shown in Figure 4.

Figure 4: Configuration file structure for infrastructure deployment.

DigitalOcean is chosen as the provider for conducting experiments. To connect to the cloud storage, a token and SSH keys are used, as is standard. Ubuntu Server is chosen as the operating system. After deploying the main environment, it is necessary to install the required software and download the dataset for further processing. Software code is updated through the CI/CD mechanism using GitHub Actions.

The minimum requirements for the developed system are as follows:
- RAM: 4 GB;
- disk space: 25 GB;
- data transfer: 1000 GB;
- OS: Ubuntu 18.x.

===8. Conclusions===
1. According to the comparative analysis, the advantages and disadvantages of the existing systems with pipelines for automatic segmentation are highlighted. It is found that not all the systems have the necessary functionality.
2. The MLOps workflow is developed for the segmentation of biomedical images based on deep learning with U-net.
3. 
The CI/CD pipeline is developed for delivering and deploying the software code that evaluates the quantitative characteristics of microobjects.
4. The developed workflow can serve as a prototype not only for image segmentation programs but also for solving problems in other subject areas.

===9. References===
[1] Testi, M., et al. "MLOps: A Taxonomy and a Methodology." IEEE Access 10 (2022): 63606-63618. https://doi.org/10.1109/ACCESS.2022.3181730

[2] Nazir, Sajid, Diane M. Dickson, and Muhammad Usman Akram. "Survey of explainable artificial intelligence techniques for biomedical imaging with deep neural networks." Computers in Biology and Medicine (2023): 106668. https://doi.org/10.1016/j.compbiomed.2023.106668

[3] Dhar, Tribikram, Nilanjan Dey, Surekha Borra, and R. Simon Sherratt. "Challenges of Deep Learning in Medical Image Analysis—Improving Explainability and Trust." IEEE Transactions on Technology and Society 4, no. 1 (2023): 68-75. https://doi.org/10.1109/TTS.2023.3234203

[4] Vega, Carlos, Miroslav Kratochvil, Venkata Satagopam, and Reinhard Schneider. "Translational challenges of biomedical machine learning solutions in clinical and laboratory settings." In International Work-Conference on Bioinformatics and Biomedical Engineering, pp. 353-358. Cham: Springer International Publishing, 2022. https://doi.org/10.1007/978-3-031-07802-6_30

[5] Bennetot, Adrien, Ivan Donadello, Ayoub El Qadi, Mauro Dragoni, Thomas Frossard, Benedikt Wagner, Anna Saranti et al. "A Practical guide on Explainable AI Techniques applied on Biomedical use case applications." arXiv preprint arXiv:2111.14260 (2021). http://dx.doi.org/10.2139/ssrn.4229624

[6] Granlund, Tuomas, Vlad Stirbu, and Tommi Mikkonen. "Towards regulatory-compliant MLOps: Oravizio’s journey from a machine learning experiment to a deployed certified medical product." SN Computer Science 2, no. 5 (2021): 342. https://doi.org/10.1007/s42979-021-00726-1

[7] Reddy, Manjunatha, Brahmanand Dattaprakash, Sandesh Kammath, Subramanya Kn, Sumathra Manokaran, and Rangaswamy Be. "Application of mlops in prediction of lifestyle diseases." ECS Transactions 107, no. 1 (2022): 1191. https://doi.org/10.1149/10701.1191ecst

[8] Jain, Archit, Adarsh Malviya, Disha Bajaj, Revati Bhavsar, and Amit Savyanavar. "Brain Tumor Detection using MLops and Hybrid Multi-Cloud." In 2022 IEEE International Conference on Blockchain and Distributed Systems Security (ICBDS), pp. 1-6. IEEE, 2022. https://doi.org/10.1109/ICBDS53701.2022.9936020

[9] Stirbu, Vlad, Tuomas Granlund, and Tommi Mikkonen. "Continuous design control for machine learning in certified medical systems." Software Quality Journal 31, no. 2 (2023): 307-333. https://doi.org/10.1007/s11219-022-09601-5

[10] Berezsky, Oleh, Oleh Pitsun, Grygory Melnyk, Yuriy Batko, Bohdan Derysh, and Petro Liashchynskyi. "Application Of MLOps Practices For Biomedical Image Classification." In IDDM, pp. 69-77. 2022.

[11] Berezsky, Oleh, Oleh Pitsun, Grygoriy Melnyk, Tamara Datsko, Ivan Izonin, and Bohdan Derysh. "An Approach toward Automatic Specifics Diagnosis of Breast Cancer Based on an Immunohistochemical Image." Journal of Imaging 9, no. 1 (2023): 12. https://doi.org/10.3390/jimaging9010012

[12] Berezsky, Oleh, Oleh Pitsun, Bohdan Derysh, Ihor Pazdriy, Grygory Melnyk, and Yuriy Batko. "Automatic segmentation of immunohistochemical images based on U-net architecture." In 2021 IEEE 16th International Conference on Computer Sciences and Information Technologies (CSIT), vol. 1, pp. 29-32. IEEE, 2021. https://doi.org/10.1109/CSIT52700.2021.9648669

[13] Nyckel. URL: https://www.nyckel.com/

[14] Ximilar. URL: https://www.ximilar.com/

[15] Hasty. URL: https://hasty.cloudfactory.com/

[16] Levity. URL: https://levity.ai/

[17] Alexandrova, Sonya, Zachary Tatlock, and Maya Cakmak. "RoboFlow: A flow-based visual programming language for mobile manipulation tasks." In 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 5537-5544. IEEE, 2015. https://doi.org/10.1109/ICRA.2015.7139973

[18] Ciaglia, Floriana, Francesco Saverio Zuppichini, Paul Guerrie, Mark McQuade, and Jacob Solawetz. "Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark." arXiv preprint arXiv:2211.13523 (2022). https://doi.org/10.48550/arXiv.2211.13523

[19] Moreschini, S., F. Lomio, D. Hästbacka, and D. Taibi. "MLOps for evolvable AI intensive software systems." In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 1293-1294. IEEE, 2022. https://doi.org/10.1109/SANER53432.2022.00155

[20] Kreuzberger, Dominik, Niklas Kühl, and Sebastian Hirschl. "Machine learning operations (mlops): Overview, definition, and architecture." IEEE Access (2023). https://doi.org/10.1109/ACCESS.2023.3262138

[21] Kumara, Indika, Rowan Arts, Dario Di Nucci, Willem Jan Van Den Heuvel, and Damian Andrew Tamburri. "Requirements and Reference Architecture for MLOps: Insights from Industry." (2022).

[22] Recupito, Gilberto, Fabiano Pecorelli, Gemma Catolino, Sergio Moreschini, Dario Di Nucci, Fabio Palomba, and Damian A. Tamburri. "A multivocal literature review of mlops tools and features." In 2022 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 84-91. IEEE, 2022.