Human-in-the-Loop Approach Based on MRI and ECG for Healthcare Diagnosis

Pavlo Radiuk a, Oleksii Kovalchuk a, Vitalii Slobodzian a, Eduard Manziuk a, Oleksander Barmak a, Iurii Krak b,c

a Khmelnytskyi National University, 11, Institutes str., Khmelnytskyi, 29016, Ukraine
b Taras Shevchenko National University of Kyiv, 64/13, Volodymyrska str., Kyiv, 01601, Ukraine
c Glushkov Cybernetics Institute, 40, Glushkov ave., Kyiv, 03187, Ukraine

Abstract

The presented study investigates a human-centric approach to implementing human-in-the-loop models for healthcare diagnostics. Two tasks were considered and addressed in this work: a) identifying the features necessary for subsequent healthcare diagnosis from electrocardiogram signals within the human-in-the-loop model (P and T peaks, the QRS complex, and the PQ and ST segments), and b) detecting inflammatory processes in the heart muscle (myocardium) from cardiac magnetic resonance imaging. As a result of our investigation, a novel approach was proposed for embedding (integrating) clinical knowledge about the nature of these phenomena into the electrocardiogram signal and the magnetic resonance image. Domain knowledge about the sample's nature is encoded in the same form as the input information, and the convolution operation serves as the embedding mechanism within our approach. The results presented in the article are a starting point for applying the models obtained by the proposed approach (human-in-the-loop models) to classification problems using deep learning and convolutional neural networks. Visual analysis also demonstrates the ability of the proposed approaches to solve practical clinical problems. The technology further ensures transparent interpretation of the obtained results in terms of the human-in-the-loop model, which, in turn, is built according to the human-centric approach. Overall, our contribution enables a scheme for obtaining artificial intelligence solutions founded on the principles of trust in them.
Keywords
Human-centric approach, human-in-the-loop, trustworthiness in artificial intelligence, healthcare diagnosis, electrocardiogram, magnetic resonance imaging, autoencoder

IDDM-2022: 5th International Conference on Informatics & Data-Driven Medicine, November 18–20, 2022, Lyon, France
EMAIL: radiukpavlo@gmail.com (P. Radiuk); losha.kovalchyk1998@gmail.com (O. Kovalchuk); vitalii.slobodzian@gmail.com (V. Slobodzian); eduard.em.km@gmail.com (E. Manziuk); alexander.barmak@gmail.com (O. Barmak); yuri.krak@gmail.com (I. Krak)
ORCID: 0000-0003-3609-112X (P. Radiuk); 0000-0001-9828-0941 (O. Kovalchuk); 0000-0001-8897-0869 (V. Slobodzian); 0000-0002-7310-2126 (E. Manziuk); 0000-0003-0739-9678 (O. Barmak); 0000-0002-8043-0785 (I. Krak)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

The accelerating development of information systems is accompanied by an expansion of their spheres of practical use. Information systems are taking on a new form through the integration of intelligent components, ranging from relatively simple algorithmic decision-making systems to artificial intelligence (AI) systems. Such systems are used in various subject domains, such as industry, education, transport, health care, and so forth [1]. Intelligent systems have considerably changed the life of society, processing a vast and constantly growing amount of data. They demonstrate their effectiveness in applied tasks but, at the same time, become more complicated. This complexity leads to opacity in decision-making, which is an essential consideration for their introduction, especially in areas of critical use. Decisions made by AI systems depend on many parameters and are difficult to interpret.
AI systems take the form of a black box in which the decision-making mechanisms are opaque, incomprehensible, possibly incorrect, and potentially dangerous. Cases of incorrect and dangerous decisions made by artificial intelligence are already known [2]: for example, accidents caused by self-driving cars hitting pedestrians, or hiring algorithms biased against particular categories of people. Such examples indicate the need to develop AI systems that meet specific requirements for building socially responsible intelligent systems. This suggests that the simple application of artificial intelligence based only on technical performance characteristics in classification or clustering tasks is currently insufficient. It is necessary to expand the range of AI systems and specify them according to the specifics of practical applications. Such manifestations of intelligent systems in tasks of practical importance have necessitated the development of normative documents regulating the requirements and limitations of using AI to ensure safety, security, prevention of harm, and so on. Guidelines for the development and use of AI have been proposed. The Alliance for Artificial Intelligence of the European Union proposed ethical principles and frameworks for the management, development, and use of AI [3]. The General Data Protection Regulation (GDPR) [4] was adopted, recognising the user's right to receive an explanation of decisions obtained thanks to AI systems or generated by such systems autonomously. Several requirements have been formulated that AI must meet, including fairness, compliance with legislation, transparency in decision-making, interpretability, confidentiality, accountability, and several others. The combination of these requirements allows the application of AI systems that are more secure and dependable. Today, significant attention is paid to AI whose decisions are transparent, explainable, and interpretable.
The practical application of such systems must be controlled; people must clearly understand what solutions the system can generate, what impact those solutions have, and what their possible consequences and limitations are. AI healthcare systems belong to a field of practical use to which all the necessary requirements of safety, reliability, and criticality apply. AI in the healthcare field is undoubtedly necessary and vital [5]: it can significantly affect human health, improve work processes, and raise the quality and efficiency of medical care. However, the application of AI is not limited to improving efficiency within specific tasks. The healthcare domain is an area of critical decision-making responsibility. In addition, AI must be able to work with data that is inaccurate, incomplete, erroneous, limited, or insufficient, or that contains gaps. It is not always possible to use AI systems whose characteristics correspond to a black box, even though they give the best results in terms of quality indicators; only AI systems that provide dependable solutions can be practically applied [6]. Although there is optimism about AI-driven changes in the healthcare field, there are significant caveats regarding its use in responsible decisions. These issues follow from the following circumstances:

• AI systems at today's level of development generate a certain number of incorrect decisions, even when the overall quality of the received decisions is high.
• The developed systems do not make it possible to determine which particular decision was wrong in each case; they only give a general assessment of the quality of a set of decisions.
• AI systems that match the characteristics of a black box do not make it possible to determine the features on which a specific decision was based.
The presented study proposes the use of AI in diagnosing clinical diseases, considering the human-centric approach and the human-in-the-loop model. This application of AI allows the transformation of the information field of its practical use. Integrated transformation bridges the gap between theoretical AI research and its practical application in the healthcare field with the development of medical AI. The objective circumstances of practical application necessitate the creation of AI systems that consider ethical aspects, comply with legal regulations, and build trust. Consequently, the contribution of this work comprises the following aspects:

• Implementation of the human-centric approach and human-in-the-loop models for healthcare diagnostics in two analysis tasks: a) electrocardiogram (ECG) signals, intended to identify the features necessary for further diagnosis (P and T peaks, the QRS complex, and the PQ and ST segments); and b) cardiac MRI images, for detecting inflammatory processes in the heart muscle (myocardium).
• An approach for embedding (integrating) human knowledge about the nature of these phenomena into the ECG signal and MRI image; the convolution operation is proposed as the embedding mechanism.
• Visual analysis of the ability of the proposed approaches to solve these tasks.

The structure of the article is as follows: section 2 reviews sources that consider the set of requirements for trust in AI systems; section 3 presents the approaches proposed by the authors for using AI in healthcare diagnostics tasks, considering the human-in-the-loop model and the human-centric approach; section 4 presents the results of research on integrating doctors' knowledge into the process of obtaining AI solutions.
2. Related work

The problem of trust in AI systems has become relevant due to the acceleration of practical implementation, which has revealed new aspects that need attention and that, in some cases, become the main reasons why AI cannot be used. These new aspects not only go beyond the technical difficulties of building AI but also create new directions and varieties of AI. In healthcare, trust has two important dimensions [7]: social and technical. Socially, trust is an essential aspect of patient-doctor interaction. The patient comes to the doctor in a state different from ordinary life functionality and is vulnerable. In this state, the patient cannot help himself and is forced to seek external help. On the other hand, patients will use the services and follow the instructions in the doctor's recommendations if there is a trusting relationship with the doctor based on the level of care provided. Trust in the doctor is not the least factor in the success of treatment and has a therapeutic effect of its own. However, this aspect of trust relates to the medical side of the patient-doctor interaction. The use of AI systems introduces a new aspect of that interaction, one that can undermine general trust. AI systems can yield decent results, but their level of trust is low, so they cannot be considered dependable. In those circumstances, patients will be forced to rely on AI systems for final decisions of medical importance, which may decrease trust in the clinical practice of patient-doctor interaction [8]. Several studies have been devoted to creating AI systems that meet the requirements of trust [9]–[11]. Studies [12] and [13] examine the concept of trust based on the definition of a set of ethical principles under which AI can be considered trustworthy. Metrics have been proposed for assessing trust in AI using the explainability of an expert-in-the-loop system [14].
The metric defines the difference between the explanations provided by the AI system and those obtained from experts based on their reasoning and experience. The metric can be applied to diverse groups of experts to determine the confidence level in their recommendations. It reduces the concept of confidence to a quantitative number by measuring the distance between AI explanations and expert explanations. In this case, trust is reduced to explanation and the two concepts are equated: according to this approach, trust is entirely determined by the explainability of the decision. In recent years, there has been growing concern in the scientific community about the potential dangers of black-box algorithms used in various fields of human activity, including healthcare diagnostics. This concern narrows their practical application to those medical aspects in which the trust and transparency of the obtained decisions are not essential and critical [15]. Since AI systems still give high results, their use is justified, but this is not enough for the routine service of doctors, and the potential applications of AI remain limited due to the low level of trust. As stated in work [16], one may accept AI as a black box, but doctors need to gain experience in interacting with bioinformatics to acquire the necessary skills and expertise; this will improve the quality of medical image analysis. Another way to address the black-box issue is to transform the neural network's complex structure into an understandable linear polynomial form [17], which allows reliable interpretation of the result of healthcare classification. However, this way of implementing medical AI requires highly qualified doctors to acquire specific competencies in bioinformatics. Such approaches may therefore limit and slow the spread of AI and might be considered too costly.
An essential aspect of implementing medical AI is the availability of quality data for establishing decision-making models. In many cases, quality is the determining factor for developing effective and explainable AI, as in cardiac MRI measurement and interpretation [18]. However, the availability of such data can be limited for valid reasons, and the collection of quality data can involve significant difficulties. A significant amount of healthcare data for training neural networks consists of images. Synthesis algorithms are used to expand such data and obtain training data [19]. Examples of such data are clinical imaging data [20], electrocardiogram signals [21], electronic medical records [22], and so forth. In this respect, trustworthy AI requires that the generation and distribution of such data comply with the relevant restrictions and rules. The lack of data and its limitations are another focus in the development of reliable medical AI [15]; in particular, neural networks require a large amount of clinical data for training to obtain high results. The prospects for medical AI are promising: today, AI is used to predict caries on images, and reliable AI is being developed that can explain the reasons behind a prediction [23], [24]. However, in these studies, the capabilities of AI are limited by the need to trust AI and to explain the reasons for the prediction. To improve the explainability of AI, systems for evaluating the results of prediction on images have been proposed that involve a person as an expert in the prediction process [25]. To achieve the required trust and reliability of AI, three research areas have been identified that must be combined to obtain the required result.
According to the authors, the combination of neural networks and their predictions, graphical causal models, and methods of verification and explanation is the path that plays a transformative role in bridging the gap between theoretical research and the practical application of AI in clinical medicine [26]. Many studies reveal the need to develop medical AI with a set of characteristics that make it dependable. According to the conducted analysis, the urgent need is not so much the forecast itself as the feature set from which the AI generated the forecast. The required features can be obtained in different forms. The AI can independently indicate the features that were decisive in the obtained forecasts. Alternatively, the doctor can provide the AI with a feature set that, according to clinical recommendations, plays a decisive role in healthcare diagnosis. In this case, the AI should be able to focus its computational algorithms for obtaining decisions on the given feature set, while still considering the other available features with the necessary weighting of their influence and differences in values.

3. Methods and materials

To implement the principles of trust in the results of healthcare diagnostics obtained thanks to AI, within the framework of the human-in-the-loop model and the human-centric approach, we propose to integrate the knowledge of doctors about these data into the input data of medical research (the ECG signal and the MRI image). The proposed approach may allow modelling and classifying features that are understandable to doctors and may enable them to interpret the result obtained by AI systems. As of today, the most prominent results in medical image processing have been obtained using deep learning methods and tools, in particular, convolutional neural networks (CNNs) [27].
The convolution of two functions, f and g, returns a third function that corresponds to the cross-correlation of f(x) and g(−x). The operation of convolution can be interpreted as the "similarity" of one function to a mirrored and shifted copy of another [28]. The concept of convolution is generalised to functions defined on arbitrary measure spaces and can be considered a special integral transformation. In the discrete case, the convolution corresponds to the sum of values of f with coefficients given by the shifted values of g, and is defined as

(f ∗ g)(x) = f(1)g(x − 1) + f(2)g(x − 2) + f(3)g(x − 3) + …   (1)

The critical point in (1) is that the convolution of a rectangular impulse (rectangular function, rectangular window) with another rectangular impulse is a triangular (or trapezoidal) impulse [29]. That is, when the input signal and a rectangular signal are placed synchronously, their convolution yields a signal with more pronounced peaks (known as features) than the input signal. We suggest using this convolution property as a mechanism for integrating knowledge about the nature of the signal (image). The integration of knowledge into the ECG signal and the MRI image is shown below.

3.1. Integration of knowledge into the ECG signal

Let us consider what subject-area (domain) knowledge can look like for the ECG signal (Fig. 1).

Figure 1: An illustrative sample of an ECG signal: (a) a regular cardiac cycle; (b) a cardiac cycle with individual feature knowledge implemented for an ECG signal [30]

Identifying feature points for ECG signals usually involves identifying the onset and offset of the P wave, the QRS complex, and the T wave. The QRS complex, with its higher amplitude, is frequently easy to identify. Distinguishing P and T waves is trickier because their amplitudes are lower and sometimes accompanied by noise.
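As a brief, self-contained illustration of this convolution property, the sketch below convolves a noisy peak-shaped signal with a rectangular pulse. The signal shape, pulse width, and all names are our illustrative assumptions, not data from the study.

```python
import numpy as np

# Illustrative sketch: convolving a noisy peak-shaped signal with a
# rectangular pulse. Convolution with a rectangular window averages out
# noise, so the underlying peak (the "feature") becomes more pronounced.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 201)
signal = np.exp(-(x / 0.1) ** 2) + 0.05 * rng.standard_normal(x.size)  # noisy wave

width = 21
rect_pulse = np.full(width, 1.0 / width)  # normalised rectangular window
convolved = np.convolve(signal, rect_pulse, mode="same")

peak_index = int(np.argmax(convolved))  # the peak stays near the wave centre
```

Placing such a normalised pulse synchronously with a wave and convolving, as proposed above, is what makes the wave's peak easier to delineate downstream.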
Delineation of feature points (reference points) provides additional information, such as intervals and amplitudes, essential for further ECG analysis. Domain knowledge is encoded in the same form as the input signal. Since ECG signals (s1, s2, …, sn) are one-dimensional time-series data, domain knowledge concerning possible pathologies is encoded similarly; this representation of ECG domain knowledge is illustrated in Fig. 1b. For example, knowledge about the P wave is encoded as

(…, 0, h_i, …, h_i, 0, …),   (2)

where the height h_i is repeated over the width w_i. The knowledge encoded as (2) can be represented as a rectangular pulse. Knowledge about the QRS complex and the T wave is encoded similarly.

3.2. Integrating knowledge into MRI imaging

This subsection presents a trustworthy AI model applicable to interpretable cardiovascular segmentation using multimodal MRI data. Healthcare professionals rarely use a multimodal approach in practice because labelling so many images is highly laborious and time-consuming. Meanwhile, annotations for images with larger slice thicknesses are more common and readily available, while images with thinner slices are not. Thus, in this work, we propose a thickness-free multimodal image segmentation model that can be applied to both thick-slice and thin-slice images but needs annotations only for the thick-slice images during training. Let us denote the set of images with thick slices as I_C = {(x_c, y_c) | x_c ∈ R^(H×W×3), y_c ∈ R^(H×W)} and the set with thin slices as I_P = {x_p | x_p ∈ R^(H×W×3)}. The proposed model uses the unlabelled thin-slice images I_P to minimise the gap in model performance between thick- and thin-slice images. In other words, this approach transfers domain knowledge from one modality to image segmentation in another modality, resulting in trustworthy AI. A CNN architecture of the encoder-decoder type was used to segment the medical images.
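The thick-/thin-slice split above can be sketched as simple containers whose shapes follow the definitions of I_C and I_P; the sizes and names here are our own illustration.

```python
import numpy as np

# Illustrative containers for the multimodal sets: I_C holds thick-slice
# images with labels (x_c, y_c); I_P holds unlabelled thin-slice images x_p.
H, W = 64, 64
I_C = [(np.zeros((H, W, 3)), np.zeros((H, W))) for _ in range(4)]  # labelled
I_P = [np.zeros((H, W, 3)) for _ in range(6)]                      # unlabelled

# Only the thick-slice set carries annotations used for the supervised loss.
n_labelled = len(I_C)
n_unlabelled = len(I_P)
```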
We used the vanilla U-Net architecture and replaced the original encoder with a pre-trained ResNet-50 [29], which better represents the features of the input images. The proposed decoder uses subpixel convolution to construct the segmentation results. Subpixel convolution is defined as

ConvSub_L = SP(W_L ∗ F_(L−1) + b_L),   (3)

where the operator SP(·) transforms a matrix of H × W × D × r² into a matrix of rH × W × D, r is a scale factor for H, F_(L−1) and F_L stand for the input and output feature maps, and W_L and b_L represent the parameters of the subpixel convolution operator for layer L. A multimodal procedure was used to train the CNN with (3), providing joint optimisation for both types of images. The objective function of the proposed multimodal training is defined as

ℒ(x_c, x_p) = ℒ_c(q_c, y_c) + ℓ ℒ_p(q_p),   (4)

where ℓ is a hyperparameter weighting the relative impact of ℒ_c and ℒ_p, and q_c and q_p stand for the predicted segmentation probability maps of size rH × W × D for images with thick and thin slices, respectively. For (4), the cross-entropy loss is determined as

ℒ_c(q_c, y_c) = −(1 / (HWD)) Σ_(n=1..HW) Σ_(d=1..D) y_c^(n,d) ln q_c^(n,d).

In the case of thin-slice images, ℒ_p pushes the features away from the decision boundary of the feature distribution of thick-slice images, flattening the distribution.

3.3. Evaluation of the quality of the obtained results

Only a qualitative assessment of the obtained results is possible at this research stage. The purpose of these evaluations is to demonstrate the capability of the proposed approaches in the given tasks. It is also proposed to visually evaluate changes in the signal with integrated knowledge relative to the input signal and to analyse how these changes affect the feature points. For the MRI analysis task, the segmentation quality of a network trained with multimodal datasets is evaluated through the Dice coefficient.
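Returning to the decoder, the SP(·) rearrangement in Eq. (3) can be given a runnable sketch. The printed tensor shapes are ambiguous in places, so this version assumes r-fold upscaling along H only, with each group of r depth channels interleaved into r consecutive output rows; the function name is ours, not the paper's.

```python
import numpy as np

# Hypothetical numpy sketch of the SP(.) operator in Eq. (3): a 1-D
# "pixel shuffle" that upscales the feature map r-fold along H, turning
# each group of r depth channels into r consecutive output rows.
def subpixel_shuffle_h(feat, r):
    H, W, Dr = feat.shape
    assert Dr % r == 0, "depth must be divisible by the scale factor r"
    D = Dr // r
    # (H, W, D, r) -> (H, r, W, D) -> (r*H, W, D)
    return feat.reshape(H, W, D, r).transpose(0, 3, 1, 2).reshape(H * r, W, D)

feat = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)  # H=2, W=3, D*r=4
out = subpixel_shuffle_h(feat, r=2)  # shape (4, 3, 2)
```

A trained subpixel decoder layer would learn W_L and b_L of Eq. (3) before this purely mechanical rearrangement.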
Dice = 2TP / (2TP + FP + FN),   (5)

where, for the segmentation task, TP stands for true positive, TN for true negative, FP for false positive, and FN for false negative cases. Since the relationship between CNN performance and input samples remains unclear, a multi-layer perceptron was used to map decomposed samples and their corresponding Dice scores for whole-space estimation. Such an approach can provide insight into Dice scores for individual regions of interest in the latent space where no data are available. Consequently, it becomes possible to obtain information about the relationships between samples and their predictive ability by analysing the characteristics of samples in the hidden space. As a result, we can achieve an elevated level of trust in the CNN model.

4. Results and discussion

4.1. Analysis of the ECG signal with integrated knowledge

Several experiments were conducted to evaluate the proposed mechanisms for integrating knowledge about a signal into the signal itself. The results of the experiments showed the ability of the proposed approach to solve the following tasks:

1. Clearer selection of signal features (R, P, T peaks; PR and ST segments; P and T waves).
2. Cleaning of "noise" in the signal for a more straightforward interpretation of the behaviour of the P and T waves.

The results of the conducted research can be visually evaluated in Fig. 2.

Figure 2: Input ECG signal (P-wave fragment), signal knowledge, and the convolved signal

The figure shows a P-wave fragment. Applying the convolution operation to this fragment resulted in a clearer expression of the P peak and wave behaviour. The following steps were taken to incorporate signal knowledge into the input signal.
To incorporate knowledge, we need to match that knowledge with the corresponding ECG signals. For each cardiac cycle of the ECG, we use the position of the peak of the R wave as a reference point for matching; to find the peaks, we used the approach based on Shannon entropy [31], as the one that gave the best results. The results of such incorporation are shown in Fig. 3.

Figure 3: Detection of R peaks in the ECG signal using Shannon entropy

Since ECG signals (s1, s2, …, sn) are one-dimensional time-series data, we encode knowledge in the same form; three additional data channels (knowledge of the P, R, and T waves) are added to the primary input ECG signal (Fig. 4).

Figure 4: Channels of the ECG signal and knowledge about the ECG signal

The process of aligning knowledge includes three stages:

1. Alignment of the central point of the rectangular R wave with the identified reference point of the R peak.
2. Displacement from the reference point of the R peak to the left by a fixed length (to the central point of the rectangular P wave).
3. Displacement from the reference point of the R peak to the right by a fixed length (to the central point of the rectangular T wave).

These three types of domain knowledge are encoded alongside the ECG signal data (channel 0) as follows:

channel0: …  s_k   s_(k+1) …    …     …    …  …    …    …    …      s_(k+m−1) s_(k+m) …
channel1: …  0     h_Pi    …    h_Pi  0    0  0    0    0    0      0         0       …
channel2: …  0     0       0    0     h_Ri …  h_Ri 0    0    0      0         0       …
channel3: …  0     0       0    0     0    0  0    0    h_Ti …      h_Ti      0       …

After the encoding and knowledge matching are complete, we feed these data as input to an encoder-decoder neural network with a hidden convolutional layer [30]. The results obtained by the encoder are presented in Fig. 5.

Figure 5: The output layer of the encoder-decoder neural network with a hidden convolutional layer; red lines are predictions

According to Fig.
6, as a result of the operation of the encoder-decoder neural network, the PQ and ST segments and the width of the QRS complex are selected quite successfully. The selection of the P, R, and T peaks requires certain postprocessing, illustrated in Fig. 6.

Figure 6: Postprocessing of the output layer of the encoder-decoder with the implemented knowledge

The P, R, and T peaks selected from the input signal are shown in Fig. 7.

Figure 7: The result of determining the P, R, and T peaks with the implemented knowledge

As can be seen from Figs. 5-7, the signal convolved by the encoder-decoder neural network allows the necessary information to be extracted from the input ECG signal reliably: P and T peaks, the QRS complex, and the PQ and ST segments. The approach fails only on signal sections where the R peak is not detected. This is not critical because such sections contain artefacts and are removed from the analysis, as they do not carry the necessary information.

4.2. Cardiac MRI studies with integrated knowledge

The dataset used for the experiments contained 1,890 cardiac MRI samples extracted from 136 patients. The result of short-axis stack segmentation during the cardiac cycle with implemented domain knowledge is presented in Fig. 8. Computational experiments showed that the total percentage of unsuccessful segmentations obtained by the AI system reached 1.5% (that is, 28 unsuccessfully segmented images out of 1,890). According to the domain knowledge, almost all failures were caused either by congenital heart diseases, such as a ventricular septal defect (Fig. 8a), or by visual artefacts and technical problems that affected image quality. In addition, in 43 samples out of 1,890 (2.3%), segmentation errors were caused by a poor image of the apex of the heart (Fig. 8b). The analysis of the Dice score (5) demonstrated a decent correspondence between the CNN and manual LV and RV contours in both the internal and external test cohorts.
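As a minimal, self-contained sketch, the Dice score of Eq. (5) and the failure rates quoted above can be reproduced in Python. The toy masks are our own illustration; the counts 28, 43, and 1,890 come from the text.

```python
import numpy as np

# Minimal sketch of the Dice score in Eq. (5) for binary segmentation masks.
def dice_score(pred, target):
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    return 2.0 * tp / (2.0 * tp + fp + fn)

# Toy 2x3 masks: tp=2, fp=1, fn=1 -> Dice = 4/6.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_score(pred, target)

# Failure rates reported for the 1,890-sample cohort.
seg_rate = round(100 * 28 / 1890, 1)   # unsuccessful segmentations -> 1.5%
apex_rate = round(100 * 43 / 1890, 1)  # poor apex images -> 2.3%
```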
The final values of the Dice coefficient for the internal cohort were 83.4-85.1% in the LV and 82.7-84.9% in the RV. Meanwhile, for the external cohort, the CNN model achieved 82.8-83.0% in the LV and 80.4-83.1% in the RV.

Figure 8: Instances of successful and unsuccessful segmentation by the AI system: (a) significant insufficiency due to a congenital heart defect causing widening of the left ventricular (LV) contours into the right ventricle (RV) (red segment); (b) slight insufficiency at the apex, where the RV was incorrectly labelled as LV (red segment); red, green, blue, and yellow ovals indicate the healthcare professionals' selection of the endocardial LV, epicardial LV, endocardial RV, and epicardial RV contours

The analysis of the Dice coefficient demonstrated promising results for automatic segmentation conducted using AI with our human-in-the-loop approach. It is worth noting the consistent differences in the automatic segmentation of the scan-rescan cohort, for example, the exclusion of parts of the outflow tract of the RV (Fig. 8b). While this sequence maintained excellent repeatability, the Dice coefficient took smaller values (82.7-84.9% for the internal cohort and 80.4-83.1% for the external cohort). Despite the promising results of the proposed human-in-the-loop approach, it is limited by the need for up-to-date ECG and MRI databases that contain greater variability across a broader range of cardiac pathologies. Specifically for ECG, our approach based on individual feature knowledge depends heavily on R-peak detection when matching knowledge with ECG signals. As a result, automated delineation may produce systematic errors in T waves because the autoencoder predicts the ups and downs of very long T waves as independent waves. For MRI, our approach fails to predict the junction region between the third and fourth ventricles because it is too small to be distinguished.
In sum, the proposed human-in-the-loop approach is sensitive to the supplied domain knowledge and currently remains a proof of concept. The approach's performance might be improved by applying intelligent data techniques, such as further partial observation of large databases without annotations or realistic simulations of ECG and MRI samples.

5. Conclusions and future work

This study proposes a novel human-centric approach to healthcare diagnostics. Our contribution is based on resolving the following tasks: a) analysing ECG signals to identify the features necessary for subsequent diagnosis in the human-in-the-loop model (P and T peaks, the QRS complex, and the PQ and ST segments), and b) analysing cardiac MRI to detect inflammatory processes in the heart muscle (myocardium). An approach is proposed for embedding human knowledge about the nature of these phenomena into a signal or an image, using the convolution operation as the embedding mechanism. For the problems under consideration, knowledge about the nature of the signal and image is encoded in the same form as the input information. The visual analysis revealed the ability of the proposed approaches to solve the problems under investigation. Moreover, experimental results on MRI demonstrated a decent correspondence between the CNN and manual LV and RV contours in both the internal and external test cohorts. Despite the promising results of the proposed human-in-the-loop approach, it is limited by the need for up-to-date ECG and MRI databases with greater variability across a broader range of cardiac pathologies. In addition, our approach is sensitive to the domain knowledge and remains a proof of concept for now. Further research will be directed towards using the models obtained by the approach given in the article (human-in-the-loop models) for classification problems using convolutional neural networks and deep learning.
A unique feature of this technology is that it allows transparent interpretation of the obtained results in terms of the human-in-the-loop model, which, in turn, is built according to the human-centric approach. It might allow the implementation of a scheme for obtaining an AI solution based on the principles of trust.

6. References

[1] Z. Sun et al., A review of Earth artificial intelligence, Computers & Geosciences, vol. 159, p. 105034, Feb. 2022, doi:10.1016/j.cageo.2022.105034.
[2] Y. Duan, J. S. Edwards, and Y. K. Dwivedi, Artificial intelligence for decision making in the era of Big Data – evolution, challenges and research agenda, International Journal of Information Management, vol. 48, pp. 63–71, Oct. 2019, doi:10.1016/j.ijinfomgt.2019.01.021.
[3] N. A. Smuha, The EU approach to ethics guidelines for trustworthy artificial intelligence, Computer Law Review International, vol. 20, no. 4, pp. 97–106, Aug. 2019, doi:10.9785/cri-2019-200402.
[4] R. Chatila and J. C. Havens, The IEEE global initiative on ethics of autonomous and intelligent systems, in Robotics and Well-Being, vol. 95, M. I. Aldinhas Ferreira, J. Silva Sequeira, G. Singh Virk, M. O. Tokhi, and E. E. Kadar, Eds. Cham: Springer International Publishing, 2019, pp. 11–16. doi:10.1007/978-3-030-12524-0_2.
[5] S. Secinaro, D. Calandra, A. Secinaro, V. Muthurangu, and P. Biancone, The role of artificial intelligence in healthcare: A structured literature review, BMC Medical Informatics and Decision Making, vol. 21, no. 1, p. 125, Apr. 2021, doi:10.1186/s12911-021-01488-9.
[6] S. Keel, J. Wu, P. Y. Lee, J. Scheetz, and M. He, Visualising deep learning models for the detection of referable diabetic retinopathy and glaucoma, JAMA Ophthalmology, vol. 137, no. 3, pp. 288–292, Mar. 2019, doi:10.1001/jamaophthalmol.2018.6035.
[7] O. Asan and A. Choudhury, Research trends in artificial intelligence applications in human factors health care: Mapping review, JMIR Human Factors, vol. 8, no. 2, p. e28236, Jun. 2021, doi:10.2196/28236.
[8] J. J. Hatherley, Limits of trust in medical AI, Journal of Medical Ethics, vol. 46, no. 7, pp. 478–481, Jul. 2020, doi:10.1136/medethics-2019-105935.
[9] O. V. Barmak, Yu. V. Krak, and E. Manziuk, Characteristics for choice of models in the ensembles classification, Problems in Programming, vol. 2–3, pp. 171–179, Jan. 2018, doi:10.15407/pp2018.02.171.
[10] E. Manziuk, Approach to creating an ensemble on a hierarchy of clusters using model decisions correlation, Electrotechnical Review, vol. 1, no. 9, pp. 110–115, Sep. 2020, doi:10.15199/48.2020.09.23.
[11] A. Singh, S. Sengupta, and V. Lakshminarayanan, Explainable deep learning models in medical image analysis, Journal of Imaging, vol. 6, no. 6, Art. no. 6, Jun. 2020, doi:10.3390/jimaging6060052.
[12] E. Manziuk, O. Barmak, I. Krak, O. Mazurets, and T. Skrypnyk, Formal model of trustworthy artificial intelligence based on standardisation, in Proceedings of the 2nd International Workshop on Intelligent Information Technologies & Systems of Information Security (IntelITSIS-2021), Khmelnytskyi, Ukraine, March 24–26, 2021, vol. 2853, pp. 190–197. [Online]. Available: http://ceur-ws.org/Vol-2853/short18.pdf
[13] E. Manziuk, I. Krak, O. Barmak, O. Mazurets, V. Kuznetsov, and O. Pylypiak, Structural alignment method of conceptual categories of ontology and formalised domain, in Proceedings of the International Workshop of IT-professionals on Artificial Intelligence (ProfIT AI 2021), Kharkiv, Ukraine, September 20–21, 2021, vol. 3003, pp. 11–22. [Online]. Available: http://ceur-ws.org/Vol-3003/
[14] D. Kaur, S. Uslu, A. Durresi, S. Badve, and M. Dundar, Trustworthy explainability acceptance: A new metric to measure the trustworthiness of interpretable AI medical diagnostic systems, in Complex, Intelligent and Software Intensive Systems, Asan, Korea, July 1–3, 2021, vol. 278, pp. 35–46. doi:10.1007/978-3-030-79725-6_4.
[15] J. M. Durán and K. R. Jongsma, Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI, Journal of Medical Ethics, vol. 47, no. 5, pp. 329–335, May 2021, doi:10.1136/medethics-2020-106820.
[16] W. J. von Eschenbach, Transparency and the black box problem: Why we do not trust AI, Philosophy & Technology, vol. 34, no. 4, pp. 1607–1622, Dec. 2021, doi:10.1007/s13347-021-00477-0.
[17] I. Izonin, R. Tkachenko, N. Kryvinska, P. Tkachenko, and M. Greguš ml., Multiple linear regression based on coefficients identification using non-iterative SGTM neural-like structure, in Advances in Computational Intelligence, Gran Canaria, Spain, June 12–14, 2019, vol. 11506, pp. 467–479. doi:10.1007/978-3-030-20521-8_39.
[18] A. Janik, J. Dodd, G. Ifrim, K. Sankaran, and K. Curran, Interpretability of a deep learning model in the application of cardiac MRI segmentation with an ACDC challenge dataset, in Medical Imaging 2021: Image Processing, Feb. 2021, vol. 11596, pp. 861–872. doi:10.1117/12.2582227.
[19] G. Yang, Q. Ye, and J. Xia, Unbox the black-box for the medical explainable AI via multimodal and multi-centre data fusion: A mini-review, two showcases and beyond, Information Fusion, vol. 77, pp. 29–52, Jan. 2022, doi:10.1016/j.inffus.2021.07.016.
[20] D. B. Larson, D. C. Magnus, M. P. Lungren, N. H. Shah, and C. P. Langlotz, Ethics of using and sharing clinical imaging data for artificial intelligence: A proposed framework, Radiology, vol. 295, no. 3, pp. 675–682, Jun. 2020, doi:10.1148/radiol.2020192536.
[21] Y.-Y. Jo et al., Explainable artificial intelligence to detect atrial fibrillation using electrocardiogram, International Journal of Cardiology, vol. 328, pp. 104–110, Apr. 2021, doi:10.1016/j.ijcard.2020.11.053.
[22] J. Duell, X. Fan, B. Burnett, G. Aarts, and S.-M. Zhou, A comparison of explanations given by explainable artificial intelligence methods on analysing electronic health records, in 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), Jul. 2021, pp. 1–4. doi:10.1109/BHI50953.2021.9508618.
[23] N. Hasani et al., Trustworthy artificial intelligence in medical imaging, PET Clinics, vol. 17, no. 1, pp. 1–12, Jan. 2022, doi:10.1016/j.cpet.2021.09.007.
[24] J. Ma et al., Towards trustworthy AI in dentistry, Journal of Dental Research, vol. 101, no. 11, pp. 1263–1268, Oct. 2022, doi:10.1177/00220345221106086.
[25] D. Kaur, S. Uslu, and A. Durresi, Trustworthy AI explanations as an interface in medical diagnostic systems, in Advances in Network-Based Information Systems, Cham, 2022, vol. 526, pp. 119–130. doi:10.1007/978-3-031-14314-4_12.
[26] A. Holzinger et al., Information fusion as an integrative cross-cutting enabler to achieve robust, explainable, and trustworthy medical artificial intelligence, Information Fusion, vol. 79, pp. 263–278, Mar. 2022, doi:10.1016/j.inffus.2021.10.007.
[27] L. Wang et al., Trends in the application of deep learning networks in medical image analysis: Evolution between 2012 and 2020, European Journal of Radiology, vol. 146, p. 110069, Jan. 2022, doi:10.1016/j.ejrad.2021.110069.
[28] X. Liang et al., ECG_SegNet: An ECG delineation model based on the encoder-decoder structure, Computers in Biology and Medicine, vol. 145, p. 105445, Jun. 2022, doi:10.1016/j.compbiomed.2022.105445.
[29] I. Krak, O. Barmak, and P. Radiuk, Detection of early pneumonia on individual CT scans with dilated convolutions, in Proceedings of the 2nd International Workshop on Intelligent Information Technologies & Systems of Information Security (IntelITSIS-2021), Khmelnytskyi, Ukraine, March 24–26, 2021, vol. 2853, pp. 214–227. [Online]. Available: http://ceur-ws.org/Vol-2853/
[30] J. Wang, R. Li, R. Li, and B. Fu, A knowledge-based deep learning method for ECG signal delineation, Future Generation Computer Systems, vol. 109, pp. 56–66, Aug. 2020, doi:10.1016/j.future.2020.02.068.
[31] S. Modak, L. Y. Taha, and E. Abdel-Raheem, A novel method of QRS detection using time and amplitude thresholds with statistical false peak elimination, IEEE Access, vol. 9, pp. 46079–46092, 2021, doi:10.1109/ACCESS.2021.3067179.