<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Designing XAI-based Computer-aided Diagnostic Systems: Operationalising User Research Methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elsa Oliveira</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristiana Braga</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ana Sampaio</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tiago Oliveira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filipe Soares</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luís Rosado</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>First Solutions - Sistemas de Informação</institution>
          ,
          <addr-line>S.A., Rua Conselheiro Costa Braga, Matosinhos</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fraunhofer Portugal AICOS</institution>
          ,
          <addr-line>Rua Alfredo Allen 455/461, 4200-135, Porto</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
<p>AI technology has the potential to support humans' processes and tasks by augmenting human capabilities and effectiveness. Computer-aided systems have been implemented in healthcare mainly to support clinical decisions. As in other areas, the impact, complexity, and opacity of AI operations have led to the establishment of guidelines for trustworthy AI, which implies being understandable. This study describes the user research work carried out by a multidisciplinary team composed of ML engineers, design researchers, and medical experts, to inform the design of algorithms and user interfaces for two XAI-based clinical decision support tools targeted at Cervical cancer and Glaucoma screening. In particular, we sought to leverage and bridge individual and collective expertise to understand the context, decision-making processes and criteria, and values that frame the respective clinical decisions. The article describes how we operationalised the research activities with expert users and what strategies we followed for subsequent content analysis, ending with the sharing of lessons learned as valuable insights for other research teams interested in designing computer-aided diagnostic systems based on human-centred XAI approaches.</p>
      </abstract>
      <kwd-group>
<kwd>Explainable AI</kwd>
        <kwd>Computer-aided detection</kwd>
        <kwd>Decision Support System</kwd>
        <kwd>Ophthalmology</kwd>
        <kwd>Glaucoma</kwd>
        <kwd>Cytology</kwd>
        <kwd>Cervical cancer</kwd>
        <kwd>Retinal Imaging</kwd>
        <kwd>Microscopy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
        <p>Despite its potential, AI has struggled to be understandable. This requirement has been critical in several areas, mainly in healthcare [<xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>], where AI can support clinical decisions. There has been consensus on the need to promote accountable and trustworthy AI. The European Commission's High-Level Expert Group on Artificial Intelligence (AI HLEG) states that whenever an AI system has a significant impact on people's lives, it should be possible to demand a suitable explanation of the AI system's decision-making process [<xref ref-type="bibr" rid="ref3">3</xref>]. These considerations have led AI towards Explainable AI (XAI), which in turn leveraged human-centred design methods to uncover what to explain, why, how, and for whom [<xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4, 5, 6, 7</xref>]. We share a study of how we operationalised Human-Centred Design (HCD) methods to inform the design of algorithms and user interfaces for two XAI-based clinical decision support tools for Cervical cancer and Glaucoma screening. We were concerned with grasping medical experts' mental models and reasoning processes. While mental models are mental constructs that represent a distinct possibility and derive a conclusion from them, reasoning implies a process to derive a conclusion and depends on envisaging the possibilities (mental models) consistent with a starting point [8]. So, to access the reasoning behind a diagnosis and identify the decision-making data and the explanation structures to apply in the design of XAI-based clinical decision support tools, we needed to get inside the diagnostic process with those who practise it, the medical experts [9, 10].</p>
      <p>This paper is structured into 6 sections. Section 1 introduces the demand for XAI systems. Section 2 identifies the objectives and design of the study, subdivided into three phases: contextualisation, elicitation, and validation. Section 3 briefly introduces the medical context of Cervical cancer and Glaucoma, on which the work was focused. Section 4 describes how we operationalised the research work, focusing on the research activities with the users and the analysis of the collected content. Finally, in section 5 we share lessons learned from this study, and section 6 indicates the main conclusions and future work.</p>
      <p>Elsa Oliveira, Cristiana Braga, Ana Sampaio, Tiago Oliveira, Filipe Soares and Luís Rosado. 2023. Designing XAI-based Computer-aided Diagnostic Systems: Operationalising User Research Methods. Joint Proceedings of the ACM IUI Workshops 2023, March 2023, Sydney, Australia, 11 pages. © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <sec id="sec-1-1">
        <title>2. Goals and Study Design</title>
        <p>As a multidisciplinary team, composed of Machine Learning (ML) engineers, design researchers, and medical experts, we sought to leverage and bridge individual and collective expertise, especially from the medical area for which the systems were conceived, to inform the design of algorithms and user interfaces for explainable decision support software targeted at Cervical cancer and Glaucoma screening. We based our study on the results of the user research activities, which aimed to understand the context, processes, and values that frame clinical decisions in the above-mentioned health areas. The research process was guided by three phases: contextualisation, elicitation, and validation. The user research methods applied in each phase (described below) returned a considerable amount of fieldwork materials, i.e., written, verbal, and visual content, that researchers needed to analyse to enhance understanding of the data. In analysing these data, we initially focused on codifying what the clinicians said (written transcription) about their decision-making process. However, most of their explanations evoked visual aspects of the images. As non-experts, we quickly realised that we needed to match what the doctors were saying with the respective visual elements they were characterising in their explanations. For example, when clinicians explained that a cell was abnormal "because it had a halo around the nucleus", HCD and ML researchers could not understand what a halo was without a visual reference of that cell containing a halo. The content analysis process, based on Transcription, Coding, and Systematisation, paved the way for the decision-making data and inherent reasoning structure.</p>
        <sec id="sec-1-1-1">
          <title>2.1. Contextualisation</title>
          <p>As a first step, the researchers sought to become familiarised with the jargon, clinical practices, and decision-making processes used by health professionals. Initially, the researchers carried out superficial research in online medical articles, also to acquire the basic knowledge needed to prepare the interviews with medical experts. In fact, the contextualisation was accomplished mainly through semi-structured interviews whose script applied Task Reflection and Retrospection methods to prompt participants to reflect on and describe their daily clinical tasks and diagnostic practices. The interviews gave us an overview of clinical practices, decision-making processes, and values, and a quick window into participants' mental models as they gave examples of clinical cases and how they decided on them.</p>
        </sec>
        <sec id="sec-1-2">
          <title>2.2. Elicitation</title>
          <p>The elicitation phase asked for more detail on the decision-making process, the decision-making data, and the explanation structures that support it. To this end, the research team relied on referenced methods for mental models' elicitation [11], such as Semi-structured interviews, Observation, and Think-Aloud [12, 13], together with co-creation practices that made use of imaging data and other design materials to help participants demonstrate their processes of analysis and decision-making. Nielsen refers to the Think-Aloud method as effective in giving insights into users' mental models regarding a given task. The study also drew on the procedures of a field study method based on Observation and interviews to understand work practices and behaviours: Contextual inquiry [14, 15]. Kim Salazar, on the Nielsen Norman Group website, highlights the value of the contextual inquiry method, inquiring in context, which results in a collaborative interpretation between researchers and expert users about work practices and behaviours, with a more in-depth understanding of experts' reasoning. With these references in mind, the research team set up workshops to observe and question medical experts analysing and deciding on clinical cases from clinical data.</p>
        </sec>
        <sec id="sec-1-3">
          <title>2.3. Validation</title>
          <p>The validation stage allowed us to discuss, correct, complete, and refine the research findings with the medical experts. Through co-creation design practices, researchers designed group and individual workshops, in both remote and in-person versions, in order to display the decision-making criteria within the respective structures, to be discussed and easily edited and iterated in real-time. For some questions, we used the A/B testing method for participants to select the best option.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Cervical cancer and Glaucoma</title>
      <p>As mentioned in the introduction, this study addresses two distinct health areas, Anatomical Pathology and Ophthalmology, more specifically, Cervical cancer and Glaucoma. The main goal of the study was to design one explainable decision support application per area, both based on imaging screening, to be used by medical experts and physicians in training.</p>
      <p>Cervical cancer screening will mainly rely on
cytological microscopic images, while Glaucoma screening on
retinal images. The research team needed to go deep into
each clinical practice to define the systems’ requirements.</p>
      <p>Table 1 lists the main aspects that characterise the two health areas under study, considering the analysis that medical experts carry out per patient. This knowledge was built up throughout the contextualisation and elicitation phases. While both areas share common aspects, they also have some significant differences.</p>
      <sec id="sec-2-0">
        <title>4. Operationalisation of research activities</title>
        <p>In this section, we describe how we operationalised the research activities for the contextualisation, elicitation, and validation phases. At the beginning of the study, all participants received a general informed consent form that gave an overview of the user research agenda, with a detailed informed consent form further provided per activity. The study counted up to 5 participants per health area: in Cervical cancer, 3 cytopathologists and 2 cytotechnologists; in Glaucoma, 4 ophthalmologists specialised in Glaucoma (glaucomatologists). Because of COVID-19, in particular the restrictions on in-person group meetings and on normal access to hospitals and clinical settings, most user research activities took place remotely through digital and online platforms. Through these, participants were able to access anonymised screening images, as well as other clinical data, to demonstrate their decision-making process and reasoning while being observed and questioned by the research team.</p>
        <sec id="sec-2-0-1">
          <title>4.1. Contextualisation interviews</title>
          <p>After some basic research through online medical articles, the research team drew up the interview script addressed to the medical experts. The interview script aimed at: understanding clinical procedures, i.e., from the first consultation up to and after diagnosis; eliciting medical experts' values, i.e., their motivation for the medical field and examples of impactful cases; and, very importantly, gathering medical expectations regarding the introduction of AI systems in clinical practice. The semi-structured interviews were carried out remotely through video calls on Microsoft Teams. Of note, in Cervical cancer the research team took advantage of the results of a previous and related study with cytopathologists and cytotechnicians [22, 20] that had conducted in-person semi-structured interviews with the same participants. These interviews enabled us to understand the processes involved in cytological analysis, from the reception of the sample to the diagnosis.</p>
        </sec>
      </sec>
      <sec id="sec-2-1">
        <title>4.1.1. Interviews analysis</title>
        <p>Once the interviews were completed, we transcribed them using the oTranscribe software [23]. We then organised the participants' insights into the main themes raised during the interviews.</p>
      </sec>
      <sec id="sec-2-1a">
        <title>4.2. Workshops for eliciting diagnostic processes</title>
        <p>Familiarised with both medical areas, we drew inspiration from the contextual inquiry method to design the workshops that would enable us to elicit experts' diagnostic assessment process. Our goal was to understand what experts look at when they analyse a clinical case, specifically an imaging examination, and what criteria they use to assess whether it shows a pathological change.</p>
        <sec id="sec-2-1a-1">
          <title>4.2.1. Designing remote workshops</title>
          <p>The analysis of imaging examinations was a requirement for the diagnostic assessment; thus, we needed to observe experts analysing such images. Usually, we would visit the experts' workplace and observe them in a real clinical setting. However, due to COVID-19, the workshops had to be remote, and so we mimicked this observation remotely.</p>
          <p>For Cervical cancer, we asked a cytologist to provide us with images of liquid-based cytological samples. For Glaucoma, we asked a glaucomatologist to provide us with retinal images. However, this was not all. From the interviews, we learned that both medical fields complemented the interpretation of images with clinical data, which we were attending to. But we also learned that Glaucomatous pathology was more complex to diagnose, because glaucomatologists often had to integrate complementary diagnostic exams to reach a diagnosis. With this in mind, we asked the glaucomatologist to provide us with a set of anonymised complementary diagnostic exams with diverse diagnoses: unconfirmed, borderline, early stage Glaucoma, and advanced stage Glaucoma.</p>
          <p>To ensure participants' unbiased decisions, we first conducted individual workshops. Afterwards, we ran a group workshop for both medical fields to help identify consensual criteria and foster discussions around the least consensual ones.</p>
        </sec>
        <sec id="sec-2-1a-2">
          <title>4.2.2. Conducting individual workshops</title>
          <p>Each workshop consisted of one main task: the participant, as a medical expert, would assess imaging examinations in real-time and think aloud about their analysis. This way, we could follow the assessment process and ask questions whenever needed to better understand it. Moreover, we asked participants to annotate relevant findings whose appearance suggested a pathological change and to provide the respective diagnostic classification. Experts in Cervical cancer classified cytological images according to the Bethesda System's convention. Experts in Glaucoma classified retinal images according to the four stages mentioned in section 4.2.1. Each participant analysed from three to seven images consisting of liquid-based cytological samples (in Cervical cancer) or from four to sixteen images consisting of eight pairs of retinal images (in Glaucoma). Figure 1 shows a visual field of a cytological sample with two cells annotated by a participant for their abnormality, and Figure 2 a retinal image being analysed by a participant.</p>
          <p>Given the interdependence with other examinations in Glaucoma diagnosis, and the wider range of diagnostic factors outside imaging data, Glaucoma workshops comprised an additional task. Each participant was asked to list the steps of a usual medical procedure, from the first consultation to the diagnosis, describing other relevant examinations beyond the retinal image. Figure 3 shows the timeline filled in by 1 of the 4 participants considering the examinations performed throughout the analysis of a given clinical case (for example, José, 62 years old with high intraocular pressure), from the first consultation to diagnosis and, where necessary, in the patient follow-up. In the second part of the workshop, the participant accessed anonymised eye examinations, corresponding to different diagnoses from non-Glaucoma to advanced stage Glaucoma, to then select and analyse the most representative of a specific clinical case. Figure 2 is one of the retinal images that a participant zoomed in and centred on the optic disc to show which image features reflect the state of the eye's structures and should therefore be considered as criteria for decision making.</p>
        </sec>
        <sec id="sec-2-1a-3">
          <title>4.2.3. Transcribing and analysing</title>
          <p>As we transcribed the workshops, it became evident that we should assign textual excerpts to image cut-outs, as most of the experts' explanations consisted of descriptions of visible characteristics in the analysed images. Thus, mapping the object of analysis to the respective transcription enabled us to keep a correspondence between what was said and what was being observed in the image (Figure 4). We did this for each participant.</p>
          <p>Almost all participants, from both medical fields, mentioned how the analysis and conclusions of some clinical cases were subjective. For instance, a glaucomatologist said: "Sometimes it's not black and white, it's grey", meaning that the same examinations and clinical data may lead experts to different decisions. This happens when the available elements for diagnosis are unclear, due either to image characteristics that hinder experts' analysis (e.g. a blurry image) or to characteristics of the anatomical structures, which can themselves be confusing (when the same visual appearance can be the result of different possible causes), which requires more tests and more time. Moreover, Cervical cancer experts highlighted the intra- and inter-observer subjectivity, explaining that not only may the decision vary between experts, but the same expert could give a different classification to the same sample at different moments in time. Therefore, we probed this subjective dimension by comparing the analyses of different participants on the same object of analysis, and, in fact, we were able to verify it. Figure 5 shows an example of the same cytological field analysed by the five Cervical cancer experts. Both the annotations of suspicious or abnormal cells and the final classifications varied across analysts. While three experts classified the cytological field as ASC-US, an official classification for uncertainty regarding Abnormal Squamous Cells, two of the five experts classified the sample as LSIL, an official classification comparable to ASC-US, but that assigns a Low grade of Intraepithelial Lesion to the Squamous cells.</p>
        </sec>
        <sec id="sec-2-1a-4">
          <title>4.2.4. Coding and systematisation</title>
          <p>Once we completed the transcripts, we created a categorisation matrix in Excel to code the data into a set of categories that constitute the building blocks of the explanations, which allowed us to uncover a generic explanation structure suitable for both use cases. We used the columns' headings for the categories, and the rows to list the image that triggered the explanation together with the textual explanation (quote) and the set of categorisable criteria (Figure 6). As we went on with the codification, we iteratively refined the categories into Key structure(s) examined, Key feature(s) concerned, Risk factor, Not Cervical cancer/Glaucoma factor, Doubt factor, Result attributed, and, finally, Key expression used by the expert. As the Excel's content increased, we noticed that the criteria we filled in the categories would repeat. So, we created an Excel tab to list the criteria for each category as they emerged throughout the process. We ended up gathering a list of options that enabled us to streamline the filling-in process. To avoid subjectivity and/or interpretation errors in the codification process, we organised an internal panel of three coders composed of researchers involved in these activities. All transcriptions were assigned to this panel, varying who would be the first coder. While the first coder would codify the transcription from scratch, the following two would validate the first codification.</p>
          <p>Taking the following quote from Cervical cancer as an example, we would describe it as Table 2 shows:</p>
          <p>It has a darker nucleus, but with this resolution, when I try to zoom in, I can't see the characteristics.</p>
          <p>Figure 6 shows the variability of decision criteria by category that was raised throughout the analysis. By the end of the analysis, we had uncovered the most relevant criteria used by experts in each medical field to analyse and explain their decisions. And we could standardise that most of the explanations followed this structure:</p>
          <p>The [Key feature concerned] of the [Key structure(s) examined] is [Risk factor] OR [Not Cervical cancer/Glaucoma].</p>
          <p>e.g. Cervical cancer: The [colour] of the [nucleus] is [hyperchromatic]. Glaucoma: The [optic disc] has an [excavation greater than 0.4].</p>
        </sec>
      </sec>
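      <p>As a purely illustrative sketch (not part of the study's tooling), the elicited explanation structure can be rendered by a simple template filler. The category names follow the categorisation matrix described above; the function name and the contrastive handling are hypothetical, and the Glaucoma phrasing is simplified to the same "The [feature] of the [structure] is [criterion]" pattern:</p>

```python
# Hypothetical sketch: filling the generic explanation template elicited
# from the experts. The optional `contrast` argument stands in for a
# "Doubt factor" or "Not Cervical cancer/Glaucoma factor".

def render_explanation(key_feature, key_structure, risk_factor, contrast=None):
    """Return the templated explanation sentence, with an optional
    contrastive clause introduced by 'however'."""
    sentence = f"The [{key_feature}] of the [{key_structure}] is [{risk_factor}]"
    if contrast:
        sentence += f", however, [{contrast}]"
    return sentence + "."

# Cervical cancer example from the study
print(render_explanation("colour", "nucleus", "hyperchromatic"))
# -> The [colour] of the [nucleus] is [hyperchromatic].
```

      <p>Keeping the bracketed slots explicit mirrors how the matrix categories map one-to-one onto positions in the explanation sentence.</p>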
      <sec id="sec-2-2">
        <title>Explanation structure with contrastive factors</title>
        <p>Moreover, we found that sentences stating a "Not Cervical cancer/Glaucoma factor" or a "Doubt factor" could follow the Key feature concerned. Experts used them to suggest a plausible contradiction that prevented them from providing a classification of which they were confident.</p>
        <p>e.g., Cervical cancer: The [colour] of the [nucleus] is [hyperchromatic], however, [there are overlapping cells]. Glaucoma: The [optic disc] has an [excavation greater than 0.4], however, [is symmetric].</p>
        <p>Table 2 pairs each excerpt of the example quote with the criterion it was coded under: "It has a darker nucleus" (Part of a cell); "nucleus" (Nucleus); "a darker nucleus" (Colour intensity); "a darker nucleus" (Hyperchromasia); Not applicable; "but with this resolution, ... I can't see the characteristics" (Image quality - Blurred); "... I can't see the characteristics" (Insufficient / No classification).</p>
        <p>In these explanations, the experts point out a structure that they observed and characterise its aspect, reflecting a well-known and established risk factor in the domain knowledge, i.e., Cervical cytology: [hyperchromatic]; Glaucoma: [excavation greater than 0.4]. Nevertheless, the explanations also stress, through the contrastive expression 'however', other characteristics that complement and contrast with the first ones, i.e., Cervical cytology: [there are overlapping cells]; Glaucoma: [is symmetric]. And this prevents the experts from discerning with certainty whether the first observed characteristic is an anomaly or not.</p>
        <sec id="sec-2-2-1">
          <title>4.3. Workshops for validation</title>
          <p>Based on the results of previous user research activities, researchers designed validation workshops to: (i) ensure there was no conflicting information among the knowledge shared by each participant, (ii) remove possible imprecision from the researchers' interpretation and consequent analysis outcomes, and (iii) get insights on a first version of the graphical user interface (GUI) designed from scratch to attend to the elicited diagnostic processes.</p>
          <sec id="sec-2-2-1-1">
            <title>4.3.1. Conducting group workshops</title>
            <p>The first validation session was carried out through the Mural platform, from where participants accessed and interacted (by editing, deleting, or adding content) with the list of decision-making criteria raised so far, in order to ensure its correctness and completeness. In Glaucoma workshops, participants were also asked to analyse several examinations, mainly retinal images, and to choose the applicable criteria for each one from the list elicited by researchers. We asked participants to position the selected criteria in one of three possibilities: non-Glaucoma, Glaucoma, or borderline (Figure 7).</p>
          </sec>
          <sec id="sec-2-2-1-2">
            <title>4.3.2. Validating content and container - an informed GUI prototype</title>
            <p>In the second validation session, researchers presented the validated decision-making criteria integrated into a Graphical User Interface (GUI) prototype. The aim was to get feedback on the criteria and on the UI components presenting them. According to participants' availability, the Cervical cancer session took place in person (Figure 8), and the Glaucoma session took place remotely (Figure 9).</p>
            <p>Some categories and criteria seemed to have more than one possible way to be named or presented in the interface. Thus, to assess the correctness and completeness of the data, as well as of the system's components and related features, we applied A/B testing for participants to choose the best options.</p>
            <p>In the Glaucoma study, we conducted a remote session through which we shared a PowerPoint presentation with images of the GUI prototype together with its content (the elicited decision-making criteria) listed in an editable text box, as shown in Figure 9. The content was discussed in real-time and, whenever necessary, easily edited.</p>
          </sec>
        </sec>
        <sec id="sec-2-2-2">
          <title>5. Lessons Learned</title>
          <p>L1: Multidisciplinary team. The design of XAI-based clinical decision support tools requires extensive knowledge from various domains. It is paramount that teams ensure an iterative communication that keeps everyone in the loop, i.e., design researchers, medical experts, ML engineers, etc. Let us highlight the ML engineers' guidance on the feasibility of the required functionalities, and their support in defining the needed data, i.e., quantity and quality, and the infrastructure for implementation. Many systems based on supervised learning require annotated data, analysed by experts in terms of the elements needed to guide the models' learning process. In the case of medical XAI systems, this requires close cooperation with clinical experts to ensure that data instances are annotated objectively and uniformly. This way, ML engineers guarantee that the final data set comprises cases sufficiently representative of the different data properties that may arise in practical scenarios.</p>
          <p>L2: Contextual inquiry method as a basis for elicitation. The contextual inquiry method inspired the study to observe experts performing a task as close to reality as possible by having them verbalise their thoughts while analysing imaging examinations and providing diagnostic classifications for them. We conclude that, when in-loco sessions are not possible, researchers can simulate the method remotely using digital and online platforms that enable video calls, screen sharing, displaying relevant data for analysis and discussion, and free writing. We asked the experts for analysis materials from their daily work, e.g., anonymised imaging examinations, and then used the online platform Mural to present analysis tasks using these materials. While sharing the screen, experts analysed, selected, and annotated the digital images, and researchers asked timely questions that arose from observing what participants were doing and saying (think-aloud).</p>
          <p>L3: Mapping text with images helped associate features to structures. From the elicitation to the content analysis, we found it elementary to map the textual transcripts to the image that experts were analysing. We cropped, framed, and sketched over the images to correlate what experts were saying with what they were seeing. In doing this, some categories emerged transversely among both experts and images analysed, so this mapping led to discovering a standard structure of the explanations.</p>
          <p>L4: Categorisation matrix for multidisciplinary analysis. As the categories emerged, we used Excel's functionalities, such as drop-down lists, to streamline the process of matching features to structures, facilitating the systematisation of the analysis across more team members, i.e., design researchers and ML engineers.</p>
        </sec>
      </sec>
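      <p>To make the categorisation-matrix lesson concrete, here is a minimal sketch of the matrix as a plain data structure. The category names come from the coding described in section 4.2.4; the row content, the controlled-vocabulary dictionary, and the helper function are hypothetical stand-ins for the Excel workbook with drop-down lists:</p>

```python
# Hypothetical sketch of the categorisation matrix: one row per coded quote,
# one column per category. In the study this lived in Excel; here a list of
# dicts plays the same role, with a controlled vocabulary per category
# emulating the drop-down lists used to keep coders consistent.

CATEGORIES = [
    "Key structure(s) examined", "Key feature(s) concerned", "Risk factor",
    "Not Cervical cancer/Glaucoma factor", "Doubt factor",
    "Result attributed", "Key expression",
]

# Illustrative criteria lists; the study grew these iteratively in a
# dedicated Excel tab as new criteria emerged.
CRITERIA_OPTIONS = {
    "Key structure(s) examined": {"Nucleus", "Optic disc"},
    "Doubt factor": {"Image quality - Blurred", "Overlapping cells"},
}

def code_quote(image_id, quote, coding):
    """Build one matrix row, rejecting unknown categories and, where a
    controlled vocabulary exists, unknown criteria."""
    for category, criterion in coding.items():
        if category not in CATEGORIES:
            raise ValueError(f"Unknown category: {category!r}")
        allowed = CRITERIA_OPTIONS.get(category)
        if allowed is not None and criterion not in allowed:
            raise ValueError(f"Unknown criterion for {category!r}: {criterion!r}")
    return {"image": image_id, "quote": quote, **coding}

# Coding the worked example quote from section 4.2.4 (image id invented)
row = code_quote(
    "cytology_042",
    "It has a darker nucleus, but ... I can't see the characteristics.",
    {"Key structure(s) examined": "Nucleus",
     "Doubt factor": "Image quality - Blurred",
     "Result attributed": "Insufficient / No classification"},
)
```

      <p>Validating each criterion against a shared options list is what made the matrix usable across design researchers and ML engineers without drifting vocabularies.</p>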
    </sec>
    <sec id="sec-3">
      <title>6. Conclusions and Future Work</title>
      <sec id="sec-3-1">
        <p>This paper describes the user research activities carried
out by a multidisciplinary team to inform the design of
Machine Learning algorithms and user interfaces for two
XAI-based computer-aided diagnostic systems for
Cervical cancer and Glaucoma. We shared what we think
might be useful for other teams involved in the design of
Explainable AI systems, namely, ways to operationalise
human-centred design methods considering the
objectives of Contextualisation, Elicitation, and Validation of
such systems. In that scope, we demonstrate
transcription, coding, and systematisation strategies that
facilitated our content analysis, in particular, a categorisation
matrix that helped uncover decision-making criteria and
respective explanations’ structure to inform the design
of AI-generated explanations. Future work will focus on
further developing the graphical user interface (GUI) to
adapt it to an AI-based classification system to support
experts’ decision-making process.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <sec id="sec-4-1">
        <p>We would like to thank the medical experts from the
Anatomical Pathology Service of the Portuguese
Oncology Institute - Porto (IPO-Porto) and from the University
Hospital Centre of Porto (CHPorto), who participated
in the user research sessions. A special thanks to our
senior colleagues at Fraunhofer Portugal AICOS, Ana
Barros and Francisco Nunes, who mentored us during
the writing of the article. Finally, this work was
financially supported by the project Transparent Artificial
Medical Intelligence (TAMI), co-funded by Portugal 2020
framed under the Operational Programme for
Competitiveness and Internationalization (COMPETE 2020),
Fundação para a Ciência e a Tecnologia (FCT), Carnegie
Mellon University, and European Regional Development
Fund under Grant 45905.
</p>
        <p>[8] P. N. Johnson-Laird, Mental models and human reasoning, Proc. Natl. Acad. Sci. U.S.A. 107 (2010) 18243–18250. doi:10.1073/pnas.1012933107.</p>
        <p>[9] G. Rickheit, C. Habel, Mental Models in Discourse Processing and Reasoning, Elsevier Science B.V., Amsterdam, 1999. URL: https://books.google.pt/books?hl=pt-PT&amp;lr=&amp;id=96jBqz_ar8AC&amp;oi=fnd&amp;pg=PP1&amp;dq=mental+models+versus+reasoning+process&amp;ots=Ou3b1SOv77&amp;sig=r5NouxMzR56klQTrokyvHSclJuQ&amp;redir_esc=y#v=onepage&amp;q=mental%20models%20versus%20reasoning%20process&amp;f=false.</p>
        <p>[10] Z. Liu, J. Stasko, Mental models, visual reasoning and interaction in information visualization: A top-down perspective, IEEE Transactions on Visualization and Computer Graphics 16 (2010) 999–1008. doi:10.1109/TVCG.2010.177.</p>
        <p>[11] J. S. Holtrop, L. D. Scherer, D. D. Matlock, R. E. Glasgow, L. A. Green, The Importance of Mental Models in Implementation Science, Front. Public Health 9 (2021). doi:10.3389/fpubh.2021.680316.</p>
        <p>[12] R. Binns, M. Van Kleek, M. Veale, U. Lyngs, J. Zhao, N. Shadbolt, ’It’s reducing a human being to a percentage’: Perceptions of justice in algorithmic decisions, in: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 1–14. URL: https://doi.org/10.1145/3173574.3173951. doi:10.1145/3173574.3173951.</p>
        <p>[13] T. Kulesza, S. Stumpf, M. Burnett, W.-K. Wong, Y. Riche, T. Moore, I. Oberst, A. Shinsel, K. McIntosh, Explanatory debugging: Supporting end-user debugging of machine-learned programs, in: 2010 IEEE Symposium on Visual Languages and Human-Centric Computing, 2010, pp. 41–48. doi:10.1109/VLHCC.2010.15.</p>
        <p>[14] S. Jalil, T. Myers, I. Atkinson, M. Soden, Complementing a Clinical Trial With Human-Computer Interaction: Patients’ User Experience With Telehealth, JMIR Human Factors 6 (2019) e9481. doi:10.2196/humanfactors.9481.</p>
        <p>[15] T. Dagdelen, Modernizing the User Interface of a Legacy System at the Swedish Police Authority: Collaborative Mental Model: A New Participatory Design Method, 2019. URL: https://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1366483&amp;dswid=8599.</p>
        <p>[16] D.-G. of Health of Portugal, Rastreio da retinopatia diabética - Portal das normas clínicas, 2018. URL: https://normas.dgs.min-saude.pt/2018/09/13/rastreio-da-retinopatia-diabetica/.</p>
        <p>[17] L. P. Aiello, I. Odia, A. R. Glassman, M. Melia, L. M. Jampol, N. M. Bressler, S. Kiss, P. S. Silva, C. C. Wykof, J. K. Sun, D. R. C. R. Network, Comparison of Early Treatment Diabetic Retinopathy Study Standard 7-Field Imaging With Ultrawide-Field Imaging for Determining Severity of Diabetic Retinopathy, JAMA Ophthalmol. 137 (2019) 65–73. doi:10.1001/jamaophthalmol.2018.4982. arXiv:30347105.</p>
        <p>[18] Eurocytology, Criteria for adequacy of a cervical cytology sample | Eurocytology, 2022. URL: https://www.eurocytology.eu/en/course/1142, [Online; accessed 13. Oct. 2022].</p>
        <p>[19] S. Rêgo, M. Monteiro-Soares, M. Dutra-Medeiros, F. Soares, C. C. Dias, F. Nunes, Implementation and evaluation of a mobile retinal image acquisition system for screening diabetic retinopathy: Study protocol, Diabetology 3 (2022) 1–16. URL: https://www.mdpi.com/2673-4540/3/1/1. doi:10.3390/diabetology3010001.</p>
        <p>[20] T. Conceição, C. Braga, L. Rosado, M. J. M. Vasconcelos, A Review of Computational Methods for Cervical Cells Segmentation and Abnormality Classification, Int. J. Mol. Sci. 20 (2019). doi:10.3390/ijms20205114.</p>
        <p>[21] D. A. De Jesus, L. S. Brea, J. B. Breda, E. Fokkinga, V. Ederveen, N. Borren, A. Bekkers, M. Pircher, I. Stalmans, S. Klein, T. van Walsum, OCTA Multilayer and Multisector Peripapillary Microvascular Modeling for Diagnosing and Staging of Glaucoma, Trans. Vis. Sci. Tech. 9 (2020) 58. doi:10.1167/tvst.9.2.58.</p>
        <p>[22] CLARE: Computer-aided cervical cancer screening, 2023. URL: https://www.aicos.fraunhofer.pt/en/our_work/projects/clare.html, [Online; accessed 15. Feb. 2023].</p>
        <p>[23] E. Bentley, oTranscribe: A free web app to take the pain out of transcribing recorded interviews, 2023. URL: https://otranscribe.com/, [Online; accessed 15. Feb. 2023].</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F. K.</given-names>
            <surname>Došilović</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hlupić</surname>
          </string-name>
          ,
          <article-title>Explainable artificial intelligence: A survey</article-title>
          ,
          <source>in: 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>0210</fpage>
          -
          <lpage>0215</lpage>
. doi:10.23919/MIPRO.2018.8400040.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
<string-name><given-names>J.-M.</given-names> <surname>Fellous</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Sapiro</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Rossi</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Mayberg</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Ferrante</surname></string-name>
          ,
          <article-title>Explainable artificial intelligence for neuroscience: Behavioral neurostimulation</article-title>
          ,
          <source>Frontiers in Neuroscience</source>
<volume>13</volume> (<year>2019</year>). URL: https://www.frontiersin.org/articles/10.3389/fnins.2019.01346. doi:10.3389/fnins.2019.01346.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
[3]
          <collab>European Commission</collab>,
          <article-title>Ethics Guidelines for Trustworthy AI - FUTURIUM - European Commission</article-title>,
          <year>2021</year>. URL: https://ec.europa.eu/futurium/en/ai-alliance-consultation.1.html, [Online; accessed 13. Oct. <year>2022</year>].
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Adadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Berrada</surname>
          </string-name>
          ,
          <article-title>Peeking inside the black-box: A survey on explainable artificial intelligence (xai)</article-title>
          ,
          <source>IEEE Access 6</source>
          (
          <year>2018</year>
          )
          <fpage>52138</fpage>
          -
          <lpage>52160</lpage>
. doi:10.1109/ACCESS.2018.2870052.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Burkart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Huber</surname>
          </string-name>
,
          <article-title>A Survey on the Explainability of Supervised Machine Learning</article-title>
          ,
          <source>J. Artif. Intell. Res</source>
          .
          <volume>70</volume>
          (
          <year>2021</year>
          )
          <fpage>245</fpage>
          -
          <lpage>317</lpage>
. doi:10.1613/jair.1.12228.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pribić</surname>
          </string-name>
,
          <string-name><given-names>J.</given-names> <surname>Han</surname></string-name>,
          <string-name>
            <given-names>S.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sow</surname>
          </string-name>
          ,
          <article-title>Question-driven design process for explainable ai user experiences</article-title>
          ,
          <year>2021</year>
. URL: https://arxiv.org/abs/2104.03483. doi:10.48550/ARXIV.2104.03483.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lopes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Braga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Oliveira</surname>
          </string-name>
,
          <string-name><given-names>L.</given-names> <surname>Rosado</surname></string-name>,
          <article-title>Xai systems evaluation: A review of human and computer-centred methods</article-title>,
          <source>Applied Sciences 12</source> (<year>2022</year>). URL: https://www.mdpi.com/2076-3417/12/19/9423. doi:10.3390/app12199423.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>