<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Designing XAI-based Computer-aided Diagnostic Systems: Operationalising User Research Methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Elsa Oliveira</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristiana Braga</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ana Sampaio</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tiago Oliveira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filipe Soares</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luís Rosado</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>First Solutions - Sistemas de Informação</institution>
          ,
          <addr-line>S.A., Rua Conselheiro Costa Braga, Matosinhos</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Fraunhofer Portugal AICOS</institution>
          ,
          <addr-line>Rua Alfredo Allen 455/461, 4200-135, Porto</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
<p>AI technology has the potential to support humans' processes and tasks by augmenting human capabilities and effectiveness. Computer-aided systems have been implemented in healthcare mainly to support clinical decisions. As in other areas, the impact, complexity, and opacity of AI operations have led to the establishment of guidelines for trustworthy AI, which implies being understandable. This study describes the user research work carried out by a multidisciplinary team composed of ML engineers, design researchers, and medical experts, to inform the design of algorithms and user interfaces for two XAI-based clinical decision support tools targeted at Cervical cancer and Glaucoma screening. In particular, we sought to leverage and bridge individual and collective expertise to understand the context, decision-making processes and criteria, and values that frame the respective clinical decisions. The article describes how we operationalised the research activities with expert users and what strategies we followed for subsequent content analysis, ending with the sharing of lessons learned as valuable insights for other research teams interested in designing computer-aided diagnostic systems based on human-centred XAI approaches.</p>
      </abstract>
      <kwd-group>
<kwd>Explainable AI</kwd>
        <kwd>Computer-aided detection</kwd>
        <kwd>Decision Support System</kwd>
        <kwd>Ophthalmology</kwd>
        <kwd>Glaucoma</kwd>
        <kwd>Cytology</kwd>
        <kwd>Cervical cancer</kwd>
        <kwd>Retinal Imaging</kwd>
        <kwd>Microscopy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
        <p>Despite its potential, AI has struggled to be understandable. This requirement has been critical in several areas, mainly in healthcare [<xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>], where AI can support clinical decisions. There has been consensus on the need to promote accountable and trustworthy AI. The European Commission's High-Level Expert Group on Artificial Intelligence (AI HLEG) states that whenever an AI system has a significant impact on people's lives, it should be possible to demand a suitable explanation of the AI system's decision-making process [<xref ref-type="bibr" rid="ref3">3</xref>]. These considerations have led AI towards Explainable AI (XAI), which in turn leveraged human-centred design methods to uncover what to explain, why, how, and for whom [<xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4, 5, 6, 7</xref>]. We share a study of how we operationalised Human-Centred Design (HCD) methods to inform the design of algorithms and user interfaces for two XAI-based clinical decision support tools for Cervical cancer and Glaucoma screening. We were concerned with grasping medical experts' mental models and reasoning processes. While mental models are mental constructs that represent a distinct possibility and derive a conclusion from them, reasoning implies a process to derive a conclusion and depends on envisaging the possibilities (mental models) consistent with a starting point [8]. So, to access the reasoning behind a diagnosis and identify the decision-making data and the explanation structures to apply in the design of XAI-based clinical decision support tools, we needed to get inside the diagnostic process with those who practise it, the medical experts [9, 10].</p>
      <p>This paper is structured into 6 sections. Section 1 introduces the demand for XAI systems. Section 2 identifies the objectives and design of the study, subdivided into three phases: contextualisation, elicitation, and validation. Section 3 briefly introduces the medical context of Cervical cancer and Glaucoma, on which the work was focused. Section 4 describes how we operationalised the research work, focusing on the research activities with the users and the analysis of the collected content. Finally, in section 5 we share lessons learned from this study, and section 6 indicates the main conclusions and future work.</p>
      <p>Elsa Oliveira, Cristiana Braga, Ana Sampaio, Tiago Oliveira, Filipe Soares and Luís Rosado. 2023. Designing XAI-based Computer-aided Diagnostic Systems: Operationalising User Research Methods. Joint Proceedings of the ACM IUI Workshops 2023, March 2023, Sydney, Australia, 11 pages. © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <sec id="sec-1-1">
        <title>2. Goals and Study Design</title>
        <p>As a multidisciplinary team, composed of Machine Learning (ML) engineers, design researchers, and medical experts, we sought to leverage and bridge individual and collective expertise, especially from the medical area for which the systems were conceived, to inform the design of algorithms and user interfaces for explainable decision support software targeted at Cervical cancer and Glaucoma screening. We based our study on the results of the user research activities, which aimed to understand the context, processes, and values that frame clinical decisions in the above-mentioned health areas. The research process was guided by three phases: contextualisation, elicitation, and validation. The user research methods applied in each phase (described below) returned a considerable amount of fieldwork materials, i.e., written, verbal, and visual content, that researchers needed to analyse to enhance understanding of the data. In analysing these data, we initially focused on codifying what the clinicians said (written transcription) about their decision-making process. However, most of their explanations evoked visual aspects of the images. As non-experts, we quickly realised that we needed to match what the doctors were saying with the respective visual elements they were characterising in their explanations. For example, when clinicians explained that a cell was abnormal "because it had a halo around the nucleus", HCD and ML researchers could not understand what a halo was without a visual reference of that cell containing a halo. The content analysis process, based on Transcription, Coding, and Systematisation, paved the way for the decision-making data and inherent reasoning structure.</p>
        <sec id="sec-1-1-1">
          <title>2.1. Contextualisation</title>
          <p>As a first step, the researchers sought to become familiarised with the jargon, clinical practices, and decision-making processes used by health professionals. Initially, the researchers carried out superficial research in online medical articles, also to acquire the basic knowledge needed to prepare the interviews with medical experts. In fact, the contextualisation was accomplished mainly through semi-structured interviews whose script applied Task Reflection and Retrospection methods to prompt participants to reflect on and describe their daily clinical tasks and diagnostic practices. The interviews gave us an overview of clinical practices, decision-making processes, and values, and a quick window into participants' mental models as they gave examples of clinical cases and how they decided on them.</p>
        </sec>
        <sec id="sec-1-2">
          <title>2.2. Elicitation</title>
          <p>The elicitation phase asked for more detail on the decision-making process, the decision-making data, and the explanation structures that support it. To this end, the research team relied on referenced methods for mental models' elicitation [11], such as Semi-structured interviews, Observation, and Think-Aloud [12, 13], together with co-creation practices that made use of imaging data and other design materials to help participants demonstrate their processes of analysis and decision-making. Nielsen refers to the Think-Aloud method as effective in giving insights into users' mental models regarding a given task. The study also drew on the procedures of a field study method based on Observation and interviews to understand work practices and behaviours: Contextual inquiry [14, 15]. Kim Salazar, on the Nielsen Norman Group website, highlights the value of the contextual inquiry method, inquiring in context, which results in a collaborative interpretation between researchers and expert users about work practices and behaviours, with a more in-depth understanding of experts' reasoning. With these references in mind, the research team set up workshops to observe and question medical experts analysing and deciding on clinical cases from clinical data.</p>
        </sec>
        <sec id="sec-1-3">
          <title>2.3. Validation</title>
          <p>The validation stage allowed us to discuss, correct, complete, and refine the research findings with the medical experts. Through co-creation design practices, researchers designed group and individual workshops, in both remote and in-person versions, in order to display the decision-making criteria within the respective structures, to be discussed and easily edited and iterated in real-time. For some questions, we used the A/B testing method for participants to select the best option.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Cervical cancer and Glaucoma</title>
      <p>As mentioned in the introduction, this study addresses two distinct health areas, Anatomical Pathology and Ophthalmology, more specifically, Cervical cancer and Glaucoma. The main goal of the study was to design one explainable decision support application per area, both based on imaging screening, to be used by medical experts and physicians in training.</p>
      <p>Cervical cancer screening will mainly rely on
cytological microscopic images, while Glaucoma screening on
retinal images. The research team needed to go deep into
each clinical practice to define the systems’ requirements.</p>
      <p>Table 1 lists the main aspects that characterise the two health areas under study, considering the analysis that medical experts carry out per patient. This knowledge was built up throughout the contextualisation and elicitation phases. While both areas share common aspects, they also have some significant differences.</p>
      <sec id="sec-2-0">
        <title>4. Operationalisation of research activities</title>
        <p>In this section, we describe how we operationalised the research activities for the contextualisation, elicitation, and validation phases. At the beginning of the study, all participants received a general informed consent form that gave an overview of the user research agenda, with a detailed informed consent form further provided per activity. The study counted up to 5 participants per health area: in Cervical cancer, 3 cytopathologists and 2 cytotechnologists; in Glaucoma, 4 ophthalmologists specialised in Glaucoma (glaucomatologists). Because of COVID-19, in particular the restrictions on in-person group meetings and on normal access to hospitals and clinical settings, most user research activities took place remotely through digital and online platforms. Through these, participants were able to access anonymised screening images, as well as other clinical data, to demonstrate their decision-making process and reasoning while being observed and questioned by the research team.</p>
        <sec id="sec-2-0-1">
          <title>4.1. Contextualisation interviews</title>
          <p>After some basic research through online medical articles, the research team drew up the interview script addressed to the medical experts. The interview script aimed at: understanding clinical procedures, i.e., from the first consultation up to and after diagnosis; eliciting medical experts' values, i.e., their motivation for the medical field and examples of impactful cases; and, very importantly, gathering medical expectations regarding the introduction of AI systems in clinical practice. The semi-structured interviews were carried out remotely through video calls on Microsoft Teams. Of note, in Cervical cancer the research team took advantage of the results of a previous and related study with cytopathologists and cytotechnicians [22, 20] that had conducted in-person semi-structured interviews with the same participants. These interviews enabled us to understand the processes involved in cytological analysis, from the reception of the sample to the diagnosis.</p>
        </sec>
      </sec>
      <sec id="sec-2-1">
        <title>4.1.1. Interviews analysis</title>
        <p>Once the interviews were completed, we transcribed them using the oTranscribe software [23]. We then organised the participants' insights into the main themes raised during the interviews.</p>
      </sec>
      <sec id="sec-2-1a">
        <title>4.2. Workshops for eliciting diagnostic processes</title>
        <p>Familiarised with both medical areas, we drew inspiration from the contextual inquiry method to design the workshops that would enable us to elicit experts' diagnostic assessment process. Our goal was to understand what experts look at when they analyse a clinical case, specifically an imaging examination, and what criteria they use to assess whether it shows a pathological change.</p>
        <sec id="sec-2-1a-1">
          <title>4.2.1. Designing remote workshops</title>
          <p>The analysis of imaging examinations was a requirement for the diagnostic assessment; thus, we needed to observe experts analysing such images. Usually, we would visit the experts' workplace and observe them in a real clinical setting. However, due to COVID-19, the workshops had to be remote, and so we mimicked this observation remotely.</p>
          <p>For Cervical cancer, we asked a cytologist to provide us with images of liquid-based cytological samples. For Glaucoma, we asked a glaucomatologist to provide us with retinal images. However, this was not all. From the interviews, we learned that both medical fields complemented the interpretation of images with clinical data, which we were attending to. But we also learned that Glaucomatous pathology was more complex to diagnose, because glaucomatologists often had to integrate complementary diagnostic exams to reach a diagnosis. With this in mind, we asked the glaucomatologist to provide us with a set of anonymised complementary diagnostic exams with diverse diagnoses: unconfirmed, borderline, early stage Glaucoma, and advanced stage Glaucoma.</p>
          <p>To ensure participants' unbiased decisions, we first conducted individual workshops. Afterwards, we ran a group workshop for both medical fields to help identify consensual criteria and foster discussions around the least consensual ones.</p>
        </sec>
        <sec id="sec-2-1a-2">
          <title>4.2.2. Conducting individual workshops</title>
          <p>Each workshop consisted of one main task: the participant, as a medical expert, would assess imaging examinations in real-time and think aloud about their analysis. This way, we could follow the assessment process and ask questions whenever needed to better understand it. Moreover, we asked participants to annotate relevant findings whose appearance suggested a pathological change and to provide the respective diagnostic classification. Experts in Cervical cancer classified cytological images according to the Bethesda System's convention. Experts in Glaucoma classified retinal images according to the four stages mentioned in section 4.2.1. Each participant analysed from three to seven images consisting of liquid-based cytological samples (in Cervical cancer) or from four to sixteen images consisting of eight pairs of retinal images (in Glaucoma). Figure 1 shows a visual field of a cytological sample with two cells annotated by a participant for their abnormality, and Figure 2 a retinal image being analysed by a participant.</p>
          <p>Given the interdependence with other examinations in Glaucoma diagnosis, and the wider range of diagnostic factors outside imaging data, Glaucoma workshops comprised an additional task. Each participant was asked to list the steps of a usual medical procedure, from the first consultation to the diagnosis, describing other relevant examinations beyond the retinal image. Figure 3 shows the timeline filled in by 1 of the 4 participants considering the examinations performed throughout the analysis of a given clinical case (for example, José, 62 years old with high intraocular pressure), from the first consultation to diagnosis and, where necessary, in the patient follow-up. In the second part of the workshop, the participant accessed anonymised eye examinations, corresponding to different diagnoses from non-Glaucoma to advanced stage Glaucoma, to then select and analyse the most representative of a specific clinical case. Figure 2 is one of the retinal images that a participant zoomed in and centred on the optic disc to show which image features reflect the state of the eye's structures and should therefore be considered as criteria for decision making.</p>
        </sec>
        <sec id="sec-2-1a-3">
          <title>4.2.3. Transcribing and analysing</title>
          <p>As we transcribed the workshops, it became evident that we should assign textual excerpts to image cut-outs, as most of the experts' explanations consisted of descriptions of visible characteristics in the analysed images. Thus, mapping the object of analysis to the respective transcription enabled us to keep a correspondence between what was said and what was being observed in the image (Figure 4). We did this for each participant.</p>
          <p>Almost all participants, from both medical fields, mentioned how the analysis and conclusions of some clinical cases were subjective. For instance, a glaucomatologist said: "Sometimes it's not black and white, it's grey", meaning that the same examinations and clinical data may lead experts to different decisions. This happens when the available elements for diagnosis are unclear, due either to image characteristics that hinder experts' analysis (e.g. a blurry image) or to characteristics of the anatomical structures, which can themselves be confusing (when the same visual appearance can be the result of different possible causes), which requires more tests and more time. Moreover, Cervical cancer experts highlighted the intra- and inter-observer subjectivity, explaining that not only may the decision vary between experts, but the same expert could give a different classification to the same sample at different moments in time. Therefore, we probed this subjective dimension by comparing the analyses of different participants on the same object of analysis, and, in fact, we were able to verify it. Figure 5 shows an example of the same cytological field analysed by the five Cervical cancer experts. Both the annotations of suspicious or abnormal cells and the final classifications varied across analysts. While three experts classified the cytological field as ASC-US, an official classification for uncertainty regarding Abnormal Squamous Cells, two of the five experts classified the sample as LSIL, an official classification comparable to ASC-US, but that assigns a Low grade of Intraepithelial Lesion to the Squamous cells.</p>
        </sec>
        <sec id="sec-2-1a-4">
          <title>4.2.4. Coding and systematisation</title>
          <p>Once we completed the transcripts, we created a categorisation matrix in Excel to code the data into a set of categories that constitute the building blocks of the explanations, which allowed us to uncover a generic explanation structure suitable for both use cases. We used the columns' headings for the categories, and the rows to list the image that triggered the explanation together with the textual explanation (quote) and the set of categorisable criteria (Figure 6). As we went on with the codification, we iteratively refined the categories into Key structure(s) examined, Key feature(s) concerned, Risk factor, Not Cervical cancer/Glaucoma factor, Doubt factor, Result attributed, and, finally, Key expression used by the expert. As the Excel's content increased, we noticed that the criteria we filled in the categories would repeat. So, we created an Excel tab to list the criteria for each category as they emerged throughout the process. We ended up gathering a list of options that enabled us to streamline the filling-in process. To avoid subjectivity and/or interpretation errors in the codification process, we organised an internal panel of three coders composed of researchers involved in these activities. All transcriptions were assigned to this panel, varying who would be the first coder. While the first coder would codify the transcription from scratch, the following two would validate the first codification.</p>
          <p>Taking the following quote from Cervical cancer as an example, we would describe it as Table 2 shows:</p>
          <p>It has a darker nucleus, but with this resolution, when I try to zoom in, I can't see the characteristics.</p>
          <p>Figure 6 shows the variability of decision criteria by category that was raised throughout the analysis. By the end of the analysis, we had uncovered the most relevant criteria used by experts in each medical field to analyse and explain their decisions. And we could standardise that most of the explanations followed this structure:</p>
          <p>The [Key feature concerned] of the [Key structure(s) examined] is [Risk factor] OR [Not Cervical cancer/Glaucoma].</p>
          <p>e.g. Cervical cancer: The [colour] of the [nucleus] is [hyperchromatic]. Glaucoma: The [optic disc] has an [excavation greater than 0.4].</p>
        </sec>
      </sec>
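      <p>As a purely illustrative sketch (not part of the study's tooling), the elicited explanation structure can be rendered by a simple template filler. The category names follow the categorisation matrix described above; the function name and the contrastive handling are hypothetical, and the Glaucoma phrasing is simplified to the same "The [feature] of the [structure] is [criterion]" pattern:</p>

```python
# Hypothetical sketch: filling the generic explanation template elicited
# from the experts. The optional `contrast` argument stands in for a
# "Doubt factor" or "Not Cervical cancer/Glaucoma factor".

def render_explanation(key_feature, key_structure, risk_factor, contrast=None):
    """Return the templated explanation sentence, with an optional
    contrastive clause introduced by 'however'."""
    sentence = f"The [{key_feature}] of the [{key_structure}] is [{risk_factor}]"
    if contrast:
        sentence += f", however, [{contrast}]"
    return sentence + "."

# Cervical cancer example from the study
print(render_explanation("colour", "nucleus", "hyperchromatic"))
# -> The [colour] of the [nucleus] is [hyperchromatic].
```

      <p>Keeping the bracketed slots explicit mirrors how the matrix categories map one-to-one onto positions in the explanation sentence.</p>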
      <sec id="sec-2-2">
        <title>Explanation structure with contrastive factors</title>
        <p>Moreover, we found that sentences stating a "Not Cervical cancer/Glaucoma factor" or a "Doubt factor" could follow the Key feature concerned. Experts used them to suggest a plausible contradiction that prevented them from providing a classification of which they were confident.</p>
        <p>e.g., Cervical cancer: The [colour] of the [nucleus] is [hyperchromatic], however, [there are overlapping cells]. Glaucoma: The [optic disc] has an [excavation greater than 0.4], however, [is symmetric].</p>
        <p>Table 2 pairs each excerpt of the example quote with the criterion it was coded under: "It has a darker nucleus" (Part of a cell); "nucleus" (Nucleus); "a darker nucleus" (Colour intensity); "a darker nucleus" (Hyperchromasia); Not applicable; "but with this resolution, ... I can't see the characteristics" (Image quality - Blurred); "... I can't see the characteristics" (Insufficient / No classification).</p>
        <p>In these explanations, the experts point out a structure that they observed and characterise its aspect, reflecting a well-known and established risk factor in the domain knowledge, i.e., Cervical cytology: [hyperchromatic]; Glaucoma: [excavation greater than 0.4]. Nevertheless, the explanations also stress, through the contrastive expression 'however', other characteristics that complement and contrast with the first ones, i.e., Cervical cytology: [there are overlapping cells]; Glaucoma: [is symmetric]. And this prevents the experts from discerning with certainty whether the first observed characteristic is an anomaly or not.</p>
        <sec id="sec-2-2-1">
          <title>4.3. Workshops for validation</title>
          <p>Based on the results of previous user research activities, researchers designed validation workshops to: (i) ensure there was no conflicting information among the knowledge shared by each participant, (ii) remove possible imprecision from the researchers' interpretation and consequent analysis outcomes, and (iii) get insights on a first version of the graphical user interface (GUI) designed from scratch to attend to the elicited diagnostic processes.</p>
          <sec id="sec-2-2-1-1">
            <title>4.3.1. Conducting group workshops</title>
            <p>The first validation session was carried out through the Mural platform, from where participants accessed and interacted (by editing, deleting, or adding content) with the list of decision-making criteria raised so far, in order to ensure its correctness and completeness. In Glaucoma workshops, participants were also asked to analyse several examinations, mainly retinal images, and to choose the applicable criteria for each one from the list elicited by researchers. We asked participants to position the selected criteria in one of three possibilities: non-Glaucoma, Glaucoma, or borderline (Figure 7).</p>
          </sec>
          <sec id="sec-2-2-1-2">
            <title>4.3.2. Validating content and container - an informed GUI prototype</title>
            <p>In the second validation session, researchers presented the validated decision-making criteria integrated into a Graphical User Interface (GUI) prototype. The aim was to get feedback on the criteria and on the UI components presenting them. According to participants' availability, the Cervical cancer session took place in person (Figure 8), and the Glaucoma session took place remotely (Figure 9).</p>
            <p>Some categories and criteria seemed to have more than one possible way to be named or presented in the interface. Thus, to assess the correctness and completeness of the data, as well as of the system's components and related features, we applied A/B testing for participants to choose the best options.</p>
            <p>In the Glaucoma study, we conducted a remote session through which we shared a PowerPoint presentation with images of the GUI prototype together with its content (the elicited decision-making criteria) listed in an editable text box, as shown in Figure 9. The content was discussed in real-time and, whenever necessary, easily edited.</p>
          </sec>
        </sec>
        <sec id="sec-2-2-2">
          <title>5. Lessons Learned</title>
          <p>L1: Multidisciplinary team. The design of XAI-based clinical decision support tools requires extensive knowledge from various domains. It is paramount that teams ensure an iterative communication that keeps everyone in the loop, i.e., design researchers, medical experts, ML engineers, etc. Let us highlight the ML engineers' guidance on the feasibility of the required functionalities, and their support in defining the needed data, i.e., quantity and quality, and the infrastructure for implementation. Many systems based on supervised learning require annotated data, analysed by experts in terms of the elements needed to guide the models' learning process. In the case of medical XAI systems, this requires close cooperation with clinical experts to ensure that data instances are annotated objectively and uniformly. This way, ML engineers guarantee that the final data set comprises cases sufficiently representative of the different data properties that may arise in practical scenarios.</p>
          <p>L2: Contextual inquiry method as a basis for elicitation. The contextual inquiry method inspired the study to observe experts performing a task as close to reality as possible by having them verbalise their thoughts while analysing imaging examinations and providing diagnostic classifications for them. We conclude that, when in-loco sessions are not possible, researchers can simulate the method remotely using digital and online platforms that enable video calls, screen sharing, displaying relevant data for analysis and discussion, and free writing. We asked the experts for analysis materials from their daily work, e.g., anonymised imaging examinations, and then used the online platform Mural to present analysis tasks using these materials. While sharing the screen, experts analysed, selected, and annotated the digital images, and researchers asked timely questions that arose from observing what participants were doing and saying (think-aloud).</p>
          <p>L3: Mapping text with images helped associate features to structures. From the elicitation to the content analysis, we found it elementary to map the textual transcripts to the image that experts were analysing. We cropped, framed, and sketched over the images to correlate what experts were saying with what they were seeing. In doing this, some categories emerged transversely among both experts and images analysed, so this mapping led to discovering a standard structure of the explanations.</p>
          <p>L4: Categorisation matrix for multidisciplinary analysis. As the categories emerged, we used Excel's functionalities, such as drop-down lists, to streamline the process of matching features to structures, facilitating the systematisation of the analysis across more team members, i.e., design researchers and ML engineers.</p>
        </sec>
      </sec>
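      <p>To make the categorisation-matrix lesson concrete, here is a minimal sketch of the matrix as a plain data structure. The category names come from the coding described in section 4.2.4; the row content, the controlled-vocabulary dictionary, and the helper function are hypothetical stand-ins for the Excel workbook with drop-down lists:</p>

```python
# Hypothetical sketch of the categorisation matrix: one row per coded quote,
# one column per category. In the study this lived in Excel; here a list of
# dicts plays the same role, with a controlled vocabulary per category
# emulating the drop-down lists used to keep coders consistent.

CATEGORIES = [
    "Key structure(s) examined", "Key feature(s) concerned", "Risk factor",
    "Not Cervical cancer/Glaucoma factor", "Doubt factor",
    "Result attributed", "Key expression",
]

# Illustrative criteria lists; the study grew these iteratively in a
# dedicated Excel tab as new criteria emerged.
CRITERIA_OPTIONS = {
    "Key structure(s) examined": {"Nucleus", "Optic disc"},
    "Doubt factor": {"Image quality - Blurred", "Overlapping cells"},
}

def code_quote(image_id, quote, coding):
    """Build one matrix row, rejecting unknown categories and, where a
    controlled vocabulary exists, unknown criteria."""
    for category, criterion in coding.items():
        if category not in CATEGORIES:
            raise ValueError(f"Unknown category: {category!r}")
        allowed = CRITERIA_OPTIONS.get(category)
        if allowed is not None and criterion not in allowed:
            raise ValueError(f"Unknown criterion for {category!r}: {criterion!r}")
    return {"image": image_id, "quote": quote, **coding}

# Coding the worked example quote from section 4.2.4 (image id invented)
row = code_quote(
    "cytology_042",
    "It has a darker nucleus, but ... I can't see the characteristics.",
    {"Key structure(s) examined": "Nucleus",
     "Doubt factor": "Image quality - Blurred",
     "Result attributed": "Insufficient / No classification"},
)
```

      <p>Validating each criterion against a shared options list is what made the matrix usable across design researchers and ML engineers without drifting vocabularies.</p>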
    </sec>
    <sec id="sec-3">
      <title>6. Conclusions and Future Work</title>
      <sec id="sec-3-1">
        <p>This paper describes the user research activities carried
out by a multidisciplinary team to inform the design of
Machine Learning algorithms and user interfaces for two
XAI-based computer-aided diagnostic systems for
Cervical cancer and Glaucoma. We shared what we think
might be useful for other teams involved in the design of
Explainable AI systems, namely, ways to operationalise
human-centred design methods considering the
objectives of Contextualisation, Elicitation, and Validation of
such systems. In that scope, we demonstrate
transcription, coding, and systematisation strategies that
facilitated our content analysis, in particular, a categorisation
matrix that helped uncover decision-making criteria and
respective explanations’ structure to inform the design
of AI-generated explanations. Future work will focus on
further developing the graphical user interface (GUI) to
adapt it to an AI-based classification system to support
experts’ decision-making process.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <sec id="sec-4-1">
        <p>We would like to thank the medical experts from the
Anatomical Pathology Service of the Portuguese
Oncology Institute - Porto (IPO-Porto) and from the University
Hospital Centre of Porto (CHPorto), who participated
in the user research sessions. A special thanks to our
senior colleagues at Fraunhofer Portugal AICOS, Ana
Barros and Francisco Nunes, who mentored us during
the writing of the article. Finally, this work was
financially supported by the project Transparent Artificial
Medical Intelligence (TAMI), co-funded by Portugal 2020
framed under the Operational Programme for
Competitiveness and Internationalization (COMPETE 2020),
Fundação para a Ciência e a Tecnologia (FCT), Carnegie
Mellon University, and European Regional Development
Fund under Grant 45905.
</p>
        <p>[8] P. N. Johnson-Laird, Mental models and human reasoning, Proc. Natl. Acad. Sci. U.S.A. 107 (2010) 18243–18250. doi:10.1073/pnas.1012933107.</p>
        <p>[9] G. Rickheit, C. Habel, Mental Models in Discourse Processing and Reasoning, Elsevier Science B.V., Amsterdam, 1999. URL: https://books.google.pt/books?hl=pt-PT&amp;lr=&amp;id=96jBqz_ar8AC&amp;oi=fnd&amp;pg=PP1&amp;dq=mental+models+versus+reasoning+process&amp;ots=Ou3b1SOv77&amp;sig=r5NouxMzR56klQTrokyvHSclJuQ&amp;redir_esc=y#v=onepage&amp;q=mental%20models%20versus%20reasoning%20process&amp;f=false.</p>
        <p>[10] Z. Liu, J. Stasko, Mental models, visual reasoning and interaction in information visualization: A top-down perspective, IEEE Transactions on Visualization and Computer Graphics 16 (2010) 999–1008. doi:10.1109/TVCG.2010.177.</p>
        <p>[11] J. S. Holtrop, L. D. Scherer, D. D. Matlock, R. E. Glasgow, L. A. Green, The Importance of Mental Models in Implementation Science, Front. Public Health 9 (2021). doi:10.3389/fpubh.2021.680316.</p>
        <p>[12] R. Binns, M. Van Kleek, M. Veale, U. Lyngs, J. Zhao, N. Shadbolt, ’It’s reducing a human being to a percentage’: Perceptions of justice in algorithmic decisions, in: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, Association for Computing Machinery, New York, NY, USA, 2018, pp. 1–14. URL: https://doi.org/10.1145/3173574.3173951. doi:10.1145/3173574.3173951.</p>
        <p>[13] T. Kulesza, S. Stumpf, M. Burnett, W.-K. Wong, Y. Riche, T. Moore, I. Oberst, A. Shinsel, K. McIntosh, Explanatory debugging: Supporting end-user debugging of machine-learned programs, in: 2010 IEEE Symposium on Visual Languages and Human-Centric Computing, 2010, pp. 41–48. doi:10.1109/VLHCC.2010.15.</p>
        <p>[14] S. Jalil, T. Myers, I. Atkinson, M. Soden, Complementing a Clinical Trial With Human-Computer Interaction: Patients’ User Experience With Telehealth, JMIR Human Factors 6 (2019) e9481. doi:10.2196/humanfactors.9481.</p>
        <p>[15] T. Dagdelen, Modernizing the User Interface of a Legacy System at the Swedish Police Authority: Collaborative Mental Model: A New Participatory Design Method, 2019. URL: https://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1366483&amp;dswid=8599.</p>
        <p>[16] D.-G. of Health of Portugal, Rastreio da retinopatia diabética - Portal das normas clínicas, 2018. URL: https://normas.dgs.min-saude.pt/2018/09/13/rastreio-da-retinopatia-diabetica/.</p>
        <p>[17] L. P. Aiello, I. Odia, A. R. Glassman, M. Melia, L. M. Jampol, N. M. Bressler, S. Kiss, P. S. Silva, C. C. Wykof, J. K. Sun, D. R. C. R. Network, Comparison of Early Treatment Diabetic Retinopathy Study Standard 7-Field Imaging With Ultrawide-Field Imaging for Determining Severity of Diabetic Retinopathy, JAMA Ophthalmol. 137 (2019) 65–73. doi:10.1001/jamaophthalmol.2018.4982. arXiv:30347105.</p>
        <p>[18] Eurocytology, Criteria for adequacy of a cervical cytology sample | Eurocytology, 2022. URL: https://www.eurocytology.eu/en/course/1142, [Online; accessed 13. Oct. 2022].</p>
        <p>[19] S. Rêgo, M. Monteiro-Soares, M. Dutra-Medeiros, F. Soares, C. C. Dias, F. Nunes, Implementation and evaluation of a mobile retinal image acquisition system for screening diabetic retinopathy: Study protocol, Diabetology 3 (2022) 1–16. URL: https://www.mdpi.com/2673-4540/3/1/1. doi:10.3390/diabetology3010001.</p>
        <p>[20] T. Conceição, C. Braga, L. Rosado, M. J. M. Vasconcelos, A Review of Computational Methods for Cervical Cells Segmentation and Abnormality Classification, Int. J. Mol. Sci. 20 (2019). doi:10.3390/ijms20205114.</p>
        <p>[21] D. A. De Jesus, L. S. Brea, J. B. Breda, E. Fokkinga, V. Ederveen, N. Borren, A. Bekkers, M. Pircher, I. Stalmans, S. Klein, T. van Walsum, OCTA Multilayer and Multisector Peripapillary Microvascular Modeling for Diagnosing and Staging of Glaucoma, Trans. Vis. Sci. Tech. 9 (2020) 58. doi:10.1167/tvst.9.2.58.</p>
        <p>[22] CLARE: Computer-aided cervical cancer screening, 2023. URL: https://www.aicos.fraunhofer.pt/en/our_work/projects/clare.html, [Online; accessed 15. Feb. 2023].</p>
        <p>[23] E. Bentley, oTranscribe: A free web app to take the pain out of transcribing recorded interviews, 2023. URL: https://otranscribe.com/, [Online; accessed 15. Feb. 2023].</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F. K.</given-names>
            <surname>Došilović</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hlupić</surname>
          </string-name>
          ,
          <article-title>Explainable artificial intelligence: A survey</article-title>
          ,
          <source>in: 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>0210</fpage>
          -
          <lpage>0215</lpage>
. doi:10.23919/MIPRO.2018.8400040.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
<string-name><given-names>J.-M.</given-names> <surname>Fellous</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Sapiro</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Rossi</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Mayberg</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Ferrante</surname></string-name>
          ,
          <article-title>Explainable artificial intelligence for neuroscience: Behavioral neurostimulation</article-title>
          ,
          <source>Frontiers in Neuroscience</source>
<volume>13</volume> (<year>2019</year>). URL: https://www.frontiersin.org/articles/10.3389/fnins.2019.01346. doi:10.3389/fnins.2019.01346.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
[3]
          <collab>European Commission</collab>,
          <article-title>Ethics Guidelines for Trustworthy AI - FUTURIUM - European Commission</article-title>,
          <year>2021</year>. URL: https://ec.europa.eu/futurium/en/ai-alliance-consultation.1.html, [Online; accessed 13. Oct. <year>2022</year>].
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Adadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Berrada</surname>
          </string-name>
          ,
          <article-title>Peeking inside the black-box: A survey on explainable artificial intelligence (xai)</article-title>
          ,
          <source>IEEE Access 6</source>
          (
          <year>2018</year>
          )
          <fpage>52138</fpage>
          -
          <lpage>52160</lpage>
. doi:10.1109/ACCESS.2018.2870052.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Burkart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Huber</surname>
          </string-name>
,
          <article-title>A Survey on the Explainability of Supervised Machine Learning</article-title>
          ,
          <source>J. Artif. Intell. Res</source>
          .
          <volume>70</volume>
          (
          <year>2021</year>
          )
          <fpage>245</fpage>
          -
          <lpage>317</lpage>
. doi:10.1613/jair.1.12228.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pribić</surname>
          </string-name>
,
          <string-name><given-names>J.</given-names> <surname>Han</surname></string-name>,
          <string-name>
            <given-names>S.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sow</surname>
          </string-name>
          ,
          <article-title>Question-driven design process for explainable ai user experiences</article-title>
          ,
          <year>2021</year>
. URL: https://arxiv.org/abs/2104.03483. doi:10.48550/ARXIV.2104.03483.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lopes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Braga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Oliveira</surname>
          </string-name>
,
          <string-name><given-names>L.</given-names> <surname>Rosado</surname></string-name>,
          <article-title>Xai systems evaluation: A review of human and computer-centred methods</article-title>,
          <source>Applied Sciences 12</source> (<year>2022</year>). URL: https://www.mdpi.com/2076-3417/12/19/9423. doi:10.3390/app12199423.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>