Explaining complex machine learning platforms to members of the general public

Rachel Eardley, Ewan Soubutts, Amid Ayobi, Rachael Gooberman-Hill and Aisling O'Kane
University of Bristol, Beacon House, Queens Road, Bristol, U.K.

Abstract
In this workshop paper we present an overview of our research into how to explain complex machine learning (ML) health platforms to members of the general public who might benefit from them, specifically people who have Type 2 Diabetes (T2D). The availability of home health sensor technology is increasing; however, it is unclear how to explain these platforms to potential users so that they can make an 'informed decision' about adopting such a platform within their home. Through a user-centered-design approach, we completed a case study comprising three studies that (1) gave an overview of a complex ML platform, that of SPHERE; (2) identified how participants would like us to explain this content; and (3) created and validated an explanation document that presents the SPHERE platform at a high level. We present our finding that participants prioritized understanding how and why the platform could help them over the technical detail of the platform itself.

Keywords
Explanations, Machine Learning, Digital Health, Informed decision, Home health, Complex platforms, Design

Joint Proceedings of the ACM IUI 2021 Workshops, April 13-17, 2021, College Station, USA
EMAIL: rachel@racheleardley.net; e.soubutts@bristol.ac.uk; amid.ayobi@bristol.ac.uk; r.gooberman-hill@bristol.ac.uk; a.okane@bristol.ac.uk
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. INTRODUCTION

In many parts of our daily lives, Artificial Intelligence (AI) and Machine Learning (ML) have become ubiquitous in assisting our decision making, e.g., suggesting films to watch on Netflix [1], or suggesting purchases online or people to 'follow' on social media. Similar technologies are also increasingly common in specialist areas such as healthcare, in particular clinical support tools [23], used to support clinician and/or patient decision-making about their condition and the risks and benefits of potential treatments. However, when it comes to more critical factors such as our health and wellbeing, many would argue that those who are receiving and those who are providing healthcare should be made aware of the reasoning behind those decisions [1,7,9,15]. To bridge this lack of understanding, we look to Explainable AI (XAI), an area of study that challenges different disciplines ('developers', 'theorists', 'ethicists', etc.) to make transparent the decisions that AI and ML algorithms make. This is particularly important so that those who are receiving and those who are providing healthcare can understand what the system is doing, for example to justify the clinical results given, correct errors, improve medical algorithms or highlight a new discovery [1,7,15].

In the domain of healthcare, Holzinger et al. [9] state that there is a growing need for AI systems that are 'trustworthy, transparent, interpretable and explainable', and there is evidence of the benefits of clinical AI systems, for instance in predicting the risk of hospital readmission for pneumonia patients or in spotting bone fractures [6,20]. However, there is also an opportunity for AI to contribute to healthcare outside clinical settings, for instance by supporting individuals with chronic illnesses who manage their own conditions at home, an increasingly common trend given today's rising healthcare costs [4]. Ballegaard et al. [2] argue that healthcare is not just about keeping individuals healthy but about allowing them to continue to live sustainable and independent lives. With this in mind, we look to ML/AI platforms such as SPHERE (sensor platform for healthcare in a residential environment), which uses ML to algorithmically interpret data based on the individual's patterns of living at home [22]. How, though, do we gain sufficiently informed consent from members of the public to install such complex ML platforms within their homes?

In the medical field, there is a legal and ethical requirement for the patient and clinician to go through a process of 'informed consent' [8,13,17], in which the patient, presented with the benefits, risks and any alternatives to their treatment, makes a decision [3,8]. For ML platforms, there is also an ethical process that includes explaining the benefits, risks, limitations and the data used for potential translation of the ML algorithms [1,14]. To make an 'informed decision' around the adoption of a complex platform, an individual needs to have enough knowledge to think critically about the processes that the platform implements or supports [11,12]. As with informed consent in medical care, for an individual to make an informed decision around the adoption of a complex platform, a process needs to occur that supports the explanation of both the platform's risks and benefits. When and how does this informed decision process occur for home health technology?

To understand how we should explain complex ML/AI platforms to members of the general public, we conducted a case study that focused on the SPHERE platform and members of the general public with Type 2 Diabetes (T2D), a condition where most of the care takes place outside clinical settings [19]. Using a user-centered-design methodology to create an explanation document to aid informed consent, we gained insight into users' interpretation of the 'informed decision' process of adopting the complex platform within their homes. We found that even though the document explained the complex ML/AI platform in a manner that was understandable to our participants, and they could see the SPHERE platform's benefits, they were more focused on the purpose of the technology, questioning why and how the platform could help them as individuals with T2D.

Figure 1: Hardware and networks – the hardware devices of the platform: the seven devices (a–g: water sensor, appliance sensor, environmental sensor, electricity (mains) sensor, silhouette sensor, wearable and SPHERE home gateway) and the ten sensors (vibrations, electricity, silhouette, movement, light levels, air pressure, motion, humidity, temperature and appliance usage)

2. Defining the Explanation

Using a user-centered-design methodology to define the explanation of the SPHERE platform, we first completed semi-structured interviews with eight members of the SPHERE team who had built and maintained the system. After this, we ran a second study that presented alternative designs about the platform's hardware (figure 2a-c), the ground truthing of the data (figure 2d-f) and the unsupervised-learning ML process (figure 2g-i) to nine people with Type 2 Diabetes and members of their households, who might also have to live with this domestic health technology. From the findings of these two studies, we created an explanation document (figure 4) that presents and explains the SPHERE platform to members of the general public who had T2D. Finally, we ran a validation study that reviewed how the explanation document was used in an onboarding/set-up session with technicians, and how the SPHERE system and the document were interpreted and understood.

2.1. Understanding the platform

Our first challenge was to understand what SPHERE was capable of: its processes, hardware and ML/AI requirements. With this aim in mind, we conducted semi-structured interviews with eight out of eleven of the team members. The team members had been working on the project for between two and six years and had mixed roles within SPHERE (2 x deployment technicians, 3 x ML experts, 1 x hardware engineer, 1 x researcher and 1 x community liaison). By interviewing team members with this diverse range of roles, we were able to gain an overview of all aspects of the complex platform. We conducted the interviews individually within a university-based meeting room, audio-recorded them and then transcribed them verbatim. Using affinity diagramming and a bottom-up approach, we created a total of 681 post-it notes (Machine Learning x 245, Research x 63, Community Engagement x 68, Hardware x 100 and Deployment Technician x 205). Once the five job roles (deployment technicians, machine learning, research, hardware and community liaison) had been initially coded into themes, the post-it notes were organized by the first author into 35 further themes that were then grouped into three overarching themes: (1) Hardware and network; (2) Installation, training and data gathering; (3) Machine learning and data visualization. We then transferred these themes into a Microsoft Word document, at which stage the first author merged any duplicated content. We then asked the eight core team members who took part in the interviews to review the document and confirm that the draft was technically correct.

These three overarching themes helped us define the platform, for example capturing the seven sensor devices (figure 1a-g) and ten individual sensors (figure 1) with their technical and positioning limitations. We also captured the installation process, in which the deployment technicians will visit a participant's home four times (survey, installation, maintenance and removal), and the fact that the data collected is saved, with the participant's permission, on a hard disk within their home and processed through supervised and unsupervised machine learning.

2.2. Understanding the interpretations

Once we had gained an understanding of the complex platform, our next challenge was to define how to present the information to our participants. For this study we focused on one area of each of the overarching themes. For hardware & network, we selected the most technically complex sensor, the 'environmental sensor' (figure 2a-c). For installation, training & data collection, we selected 'ground truth' (figure 2d-f), as this process informs the ML algorithms. For machine learning & data visualization, we selected 'unsupervised learning' (figure 2g-i), as this is the more speculative form of ML. Through a design workshop with six participants (three university researchers and three members of a community engagement charity), we focused on the 'environmental sensor' (figure 2a-c) and created three alternative designs that presented the platform's information at different technical levels and with different amounts of detail, approaches to language and visual elements. We then used these design decisions to create three alternative designs for each of the further two areas of the platform, 'ground truth' (figure 2d-f) and 'unsupervised learning' (figure 2g-i).
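The bottom-up theming described in Section 2.1 can be sketched as a simple tally. This is an illustrative sketch only, not the authors' analysis tooling: the per-role post-it counts come from the paper, but the role-to-theme roll-up mapping below is purely hypothetical.

```python
# Illustrative sketch: tallying the affinity-diagramming output of Section 2.1.
# Per-role post-it counts are taken from the paper; the role-to-theme mapping
# is a hypothetical example of how coded notes might roll up into the three
# overarching themes.
from collections import Counter

notes_per_role = Counter({
    "Machine Learning": 245,
    "Research": 63,
    "Community Engagement": 68,
    "Hardware": 100,
    "Deployment Technician": 205,
})

assert sum(notes_per_role.values()) == 681  # total reported in the paper

# Hypothetical roll-up of roles into the three overarching themes
theme_of_role = {
    "Hardware": "Hardware and network",
    "Deployment Technician": "Installation, training and data gathering",
    "Community Engagement": "Installation, training and data gathering",
    "Machine Learning": "Machine learning and data visualization",
    "Research": "Machine learning and data visualization",
}

notes_per_theme = Counter()
for role, count in notes_per_role.items():
    notes_per_theme[theme_of_role[role]] += count

for theme, count in notes_per_theme.most_common():
    print(f"{theme}: {count} notes")
```

The assertion simply checks that the five per-role counts reported in the paper do sum to the stated 681 notes.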
Figure 2: The three alternative designs for the three areas of the SPHERE platform

We presented these nine designed documents (figure 2) to nine participants who either had T2D or lived with someone who did. The nine participants (five female, four male) were aged between 25 and 74, with education levels ranging from entry-level to PhD. Six participants had T2D, and three participants lived with someone who did. All participants owned a smartphone, and four participants had an IoT device such as an Amazon Alexa or Google Home. Two participants (AD2 and AD6) had weather stations at home and therefore had prior knowledge of sensors and their capabilities. The environmental sensor designs were presented first, with the order of the alternative designs alternated (using the Latin square method), then the ground truth designs and finally the unsupervised learning designs.

2.2.1. Overview of findings

For the environmental sensor (figure 2a-c), the participants requested that the image of the sensor be the version from figure 2c, with the sensor measurements as in figure 2a in both centimeters and inches. They requested an understanding of where the sensors would be positioned within the home; however, they did not like the list in figure 2a or the storyboard in figure 2b, as these provided unnecessary information (the deployment technician would fit the sensor). They preferred the more structural visual approach to the rules of sensor placement, as in figure 2b, and requested more of a description of what each sensor did.

With 'ground truth' (figure 2d-f), the participants considered the simpler version (figure 2f) to be just enough information and were positive about the storyboard flow. The other two alternatives (figures 2d and 2e) were both thought of as too much information and not relevant to the participants, as the deployment technician would complete the process.
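The Latin-square counterbalancing of presentation order mentioned above can be sketched as follows. This is an illustrative reconstruction under our own assumptions, not the authors' study materials: the cyclic square construction and the participant-to-row assignment are invented for the example.

```python
# Illustrative sketch of Latin-square counterbalancing for three design
# alternatives: each row is one presentation order, and across the rows each
# design appears in each position exactly once.
def latin_square(conditions):
    """Generate a cyclic Latin square: row i is the condition list rotated by i."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]

orders = latin_square(["design a", "design b", "design c"])
# Hypothetical assignment: participant k sees row k % 3, balancing positions.
for k in range(9):  # e.g. the nine participants in the second study
    print(f"participant {k + 1}: {orders[k % 3]}")
```

A cyclic square like this guarantees that no design alternative is systematically advantaged by always being seen first or last.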
Across all three areas (environmental sensor, ground truth and unsupervised learning), the participants considered the alternative design with the most technical information and detail to be far too complex, scary or off-putting. The participants additionally preferred the language used in the simpler design alternatives, as it used common, non-technical words.

Finally, for 'unsupervised learning', the participants were confused by the charts and graphs, considering figure 2i the better description, with a few changes. These changes included changing an icon so that it fits the descriptive text better, and combining the whole of figure 2i with the right-hand side of figure 2h, thereby showing the participant how the 'unsupervised machine learning' works and presenting the results in an understandable chart.

2.2.2. Final designs as specified by the participants

Using this feedback, we then updated the page designs (figure 3) to match the participants' preferences. For the environmental sensor (figure 3a), we created an illustration to present the sensor placement locations and added information about the sensor's limitations, as suggested by Cai et al. [5]. For 'ground truth', we merged the content that was spread over two pages in figure 2f into just one page in figure 3c. For 'unsupervised learning', as requested by the participants, we merged figures 2h and 2i to highlight the process of collecting and presenting that data. From these final designs, we updated the visual design style and created a number of templates that we used for all similar items (e.g. the SPHERE sensors).

Figure 3: The updated designs showing the platform's content as specified by participants in the second study: (a) environmental sensor, (b) ground truth and (c) unsupervised learning

Figure 4: The explanation document used for validation
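The explanation document presents 'unsupervised learning' only at a high level: the platform groups unlabeled sensor data into recognisable patterns of daily activity. As a purely illustrative sketch of that idea (this is not SPHERE's pipeline; the data, the cluster count and the tiny k-means routine are all our own invented example):

```python
# Purely illustrative: a tiny 1-D k-means clustering of made-up hourly motion
# counts, standing in for the kind of unsupervised learning the explanation
# document describes (SPHERE's actual models are more sophisticated).
def kmeans_1d(values, k=2, iters=20):
    lo, hi = min(values), max(values)
    centers = [lo + i * (hi - lo) / (k - 1) for i in range(k)]  # spread initial centres
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

# Invented data: low readings overnight, high readings during the day.
motion_counts = [2, 1, 0, 3, 2, 1, 40, 55, 48, 60, 52, 45]
centers, groups = kmeans_1d(motion_counts)
print("cluster centres:", [round(c, 1) for c in centers])  # ~'resting' vs 'active'
```

The point participants needed was exactly this: the algorithm receives no labels, yet separates the readings into groups that a person can then interpret (here, roughly 'resting' versus 'active' periods).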
2.3. Validating the explanation and interpretation

Our next challenge was to validate this explanation document (figure 4): to understand whether we had created a translation of the SPHERE platform that potential participants would feel they could use to make an 'informed decision'. Overall, the participants liked the document, all understanding at a high level the data collected and how that data would be used to identify their daily activity. The participants did ask for a number of updates (e.g. page order, image updates and a reduction in the number of pages within the document), and even though they understood the platform (at a high level), they wanted to understand why SPHERE was useful to them as individuals with T2D.

3. Next steps

Our next steps are to investigate how we can incorporate the findings from the validation study so that we reduce the number of pages and not only explain the technical aspects of the SPHERE platform but also understand how to explain why this platform would be beneficial to the participants, without influencing their decision in consenting to have the platform within their home. Additionally, we wish to investigate the best medium for presenting this content (paper or video), and to understand how this explanation document can work within the first steps of creating a process for the self-installation of the SPHERE platform.

4. Acknowledgements

We would like to thank Sue Mackinnon, Jess Linington, Zoe Banks Gross and Fiona Dowling from Knowle West Media Centre for their support on this project. We would additionally like to thank the SPHERE team members who engaged in this project and took the time to explain their work to us. This work was completed through the SPHERE Next Steps Project, funded by the UK Engineering and Physical Sciences Research Council (EPSRC), Grant EP/R005273/1.

5. References

[1] Amina Adadi and Mohammed Berrada. 2018. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 6: 52138–52160. https://doi.org/10.1109/ACCESS.2018.2870052
[2] Stinne Aaløkke Ballegaard, Thomas Riisgaard Hansen, and Morten Kyng. 2008. Healthcare in Everyday Life - Designing Healthcare Services for Daily Life. 1807–1816.
[3] M. Brezis et al. 2008. Quality of informed consent for invasive procedures. International Journal for Quality in Health Care. Retrieved December 15, 2020 from https://academic.oup.com/intqhc/article-abstract/20/5/352/1794518
[4] Alison Burrows and Ian Craddock. 2014. SPHERE: Meaningful and Inclusive Sensor-Based Home Healthcare.
[5] Carrie J. Cai, Samantha Winter, David Steiner, Lauren Wilcox, and Michael Terry. 2019. "Hello AI": Uncovering the onboarding needs of medical practitioners for human–AI collaborative decision-making. Proceedings of the ACM on Human-Computer Interaction 3, CSCW. https://doi.org/10.1145/3359206
[6] Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. 2015. Intelligible Models for HealthCare. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '15: 1721–1730. https://doi.org/10.1145/2783258.2788613
[7] Liya Ding. 2018. Human Knowledge in Constructing AI Systems — Neural Logic Networks Approach towards an Explainable AI. Procedia Computer Science 126: 1561–1570. https://doi.org/10.1016/j.procs.2018.08.129
[8] Johanna Glaser, Sarah Nouri, Alicia Fernandez, Rebecca L. Sudore, Dean Schillinger, Michele Klein-Fedyshin, and Yael Schenker. 2020. Interventions to Improve Patient Comprehension in Informed Consent for Medical and Surgical Procedures: An Updated Systematic Review. Medical Decision Making 40: 119–143. https://doi.org/10.1177/0272989X19896348
[9] Andreas Holzinger, Chris Biemann, Constantinos S. Pattichis, and Douglas B. Kell. 2017. What do we need to build explainable AI systems for the medical domain? 1–28.
[10] Alexandra Kirsch. 2018. Explain to whom? Putting the user in the center of explainable AI. CEUR Workshop Proceedings 2071.
[11] Emily R. Lai. 2011. Critical Thinking: A Literature Review. Research Report. Retrieved December 15, 2020 from http://www.pearsonassessments.com/research
[12] Susan Lechelt, Yvonne Rogers, and Nicolai Marquardt. 2020. Coming to your senses: Promoting critical thinking about sensors through playful interaction in classrooms. Proceedings of the Interaction Design and Children Conference, IDC 2020: 11–22. https://doi.org/10.1145/3392063.3394401
[13] Roger G. Lemaire. 2006. Informed consent - A contemporary myth? Journal of Bone and Joint Surgery - Series B 88, 1: 2–7. https://doi.org/10.1302/0301-620X.88B1.16435
[14] Tim Miller, Piers Howe, and Liz Sonenberg. 2017. Explainable AI: Beware of Inmates Running the Asylum. IJCAI International Joint Conference on Artificial Intelligence.
[15] Alun Preece, Dan Harborne, Dave Braines, Richard Tomsett, and Supriyo Chakraborty. 2018. Stakeholders in Explainable AI.
[16] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144. https://doi.org/10.1145/2939672.2939778
[17] Yael Schenker, Alicia Fernandez, and Rebecca Sudore. 2011. Interventions to Improve Patient Comprehension in Informed Consent for Medical and Surgical Procedures: A Systematic Review. Medical Decision Making 31, 1: 151–173. https://doi.org/10.1177/0272989X10364247
[18] Bastian Seegebarth, Felix Müller, Bernd Schattenberg, and Susanne Biundo. 2012. Making Hybrid Plans More Clear to Human Users - A Formal Approach for Generating Sound Explanations. International Conference on Automated Planning and Scheduling: 225–233. Retrieved from https://www.aaai.org/ocs/index.php/ICAPS/ICAPS12/paper/viewPaper/4691
[19] Diabetes UK. 2020. Type 2 diabetes. Retrieved December 15, 2020 from https://www.diabetes.org.uk/type-2-diabetes
[20] Rebecca Voelker. 2018. Diagnosing Fractures With AI. JAMA 320, 1: 23. https://doi.org/10.1001/jama.2018.8565
[21] Jichen Zhu, Antonios Liapis, Sebastian Risi, Rafael Bidarra, and G. Michael Youngblood. 2018. Explainable AI for Designers: A Human-Centered Perspective on Mixed-Initiative Co-Creation. IEEE Conference on Computational Intelligence and Games, CIG 2018. https://doi.org/10.1109/CIG.2018.8490433
[22] Ni Zhu, Tom Diethe, Massimo Camplani, Lili Tao, Alison Burrows, Niall Twomey, Dritan Kaleshi, Majid Mirmehdi, Peter Flach, and Ian Craddock. 2015. Bridging e-Health and the Internet of Things: The SPHERE Project. IEEE Intelligent Systems 30, 4: 39–46. https://doi.org/10.1109/MIS.2015.57
[23] How Machine Learning is Transforming Clinical Decision Support Tools. Retrieved December 14, 2020 from https://healthitanalytics.com/features/how-machine-learning-is-transforming-clinical-decision-support-tools