Explaining complex machine learning platforms to members of the general public

Rachel Eardley, Ewan Soubutts, Amid Ayobi, Rachael Gooberman-Hill and Aisling O'Kane
University of Bristol, Beacon House, Queens Road, Bristol, U.K.

Abstract
In this workshop paper we present an overview of our research into how to explain complex machine learning (ML) health platforms to members of the general public who might benefit from them, specifically people who have Type 2 Diabetes (T2D). The availability of home health sensor technology is increasing; however, it is unclear how to explain these platforms to potential users so that they can make an 'informed decision' about adopting such a platform within their home. Through a user-centered-design approach, we completed a case study comprising three studies that (1) gave an overview of a complex ML platform, that of SPHERE; (2) identified how participants would like us to explain this content; and (3) created and validated an explanation document that presents the SPHERE platform at a high level. We present our finding that participants prioritized understanding how and why the platform could help them over the technical detail of the platform itself.

Keywords
Explanations, Machine Learning, Digital Health, Informed decision, Home health, Complex platforms, Design

Joint Proceedings of the ACM IUI 2021 Workshops, April 13-17, 2021, College Station, USA
EMAIL: rachel@racheleardley.net; e.soubutts@bristol.ac.uk; amid.ayobi@bristol.ac.uk; r.gooberman-hill@bristol.ac.uk; a.okane@bristol.ac.uk
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. INTRODUCTION

In many parts of our daily lives, Artificial Intelligence (AI) and Machine Learning (ML) have become ubiquitous in assisting our decision making, e.g., suggesting films to watch on Netflix [1], or suggesting purchases online or people to 'follow' on social media. Similar technologies are also increasingly common in specialist areas such as healthcare, in particular clinical support tools [23], used to support clinician and/or patient decision-making about their condition and the risks and benefits of potential treatments. However, when it comes to more critical factors such as our health and wellbeing, many would argue that those who are receiving and those who are providing healthcare should be made aware of the reasoning behind those decisions [1,7,9,15]. To bridge this lack of understanding, we look to Explainable AI (XAI), an area of study that challenges different disciplines ('developers', 'theorists', 'ethicists', etc.) to make transparent the decisions that AI and ML algorithms make. This is particularly important so that those who are receiving and those who are providing healthcare can understand what the system is doing, for example to justify the clinical results given, correct errors, improve medical algorithms or highlight a new discovery [1,7,15].

In the domain of healthcare, Holzinger et al. [9] state that there is a growing need for AI systems that are 'trustworthy, transparent, interpretable and explainable', and there is evidence of the benefits of clinical AI systems, for instance in predicting the risk of hospital readmission for pneumonia patients or in spotting bone fractures [6,20]. However, there is also an opportunity for AI to contribute to healthcare outside clinical settings, for instance by supporting individuals with chronic illnesses who manage their own conditions at home, an increasingly common trend given today's rising healthcare costs [4]. Ballegaard et al. [2] argue that healthcare is not just about keeping individuals healthy but about allowing them to continue to live sustainable and independent lives. With this in mind, we look to ML/AI platforms such as SPHERE (sensor platform for healthcare in a residential environment), which uses ML to algorithmically interpret data based on the individual's patterns of living at home [22]. How, though, do we gain sufficiently informed consent from members of the public to install such complex ML platforms within their homes?

In the medical field, there is a legal and ethical requirement for the patient and clinician to go through a process of 'informed consent' [8,13,17], in which the patient, presented with the benefits, risks and any alternatives to their treatment, makes a decision [3,8]. For ML platforms, there is also an ethical process that includes explaining the benefits, risks, limitations and the data used for potential translation of the ML algorithms [1,14]. To make an 'informed decision' around the adoption of a complex platform, an individual needs to have enough knowledge to think critically about the processes that the platform implements or supports [11,12]. As with informed consent in medical care, for an individual to make an informed decision around the adoption of a complex platform, a process needs to occur that supports the explanation of both the platform's risks and benefits. When and how does this informed decision process occur for home health technology?

To understand how we should explain complex ML/AI platforms to members of the general public, we conducted a case study that focused on the SPHERE platform and members of the general public with Type 2 Diabetes (T2D), a condition where most of the care takes place outside clinical settings [19]. Using a user-centered-design methodology to create an explanation document to aid informed consent, we gained insight into users' interpretation of the 'informed decision' process of adopting the complex platform within their homes. We found that even though the document explained the complex ML/AI platform in a manner that was understandable to our participants, and they could see the SPHERE platform's benefits, they were more focused on the purpose of the technology, questioning why and how the platform could help them as individuals with T2D.

Figure 1: Hardware and networks – the hardware devices of the platform: the seven devices (a–g: water sensor, appliance sensor, environmental sensor, electricity (mains) sensor, silhouette sensor, wearable and SPHERE home gateway) and the ten sensors (vibrations, electricity, silhouette, movement, light levels, air pressure, motion, humidity, temperature and appliance usage)

2. Defining the Explanation

Using a user-centered-design methodology to define the explanation of the SPHERE platform, we first completed semi-structured interviews with eight members of the SPHERE team who had built and maintained the system. After this, we ran a second study that presented alternative designs about the platform's hardware (figure 2a-c), the ground truthing of the data (figure 2d-f) and the unsupervised-learning ML process (figure 2g-i) to nine people with Type 2 Diabetes and members of their households, who might also have to live with this domestic health technology. From the findings of these two studies, we created an explanation document (figure 4) that presents and explains the SPHERE platform to members of the general public who had T2D. Finally, we ran a validation study that reviewed how the explanation document was used in an onboarding/set-up session with technicians, and how the SPHERE system and the document were interpreted and understood.

2.1. Understanding the platform

Our first challenge was to understand what SPHERE was capable of: its processes, hardware and ML/AI requirements. With this aim in mind, we conducted semi-structured interviews with eight out of eleven of the team members. The team members had been working on the project for between two and six years and had mixed roles within SPHERE (2 x deployment technicians, 3 x ML experts, 1 x hardware engineer, 1 x researcher and 1 x community liaison). By interviewing team members with this diverse range of roles, we were able to gain an overview of all aspects of the complex platform. We conducted the interviews individually within a university-based meeting room, audio-recorded them and then transcribed them verbatim. Using affinity diagramming and a bottom-up approach, we created a total of 681 post-it notes (Machine Learning x 245, Research x 63, Community Engagement x 68, Hardware x 100 and Deployment Technician x 205). Once the five job roles (deployment technicians, machine learning, research, hardware and community liaison) had been initially coded into themes, the post-it notes were organized by the first author into 35 further themes that were then grouped into three overarching themes: (1) Hardware and network; (2) Installation, training and data gathering; (3) Machine learning and data visualization. We then transferred these themes into a Microsoft Word document, at which stage the first author merged any duplicated content. We then asked the eight core team members who took part in the interviews to review the document and confirm that the draft was technically correct.

These three overarching themes helped us define the platform, for example capturing the seven sensor devices (figure 1a-g) and ten individual sensors (figure 1) with their technical and positioning limitations. We also captured the installation process, in which the deployment technicians will visit a participant's home four times (survey, installation, maintenance and removal), and the fact that the data collected is saved, with the participant's permission, on a hard disk within their home and processed through supervised and unsupervised machine learning.

2.2. Understanding the interpretations

Once we had gained an understanding of the complex platform, our next challenge was to define how to present the information to our participants. For this study we focused on one area of each of the overarching themes. For hardware & network, we selected the most technically complex sensor, the 'environmental sensor' (figure 2a-c). For installation, training & data collection, we selected 'ground truth' (figure 2d-f), as this process informs the ML algorithms. For machine learning & data visualization, we selected 'unsupervised learning' (figure 2g-i), as this is the more speculative form of ML. Through a design workshop with six participants (three university researchers and three members of a community engagement charity), we focused on the 'environmental sensor' (figure 2a-c) and created three alternative designs that presented the platform's information at different technical levels and with different amounts of detail, approaches to language and visual elements. We then used these design decisions to create three alternative designs for each of the further two areas of the platform, 'ground truth' (figure 2d-f) and 'unsupervised learning' (figure 2g-i).
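The bottom-up theming described in Section 2.1 can be sketched as a simple tally. This is an illustrative sketch only, not the authors' analysis tooling: the per-role post-it counts come from the paper, but the role-to-theme roll-up mapping below is purely hypothetical.

```python
# Illustrative sketch: tallying the affinity-diagramming output of Section 2.1.
# Per-role post-it counts are taken from the paper; the role-to-theme mapping
# is a hypothetical example of how coded notes might roll up into the three
# overarching themes.
from collections import Counter

notes_per_role = Counter({
    "Machine Learning": 245,
    "Research": 63,
    "Community Engagement": 68,
    "Hardware": 100,
    "Deployment Technician": 205,
})

assert sum(notes_per_role.values()) == 681  # total reported in the paper

# Hypothetical roll-up of roles into the three overarching themes
theme_of_role = {
    "Hardware": "Hardware and network",
    "Deployment Technician": "Installation, training and data gathering",
    "Community Engagement": "Installation, training and data gathering",
    "Machine Learning": "Machine learning and data visualization",
    "Research": "Machine learning and data visualization",
}

notes_per_theme = Counter()
for role, count in notes_per_role.items():
    notes_per_theme[theme_of_role[role]] += count

for theme, count in notes_per_theme.most_common():
    print(f"{theme}: {count} notes")
```

The assertion simply checks that the five per-role counts reported in the paper do sum to the stated 681 notes.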
Figure 2: The three alternative designs for the three areas of the SPHERE platform

We presented these nine designed documents (figure 2) to nine participants who either had T2D or lived with someone who did. The nine participants (five female, four male) were aged between 25 and 74, with education levels ranging from entry-level to PhD. Six participants had T2D, and three participants lived with someone who did. All participants owned a smartphone, and four participants had an IoT device such as an Amazon Alexa or Google Home. Two participants (AD2 and AD6) had weather stations at home and therefore had prior knowledge of sensors and their capabilities. The environmental sensor designs were presented first, with the order of the alternative designs alternated (using the Latin square method), then the ground truth designs and finally the unsupervised learning designs.

2.2.1. Overview of findings

For the environmental sensor (figure 2a-c), the participants requested that the image of the sensor be the version from figure 2c, with the sensor measurements as in figure 2a in both centimeters and inches. They requested an understanding of where the sensors would be positioned within the home; however, they did not like the list in figure 2a or the storyboard in figure 2b, as these provided unnecessary information (the deployment technician would fit the sensor). They preferred the more structural visual approach to the rules of sensor placement, as in figure 2b, and requested more of a description of what each sensor did.

With 'ground truth' (figure 2d-f), the participants considered the simpler version (figure 2f) to be just enough information and were positive about the storyboard flow. The other two alternatives (figures 2d and 2e) were both thought of as too much information and not relevant to the participants, as the deployment technician would complete the process.
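The Latin-square counterbalancing of presentation order mentioned above can be sketched as follows. This is an illustrative reconstruction under our own assumptions, not the authors' study materials: the cyclic square construction and the participant-to-row assignment are invented for the example.

```python
# Illustrative sketch of Latin-square counterbalancing for three design
# alternatives: each row is one presentation order, and across the rows each
# design appears in each position exactly once.
def latin_square(conditions):
    """Generate a cyclic Latin square: row i is the condition list rotated by i."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]

orders = latin_square(["design a", "design b", "design c"])
# Hypothetical assignment: participant k sees row k % 3, balancing positions.
for k in range(9):  # e.g. the nine participants in the second study
    print(f"participant {k + 1}: {orders[k % 3]}")
```

A cyclic square like this guarantees that no design alternative is systematically advantaged by always being seen first or last.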
Across all three areas (environmental sensor, ground truth and unsupervised learning), the participants considered the alternative design with the most technical information and detail to be far too complex, scary or off-putting. The participants additionally preferred the language used in the simpler design alternatives, as it used common, non-technical words.

Finally, for 'unsupervised learning', the participants were confused by the charts and graphs, considering figure 2i the better description, with a few changes. These changes included changing an icon so that it fits the descriptive text better, and combining the whole of figure 2i with the right-hand side of figure 2h, thereby showing the participant how the 'unsupervised machine learning' works and presenting the results in an understandable chart.

2.2.2. Final designs as specified by the participants

Using this feedback, we then updated the page designs (figure 3) to match the participants' preferences. For the environmental sensor (figure 3a), we created an illustration to present the sensor placement locations and added information about the sensor's limitations, as suggested by Cai et al. [5]. For 'ground truth', we merged the content that was spread over two pages in figure 2f into just one page in figure 3c. For 'unsupervised learning', as requested by the participants, we merged figures 2h and 2i to highlight the process of collecting and presenting that data. From these final designs, we updated the visual design style and created a number of templates that we used for all similar items (e.g. the SPHERE sensors).

Figure 3: The updated designs showing the platform's content as specified by participants in the second study: (a) environmental sensor, (b) ground truth and (c) unsupervised learning

Figure 4: The explanation document used for validation
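The explanation document presents 'unsupervised learning' only at a high level: the platform groups unlabeled sensor data into recognisable patterns of daily activity. As a purely illustrative sketch of that idea (this is not SPHERE's pipeline; the data, the cluster count and the tiny k-means routine are all our own invented example):

```python
# Purely illustrative: a tiny 1-D k-means clustering of made-up hourly motion
# counts, standing in for the kind of unsupervised learning the explanation
# document describes (SPHERE's actual models are more sophisticated).
def kmeans_1d(values, k=2, iters=20):
    lo, hi = min(values), max(values)
    centers = [lo + i * (hi - lo) / (k - 1) for i in range(k)]  # spread initial centres
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

# Invented data: low readings overnight, high readings during the day.
motion_counts = [2, 1, 0, 3, 2, 1, 40, 55, 48, 60, 52, 45]
centers, groups = kmeans_1d(motion_counts)
print("cluster centres:", [round(c, 1) for c in centers])  # ~'resting' vs 'active'
```

The point participants needed was exactly this: the algorithm receives no labels, yet separates the readings into groups that a person can then interpret (here, roughly 'resting' versus 'active' periods).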
2.3. Validating the explanation and interpretation

Our next challenge was to validate this explanation document (figure 4): to understand whether we had created a translation of the SPHERE platform that potential participants would feel they could use to make an 'informed decision'. Overall, the participants liked the document, all understanding at a high level the data collected and how that data would be used to identify their daily activity. The participants did ask for a number of updates (e.g. page order, image updates and a reduction in the number of pages within the document), and even though they understood the platform (at a high level), they wanted to understand why SPHERE was useful to them as individuals with T2D.

3. Next steps

Our next steps are to investigate how we can incorporate the findings from the validation study so that we reduce the number of pages and not only explain the technical aspects of the SPHERE platform but also understand how to explain why this platform would be beneficial to the participants, without influencing their decision in consenting to have the platform within their home. Additionally, we wish to investigate the best medium for presenting this content (paper or video), and to understand how this explanation document can work within the first steps of creating a process for the self-installation of the SPHERE platform.

4. Acknowledgements

We would like to thank Sue Mackinnon, Jess Linington, Zoe Banks Gross and Fiona Dowling from Knowle West Media Centre for their support on this project. We would additionally like to thank the SPHERE team members who engaged in this project and took the time to explain their work to us. This work was completed through the SPHERE Next Steps Project, funded by the UK Engineering and Physical Sciences Research Council (EPSRC), Grant EP/R005273/1.

5. References

[1] Amina Adadi and Mohammed Berrada. 2018. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 6: 52138–52160. https://doi.org/10.1109/ACCESS.2018.2870052
[2] Stinne Aaløkke Ballegaard, Thomas Riisgaard Hansen, and Morten Kyng. 2008. Healthcare in Everyday Life - Designing Healthcare Services for Daily Life. 1807–1816.
[3] M. Brezis et al. 2008. Quality of informed consent for invasive procedures. International Journal for Quality in Health Care. Retrieved December 15, 2020 from https://academic.oup.com/intqhc/article-abstract/20/5/352/1794518
[4] Alison Burrows and Ian Craddock. 2014. SPHERE: Meaningful and Inclusive Sensor-Based Home Healthcare.
[5] Carrie J. Cai, Samantha Winter, David Steiner, Lauren Wilcox, and Michael Terry. 2019. "Hello AI": Uncovering the onboarding needs of medical practitioners for human–AI collaborative decision-making. Proceedings of the ACM on Human-Computer Interaction 3, CSCW. https://doi.org/10.1145/3359206
[6] Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. 2015. Intelligible Models for HealthCare. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '15: 1721–1730. https://doi.org/10.1145/2783258.2788613
[7] Liya Ding. 2018. Human Knowledge in Constructing AI Systems — Neural Logic Networks Approach towards an Explainable AI. Procedia Computer Science 126: 1561–1570. https://doi.org/10.1016/j.procs.2018.08.129
[8] Johanna Glaser, Sarah Nouri, Alicia Fernandez, Rebecca L. Sudore, Dean Schillinger, Michele Klein-Fedyshin, and Yael Schenker. 2020. Interventions to Improve Patient Comprehension in Informed Consent for Medical and Surgical Procedures: An Updated Systematic Review. Medical Decision Making 40: 119–143. https://doi.org/10.1177/0272989X19896348
[9] Andreas Holzinger, Chris Biemann, Constantinos S. Pattichis, and Douglas B. Kell. 2017. What do we need to build explainable AI systems for the medical domain? 1–28.
[10] Alexandra Kirsch. 2018. Explain to whom? Putting the user in the center of explainable AI. CEUR Workshop Proceedings 2071.
[11] Emily R. Lai. 2011. Critical Thinking: A Literature Review. Research Report. Retrieved December 15, 2020 from http://www.pearsonassessments.com/research
[12] Susan Lechelt, Yvonne Rogers, and Nicolai Marquardt. 2020. Coming to your senses: Promoting critical thinking about sensors through playful interaction in classrooms. Proceedings of the Interaction Design and Children Conference, IDC 2020: 11–22. https://doi.org/10.1145/3392063.3394401
[13] Roger G. Lemaire. 2006. Informed consent - A contemporary myth? Journal of Bone and Joint Surgery - Series B 88, 1: 2–7. https://doi.org/10.1302/0301-620X.88B1.16435
[14] Tim Miller, Piers Howe, and Liz Sonenberg. 2017. Explainable AI: Beware of Inmates Running the Asylum. IJCAI International Joint Conference on Artificial Intelligence.
[15] Alun Preece, Dan Harborne, Dave Braines, Richard Tomsett, and Supriyo Chakraborty. 2018. Stakeholders in Explainable AI.
[16] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144. https://doi.org/10.1145/2939672.2939778
[17] Yael Schenker, Alicia Fernandez, and Rebecca Sudore. 2011. Interventions to Improve Patient Comprehension in Informed Consent for Medical and Surgical Procedures: A Systematic Review. Medical Decision Making 31, 1: 151–173. https://doi.org/10.1177/0272989X10364247
[18] Bastian Seegebarth, Felix Müller, Bernd Schattenberg, and Susanne Biundo. 2012. Making Hybrid Plans More Clear to Human Users - A Formal Approach for Generating Sound Explanations. International Conference on Automated Planning and Scheduling: 225–233. Retrieved from https://www.aaai.org/ocs/index.php/ICAPS/ICAPS12/paper/viewPaper/4691
[19] Diabetes UK. 2020. Type 2 diabetes. Retrieved December 15, 2020 from https://www.diabetes.org.uk/type-2-diabetes
[20] Rebecca Voelker. 2018. Diagnosing Fractures With AI. JAMA 320, 1: 23. https://doi.org/10.1001/jama.2018.8565
[21] Jichen Zhu, Antonios Liapis, Sebastian Risi, Rafael Bidarra, and G. Michael Youngblood. 2018. Explainable AI for Designers: A Human-Centered Perspective on Mixed-Initiative Co-Creation. IEEE Conference on Computational Intelligence and Games, CIG 2018. https://doi.org/10.1109/CIG.2018.8490433
[22] Ni Zhu, Tom Diethe, Massimo Camplani, Lili Tao, Alison Burrows, Niall Twomey, Dritan Kaleshi, Majid Mirmehdi, Peter Flach, and Ian Craddock. 2015. Bridging e-Health and the Internet of Things: The SPHERE Project. IEEE Intelligent Systems 30, 4: 39–46. https://doi.org/10.1109/MIS.2015.57
[23] How Machine Learning is Transforming Clinical Decision Support Tools. Retrieved December 14, 2020 from https://healthitanalytics.com/features/how-machine-learning-is-transforming-clinical-decision-support-tools