An Agile Framework for Trustworthy AI

Stefan Leijnen¹ and Huib Aldewereld¹ and Rudy van Belkom² and Roland Bijvank¹ and Roelant Ossewaarde¹

¹ Research group Intelligent Data Systems, HU University of Applied Sciences Utrecht, The Netherlands. Corresponding author: stefan.leijnen@hu.nl. All authors contributed equally.
² STT Netherlands Study Centre for Technology Trends, The Hague, The Netherlands.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. The ethics guidelines put forward by the AI High-Level Expert Group (AI-HLEG) present a list of seven key requirements that human-centered, trustworthy AI systems should meet. These guidelines are useful for the evaluation of AI systems, but can be complemented by applied methods and tools for the development of trustworthy AI systems in practice. In this position paper we propose a framework for translating the AI-HLEG ethics guidelines into the specific context within which an AI system operates. This approach aligns well with a set of Agile principles commonly employed in software engineering.

1 INTRODUCTION

Artificial intelligence has the potential to support the resolution of some of human society's deepest problems [22]. With AI systems, we are able to investigate options that we would normally consider naive, but which could unexpectedly lead to major breakthroughs. Simultaneously, AI has the potential to disrupt societies through its impact on existing economic and social structures. Risks involved in the deployment of this powerful technology include a reduction of control over digital systems, the introduction of biases based on gender or race, and a radical increase of societal inequality, or, according to some, the end of the human race [5]. These risks may cause this key technological development, one that can aid humanity, to be nullified by fear and distrust.

1.1 Ethics Guidelines for Trustworthy AI

To exploit opportunities and prevent threats it is important to increase the trustworthiness of AI and monitor its development. Ethical guidelines are required for this. To this end, ethics codes and principles have been published [8] by governments (e.g. [21]), the private sector (e.g. [15]) and research institutes (e.g. [1]). Despite the clear agreement that AI should be ethical, there is debate about what constitutes 'ethical AI' and what ethical requirements and technical standards are needed to achieve it [12].

The Ethics Guidelines for Trustworthy AI were presented by the AI-HLEG on April 8th 2019 [10]. The report builds on a draft that was published in December 2018, on which over 500 comments were made following an open consultation. The guidelines state seven key requirements for AI systems to be considered trustworthy: (1) human agency and oversight, (2) technical robustness and safety, (3) privacy and data governance, (4) transparency, (5) diversity, non-discrimination and fairness, (6) societal and environmental well-being, and (7) accountability. A 'Trustworthy AI Assessment List' was developed in order to determine to what extent an application meets the requirements. The AI-HLEG guidelines can be considered a primary ethics directive for the development of trustworthy AI systems, due to the thought and expertise that went into creating them and the support of the European Commission (EC) for a human-centered approach to AI.

As a step towards ensuring compliance with this directive, conceivably through future legislation, the EC issued a Whitepaper on Artificial Intelligence [4] on February 19th 2020. In this whitepaper the EC sets out proposals for promoting the development of AI in Europe while ensuring that fundamental human rights are respected. An important part of this whitepaper is the proposal to create a prior conformity assessment for high-risk AI applications, based on the ethics guidelines of the AI-HLEG. This legal framework should address the risks to fundamental rights and safety.

1.2 Ex Ante Evaluation vs Continuous Design

With the choice for a prior conformity assessment, the EC opts for an ex ante approach to aligning systems with the ethics guidelines, i.e. it should be determined in advance whether AI applications are able to meet the guidelines. Particularly when exposing society to high-risk AI applications such as facial recognition or deep fake algorithms, thoughtful risk assessment and cautious action are required [4]. However, in order for AI systems to conform ex ante with these guidelines, methods and tools need to be developed that allow these guidelines to be integrated during the development of the AI system. For example, full transparency of a decision model that has been trained using machine learning methods may not be feasible, but during the development cycle an understanding of what constitutes 'sufficient transparency' can emerge, given a functional, ethical and technical context. So in theory, it is possible to check off every key requirement of the list by conforming with each requirement to some extent, thereby passing the ethical evaluation. In practice, however, the context and value of the key requirements become explicit in designing, developing, training, testing and using AI systems.

Moreover, although the seven key requirements are considered to be equally important [10], trade-offs can arise when integrating the guidelines into practice. Beyond evaluating these trade-offs and documenting the considerations, as suggested in the AI-HLEG guidelines, methods and tools are required to deal with these trade-offs during development. The term 'trade-off' suggests a compromise, but a design choice does not necessarily need to constitute a zero-sum game where increasing the value of one element naturally decreases another.
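To make the assessment-list idea concrete, the seven key requirements can be given a machine-readable form that a checklist tool could iterate over. The sketch below is illustrative only: the enumeration and the `unassessed` helper are our own hypothetical names, not part of any AI-HLEG tooling.

```python
from enum import Enum

class KeyRequirement(Enum):
    """The seven AI-HLEG key requirements for trustworthy AI."""
    HUMAN_AGENCY_AND_OVERSIGHT = 1
    TECHNICAL_ROBUSTNESS_AND_SAFETY = 2
    PRIVACY_AND_DATA_GOVERNANCE = 3
    TRANSPARENCY = 4
    DIVERSITY_NON_DISCRIMINATION_AND_FAIRNESS = 5
    SOCIETAL_AND_ENVIRONMENTAL_WELLBEING = 6
    ACCOUNTABILITY = 7

def unassessed(assessment: dict) -> list:
    """Return the key requirements not yet covered by an assessment."""
    return [r for r in KeyRequirement if r not in assessment]

# A partially filled-in assessment: a note per addressed requirement.
assessment = {
    KeyRequirement.TRANSPARENCY: "decision model documented",
    KeyRequirement.ACCOUNTABILITY: "audit log in place",
}
print([r.name for r in unassessed(assessment)])  # five requirements remain open
```

Such a structure only records coverage; as argued above, what 'sufficient transparency' means still has to emerge from the functional, ethical and technical context of the system.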
Similarly to valuing ethical requirements individually, the balancing act between conflicting key requirements also becomes explicit in the context where they are applied. Consider for example the apparent trade-off between privacy (key requirement 3) and safety (key requirement 2): a customs officer using a smart scanner to inspect your travel bag at an airport will typically elicit a different level of cooperation than a baker using the same scanner to inspect your shopping bag. While we may value safety over privacy at an airport, in a different context we may feel the same act violates our privacy. Different privacy standards may exist between contexts, as do the design opportunities to prevent conflicts or trade-offs between values. Design decisions would thereby ideally be made in the most specific context, where values can be maximized in relation to each other.

AI technologies affect many quality attributes, such as robustness, performance and security. An additional complexity they introduce is that they are hard to introspect, hard to evaluate for side effects, and that their inner workings are often not well understood by end users. Designers and developers, who have the best understanding of the systems they build, are best positioned to assess how technological choices lead to the desired system behavior. Unlike some other quality attributes, ethical considerations are hard to quantify, test or compare to external standards. This leaves a great responsibility for developers and designers to specifically employ forward thinking about ethical implications and heed how their decisions fit the desired ethical properties.

Finally, the design context matters because trustworthiness is a human concern. Although technology can be labeled as trustworthy, it is the context in which this technology is placed that determines whether the technology is trustworthy and for the benefit of the human actors concerned.

We therefore argue for integrating the ethical key requirements into the process of developing AI systems, making a translation of the general but abstract AI-HLEG ethics guidelines into specific but custom product requirements that shape or constrain the design (ethical drivers), taking into account concerns from direct and indirect human actors, and advocating that ethical design should take place in an applied context.

2 AN ETHICS DESIGN APPROACH

There are different approaches to software design, ranging from the Waterfall model, where the design is determined up front, to Agile approaches, where there is continuous planning, designing and learning during development [2]. As pragmatics dictate, some larger projects start with iterations involving more design work, gradually transitioning to iterations that involve more implementation work.

Ethical drivers are special in two ways. First, when designing a product, ethical considerations are usually not directly represented in the form of a stakeholder. Therefore, it will typically take special focus to guard that the guidelines are given proper attention during design activities. Second, the ethical perspective (cf. [16]) typically affects the product as a whole, which implies that decisions based on ethical drivers are best made in the early stages of software design, when more global decisions are made. Both the extra design effort and the global impact of ethical drivers create potential tension with Agile development approaches.

Existing approaches to ethical development (e.g., Value-sensitive design (VSD) [9] or Design for Values [20]) focus primarily on the Waterfall model, where the design is known before engineering starts. High-level requirements are consequently translated into design requirements, for instance through the use of value hierarchies [19]. These abstract design requirements are the starting point of the engineering development cycle, and are then typically translated into product features. It should be noted that VSD focuses on high-level value conflicts, and tries to determine the possible (technical) design space allowed by the combination of the stakeholders' values. Tensions between the value conceptions at later stages of the development should be evaluated according to VSD, but it is not clearly described how this should be done; as [11] already noticed, VSD is lacking a clear ethical perspective.

Also, ethical considerations made in the abstract design can be (unintentionally) overturned by the design decisions made by the development team during the development sprints. Sufficient support for ethical design has to be provided in the design phase as well. This can be tackled by keeping sufficient focus on the ethical guidelines during the development process.

Moreover, current engineering practice focuses on functional and non-functional aspects of the system from a system point of view. Creating human-centered AI systems requires keeping a clear focus on the points of view of the human actors concerned. Apart from the already mentioned Value-sensitive design method, there are few, if any, development methodologies that focus on this form of ethical engineering. In principle, Agile methodologies such as Scrum allow for ethical engineering processes, as they are flexible enough to include steps regarding value discovery or the weighing of stakeholder concerns. Although the discovery of non-functional requirements in Agile software processes has received attention from other researchers [7], it is still unclear how to prevent the neglect of the ethical concerns of some stakeholders.

2.1 Agile Development

Agile development focuses on short and iterative cycles to improve agility and flexibility in the development process [2]. Scrum [18] is the dominant Agile methodology, especially in the segments served by smaller development teams [3]. Development in Scrum is done in short cycles (Sprints) in which concrete parts of the design are implemented into a usable product, while keeping a backlog of work that remains to be done. The main drivers of development within Sprints are so-called User Stories.

User Stories express desired system functionality from the perspective of a particular user, expressing a particular desire in a given context. Two examples are shown in figure 1. A User Story is typically formulated in the following template:

"As a ... [actor] I want ... [functionality] in order to ... [desire] given ... [context]."

User Stories have several characteristics:
• they allow large projects to be divided into smaller parts that can be developed independently;
• they are typically short and contain only those development steps that can be made in a short amount of time (i.e. days rather than weeks);
• they are especially useful for projects where requirements and desires change rapidly, or where these are misunderstood; and
• they facilitate time estimation of tasks.

The high-level design is created and formulated in Epics. An Epic is a User Story that is too big to fit in a single development cycle [17], and therefore has to be broken down (either at the start of the project, or in a later iteration of development) into smaller, more concrete User Stories. This breakdown is done through a process of User Story Mapping [14]. In later phases of development, User Stories can be broken down further into even more contextualized features, if deemed necessary. Finally, at the start of each Sprint the priority of the User Stories is determined, for instance by means of planning poker [13].
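The User Story template and its Epic breakdown can be captured in a small data structure. The sketch below is a hypothetical illustration, not part of Scrum or the guidelines (class and field names are ours); it uses one of the example stories from figure 1.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class UserStory:
    """A User Story following the template from section 2.1."""
    actor: str          # "As a ... [actor]"
    functionality: str  # "I want ... [functionality]"
    desire: str         # "in order to ... [desire]"
    context: str        # "given ... [context]"

    def render(self) -> str:
        """Render the story in the canonical template form."""
        return (f"As a {self.actor} I want {self.functionality} "
                f"in order to {self.desire} given {self.context}.")

@dataclass
class Epic:
    """A User Story too large for one Sprint; broken down via story mapping."""
    title: str
    stories: List[UserStory] = field(default_factory=list)

# One of the example stories from figure 1:
story = UserStory(
    actor="truck driver",
    functionality="my semi-autonomous vehicle to halt as soon as possible",
    desire="provide safety",
    context="a malfunction",
)
print(story.render())
```

Note how the 'actor' slot carries the stakeholder perspective and the 'desire' and 'context' slots are where, as argued below in section 2.2, the ethical requirements can be made explicit.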
Thereby the developers resolve the importance of each User Story, determining in which order they will be implemented in the project. This priority determines which User Stories are implemented in the current Sprint, and which User Stories will have to wait for a future Sprint.

Figure 1. Example layout of a backlog for the development of a semi-autonomous truck, with a high-level Epic ('Malfunction Behaviour') and prioritized User Stories grouped into releases. The two prioritized User Stories read: "As a truck driver I want my semi-autonomous vehicle to halt as soon as possible in order to provide safety given a malfunction" and "As an insurer I want semi-autonomous vehicles to optimize where to halt in order to minimize financial loss given a malfunction".

At any stage during a project, the relation between the Epics and the User Stories is kept visible on the Scrum board (see figure 1). The top row shows the Epics that have been identified for the current project; below each Epic are the related User Stories, indicating which User Stories are deemed necessary to create a releasable version of the project.

2.2 An Agile Approach to Trustworthy AI

It is a key agile principle to postpone design decisions until as late as possible, allowing for just-in-time but well-informed decisions. However, some overarching requirements, such as those sourced from the AI-HLEG guidelines, are better formulated before all major design decisions are made. This requires careful consideration of where and how to apply these requirements.

Figure 2 shows the translation of high-level Epics to contextualized User Stories in action, involving the AI-HLEG ethics guidelines at each step in the process.

Figure 2. At several steps during the development of AI systems, the Agile design process should be informed by the AI-HLEG ethics guidelines: (1) creating Epics; (2) User Story mapping; (3) prioritization of User Stories into the Sprint backlog.

User Stories take a central place as a design artifact in Scrum. The structure that is commonly used for User Stories in Scrum closely aligns with the requirement for a specific and contextualized design element, needed to make the (comparative) value of the ethical requirements explicit; this is provided by the 'given ... [context]' part of the User Story template. In User Stories, the roles of the direct stakeholders (e.g. users, subjects) and indirect stakeholders (e.g. society, future generations) can be naturally represented in the 'As a ... [actor]' part of a User Story. The ethical requirement is naturally referred to, like all considerations that add value in User Stories, in the goal part of the User Story, described by the 'in order to ... [desire]' phrase.

Scrum also offers methods for mapping trade-offs between guidelines. We propose to investigate the use of planning poker [13] as a method to weigh, compare and prioritize User Stories (as in step 3 of figure 2): rather than an estimate of the size of a User Story, the ethical impact could be considered to award points and open a discussion between developers. This approach also allows for technical knowledge to inform the ethical discussion, i.e. a developer may be aware of a recently developed technology capable of resolving two (or more) conflicting values, or the reverse could happen, where the technological roadmap is driven by the need to resolve two conflicting values.

As a next step, the Scrum approach to designing trustworthy AI systems according to the ethics guidelines needs to be investigated further, empirically tested and adapted based on the results. In the next section we show a preliminary example of how the ethics guidelines could be operationalized in Scrum.

2.3 Example of Ethics Guidelines in Agile Design

One of the design concerns of AI systems is their behavior in exceptional circumstances, such as when component failures or adverse external factors cause malfunctions. Consider the context of a semi-autonomous truck that is capable of assisting its human driver by making decisions in emergency situations. In our scenario, the truck must make a decision about bringing the vehicle from a state of moving to a full stop in the event of a system malfunction. The risk of an adverse event regarding the safety of the driver is smaller for a halted truck than for a truck that is moving along with traffic. Therefore, the default strategy of the AI may be to stop the truck as soon as possible in case of malfunction. However, it may be in the interest of some stakeholders to override this default strategy. For example, an insurer may want the truck to stop at a point where the risk of economic loss of the truck is minimized, whereas it may be in the best interest of the driver to stop as soon as possible, regardless of the hazard this creates for drivers of other vehicles or the costs of repairing the truck.

As step 1 in figure 2 shows, the AI-HLEG guidelines can be used to evaluate the ethical impact for each of the stakeholders when specifying the Epics. This may yield new insights into conflicting values and requirements, or new stakeholders (such as insurers or other drivers) that the design should take into account. During User Story mapping, one User Story may put forward the driver's perspective, and another may take the insurer's perspective; the ethical considerations for each individual User Story can be aligned with the guidelines as soon as it is created, cf. step 2. Conflicting interests emerge most clearly at this stage, prompting a resolution mechanism in order to prioritize the items, cf. step 3. One such resolution mechanism is Scrum planning poker, where the impact of the key requirements is comparatively evaluated and discussed by the development team.

In the emergent design approach that underlies agile software development, general requirements are stepwise translated into scenarios of increasing specificity in the development cycles. The default strategy (stop as soon as possible) will be revisited and reconsidered when User Stories are created from Epics. This leads to a continuous need to reapply the AI-HLEG guidelines. Conflicting values will emerge as the set of User Stories grows, exemplified by the sidebar position of the AI-HLEG ethics guidelines in figure 2. The iterative application of guidelines stands in contrast with the typical large up-front designs of VSD and other waterfall strategies, which are considered an anti-pattern for Scrum [6].
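The ethical-impact variant of planning poker proposed above could be sketched as follows: each developer awards an impact score to every User Story, and the backlog is ordered by the aggregated score. All names and the scoring scheme here are illustrative assumptions on our part, not a prescribed method.

```python
from statistics import median

def prioritize(backlog, votes):
    """Order User Stories by the median ethical-impact score awarded
    by the team, highest impact first (planning-poker style)."""
    scored = {story: median(votes[story]) for story in backlog}
    return sorted(backlog, key=lambda s: scored[s], reverse=True)

# Backlog items from the semi-autonomous truck example (abbreviated).
backlog = ["driver: halt as soon as possible",
           "insurer: optimize where to halt",
           "fleet owner: log every malfunction"]

# Each developer votes on the ethical impact of a story (e.g. a 1-13 scale).
votes = {
    "driver: halt as soon as possible": [13, 8, 13],
    "insurer: optimize where to halt": [5, 8, 5],
    "fleet owner: log every malfunction": [3, 2, 3],
}
print(prioritize(backlog, votes))
```

As with ordinary planning poker, the discussion triggered by divergent votes matters more than the resulting number; the ordering merely records the team's consensus on which conflicting values to address first.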
The arrows in Fig. 2 exemplify that the ethical guidelines play a role at different levels of design abstraction: at the highest level, when Epics are produced (arrow 1); during subsequent detailing, when User Stories are produced (arrow 2); and at the lower levels, when User Stories are prioritized and the Sprint Backlog is organized (arrow 3).

3 DISCUSSION

The analysis provided and the methods proposed in this position paper are part of ongoing applied research towards operationalizing ethical guidelines for AI in the practice of developing AI systems. Additional (empirical) research is planned in order to validate and extend the Agile framework for trustworthy AI presented here.

We propose that describing requirements as User Stories has the advantage of placing the human actor central in designing the how. This approach bridges the gap from the moral and regulatory functions of ethics guidelines to the daily practice of implementing software, by carrying over the key ethics requirements to the fine-grained context where the what, why, and for whom can take on a meaning that is not evident at the abstract level of an AI system's comprehensive design. The introduction of a separate Scrum ceremony to consider ethics may have the additional effect that teams seek out the full breadth of the guidelines, just as planning poker stimulates teams to investigate all factors that influence planning.

Compliance with the AI-HLEG recommendations requires a specific process to identify ethical drivers. In a waterfall process, abstract and system-wide concerns are generally considered in the earlier abstract designs. In agile processes, with their cyclical nature and incremental development, attention is generally focused on the requirements of a specific iteration. Ethical drivers are cross-cutting, with a scope beyond individual increments. This warrants a protected status in the design process, so that the influence of ethical drivers on the design of the system is periodically evaluated, ensuring continuous attention to the requirements derived from the AI-HLEG guidelines.

We argue that some conflicting ethical concerns only become visible at the lower levels of design abstraction. Because Scrum cyclically combines high-level design with low-level design activities in each sprint, the relevance of the AI-HLEG guidelines remains constant throughout the development process. It is also at this implementation level that the trade-offs between different key requirements can be weighed against each other, and sometimes resolved, informed by technological possibilities.

Scrum ceremonies (such as planning poker) transform AI-HLEG guidelines into User Stories. That process may be repeatable and reusable, so that the same translation can be applied when the same ethical issues arise in different contexts. The structured process, when documented and archived, is a resource that can be used to identify any emerging design patterns.

REFERENCES

[1] Asilomar AI Principles, Future of Life Institute, 2018.
[2] Kent Beck, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler, James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, et al., 'The Agile manifesto', 2001.
[3] A. Begel and N. Nagappan, 'Usage and perceptions of agile software development in an industrial context: An exploratory study', in First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), pp. 255–264, (Sep. 2007).
[4] European Commission, 'Whitepaper on artificial intelligence – a European approach to excellence and trust', B-1049 Brussels, (2020).
[5] Charles J. Dunlap Jr, 'Accountability and autonomous weapons: Much ado about nothing', Temp. Int'l & Comp. LJ, 30, 63, (2016).
[6] V. Eloranta, K. Koskimies, T. Mikkonen, and J. Vuorinen, 'Scrum anti-patterns – an empirical study', in 2013 20th Asia-Pacific Software Engineering Conference (APSEC), volume 1, pp. 503–510, (Dec 2013).
[7] Weam M. Farid, 'The NORMAP methodology: Lightweight engineering of non-functional requirements for agile processes', in 2012 19th Asia-Pacific Software Engineering Conference, volume 1, pp. 322–325. IEEE, (2012).
[8] Jessica Fjeld, Nele Achten, Hannah Hilligoss, Adam Nagy, and Madhulika Srikumar, 'Principled artificial intelligence: Mapping consensus in ethical and rights-based approaches to principles for AI', Berkman Klein Center Research Publication, (2020-1), (2020).
[9] Batya Friedman and David G. Hendry, Value Sensitive Design: Shaping Technology with Moral Imagination, MIT Press, 2019.
[10] AI High-Level Expert Group, 'Ethics guidelines for trustworthy AI', B-1049 Brussels, (2019).
[11] Naomi Jacobs and Alina Huldtgren, 'Why value sensitive design needs ethical commitments', Ethics and Information Technology, 1–4, (2018).
[12] Anna Jobin, Marcello Ienca, and Effy Vayena, 'Artificial intelligence: the global landscape of ethics guidelines', arXiv preprint arXiv:1906.11668, (2019).
[13] Viljan Mahnič and Tomaž Hovelja, 'On using planning poker for estimating user stories', Journal of Systems and Software, 85(9), 2086–2095, (2012).
[14] Jeff Patton and Peter Economy, User Story Mapping: Discover the Whole Story, Build the Right Product, O'Reilly Media, Inc., 2014.
[15] Sundar Pichai, 'AI at Google: our principles', Google blog, (2018).
[16] Nick Rozanski and Eóin Woods, Software Systems Architecture, Addison-Wesley, Upper Saddle River, NJ, 2nd edn., 2011.
[17] Kenneth S. Rubin, Essential Scrum: A Practical Guide to the Most Popular Agile Process, Addison-Wesley, 2012.
[18] Ken Schwaber and Mike Beedle, Agile Software Development with Scrum, volume 1, Prentice Hall, Upper Saddle River, 2002.
[19] Ibo van de Poel, 'Translating values into design requirements', in Philosophy and Engineering: Reflections on Practice, Principles and Process, 253–266, Springer, (2013).
[20] Jeroen van den Hoven, Pieter E. Vermaas, and Ibo van de Poel, Handbook of Ethics, Values, and Technological Design: Sources, Theory, Values and Application Domains, Springer, 2015.
[21] Cédric Villani, Yann Bonnet, Bertrand Rondepierre, et al., For a Meaningful Artificial Intelligence: Towards a French and European Strategy, Conseil national du numérique, 2018.
[22] Ricardo Vinuesa, Hossein Azizpour, Iolanda Leite, Madeline Balaam, Virginia Dignum, Sami Domisch, Anna Felländer, Simone Daniela Langhans, Max Tegmark, and Francesco Fuso Nerini, 'The role of artificial intelligence in achieving the sustainable development goals', Nature Communications, 11(1), 1–10, (2020).