An Agile Framework for Trustworthy AI

Stefan Leijnen¹ and Huib Aldewereld¹ and Rudy van Belkom² and Roland Bijvank¹ and Roelant Ossewaarde¹

¹ Research group Intelligent Data Systems, HU University of Applied Sciences Utrecht, The Netherlands. Corresponding author: stefan.leijnen@hu.nl. All authors contributed equally.
² STT Netherlands Study Centre for Technology Trends, The Hague, The Netherlands.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. The ethics guidelines put forward by the AI High-Level Expert Group (AI-HLEG) present a list of seven key requirements that human-centered, trustworthy AI systems should meet. These guidelines are useful for the evaluation of AI systems, but can be complemented by applied methods and tools for the development of trustworthy AI systems in practice. In this position paper we propose a framework for translating the AI-HLEG ethics guidelines into the specific context within which an AI system operates. This approach aligns well with a set of Agile principles commonly employed in software engineering.

1 INTRODUCTION

Artificial intelligence has the potential to support the resolution of some of human society's deepest problems [22]. With AI systems, we are able to investigate options that we would normally consider naive, but which could unexpectedly lead to major breakthroughs. Simultaneously, AI has the potential to disrupt societies through its impact on existing economic and social structures. Risks involved in the deployment of this powerful technology include a reduction of control over digital systems, the introduction of biases based on gender or race, and a radical increase of societal inequality, or, according to some, the end of the human race [5]. These risks may cause this key technological development, one that can aid humanity, to be nullified by fear and distrust.

1.1 Ethics Guidelines for Trustworthy AI

To exploit opportunities and prevent threats it is important to increase the trustworthiness of AI and monitor its development. Ethical guidelines are required for this. To this end, ethics codes and principles have been published [8] by governments (e.g. [21]), the private sector (e.g. [15]) and research institutes (e.g. [1]). Despite the clear agreement that AI should be ethical, there is debate about what constitutes 'ethical AI' and what ethical requirements and technical standards are needed to achieve it [12].

The Ethics Guidelines for Trustworthy AI were presented by the AI-HLEG on April 8th 2019 [10]. The report builds on a draft that was published in December 2018, on which over 500 comments were made following an open consultation. The guidelines state seven key requirements for AI systems to be considered trustworthy: (1) human agency and oversight, (2) technical robustness and safety, (3) privacy and data governance, (4) transparency, (5) diversity, non-discrimination and fairness, (6) societal and environmental well-being, and (7) accountability. A 'Trustworthy AI Assessment List' was developed in order to determine to what extent an application meets the requirements. The AI-HLEG guidelines can be considered a primary ethics directive for the development of trustworthy AI systems, due to the thought and expertise that went into creating them and the support of the European Commission (EC) for a human-centered approach to AI.

As a step towards ensuring compliance with this directive, conceivably through future legislation, the EC issued a Whitepaper on Artificial Intelligence [4] on February 19th 2020. In this whitepaper the EC sets out proposals for promoting the development of AI in Europe while ensuring that fundamental human rights are respected. An important part of this whitepaper is the proposal to create a prior conformity assessment for high-risk AI applications, based on the ethics guidelines of the AI-HLEG. This legal framework should address the risks to fundamental rights and safety.

1.2 Ex Ante Evaluation vs Continuous Design

With the choice for a prior conformity assessment, the EC opts for an ex ante approach to aligning systems with the ethics guidelines, i.e. it should be determined in advance whether AI applications are able to meet the guidelines. Particularly when exposing society to high-risk AI applications such as facial recognition or deep fake algorithms, thoughtful risk assessment and cautious action are required [4]. However, in order for AI systems to conform ex ante with these guidelines, methods and tools need to be developed that allow these guidelines to be integrated during the development of the AI system. For example, full transparency of a decision model that has been trained using machine learning methods may not be feasible, but during the development cycle an understanding of what constitutes 'sufficient transparency' can emerge, given a functional, ethical and technical context. So in theory, it is possible to check off every key requirement of the list by conforming with each requirement to some extent, thereby passing the ethical evaluation. In practice, however, the context and value of the key requirements become explicit in designing, developing, training, testing and using AI systems.

Moreover, although the seven key requirements are considered to be equally important [10], trade-offs can arise when integrating the guidelines into practice. Beyond evaluating these trade-offs and documenting the considerations, as suggested in the AI-HLEG guidelines, methods and tools are required to deal with these trade-offs during development. The term 'trade-off' suggests a compromise, but a design choice does not necessarily need to constitute a zero-sum game where increasing the value of one element naturally decreases another.
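To make the assessment-list idea concrete, the seven key requirements can be given a machine-readable form that a checklist tool could iterate over. The sketch below is illustrative only: the enumeration and the `unassessed` helper are our own hypothetical names, not part of any AI-HLEG tooling.

```python
from enum import Enum

class KeyRequirement(Enum):
    """The seven AI-HLEG key requirements for trustworthy AI."""
    HUMAN_AGENCY_AND_OVERSIGHT = 1
    TECHNICAL_ROBUSTNESS_AND_SAFETY = 2
    PRIVACY_AND_DATA_GOVERNANCE = 3
    TRANSPARENCY = 4
    DIVERSITY_NON_DISCRIMINATION_AND_FAIRNESS = 5
    SOCIETAL_AND_ENVIRONMENTAL_WELLBEING = 6
    ACCOUNTABILITY = 7

def unassessed(assessment: dict) -> list:
    """Return the key requirements not yet covered by an assessment."""
    return [r for r in KeyRequirement if r not in assessment]

# A partially filled-in assessment: a note per addressed requirement.
assessment = {
    KeyRequirement.TRANSPARENCY: "decision model documented",
    KeyRequirement.ACCOUNTABILITY: "audit log in place",
}
print([r.name for r in unassessed(assessment)])  # five requirements remain open
```

Such a structure only records coverage; as argued above, what 'sufficient transparency' means still has to emerge from the functional, ethical and technical context of the system.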
Similarly to valuing ethical requirements individually, the balancing act between conflicting key requirements also becomes explicit in the context where they are applied. Consider for example the apparent trade-off between privacy (key requirement 3) and safety (key requirement 2): a customs officer using a smart scanner to inspect your travel bag at an airport will typically elicit a different level of cooperation than a baker using the same scanner to inspect your shopping bag. While we may value safety over privacy at an airport, in a different context we may feel the same act violates our privacy. Different privacy standards may exist between contexts, as do the design opportunities to prevent conflicts or trade-offs between values. Design decisions would thereby ideally be made in the most specific context, where values can be maximized in relation to each other.

AI technologies affect many quality attributes, such as robustness, performance and security. An additional complexity they introduce is that they are hard to introspect, hard to evaluate for side effects, and that their inner workings are often not well understood by end users. Designers and developers, who have the best understanding of the systems they build, are best positioned to assess how technological choices lead to the desired system behavior. Unlike some other quality attributes, ethical considerations are hard to quantify, test or compare to external standards. This leaves a great responsibility for developers and designers to specifically employ forward thinking about ethical implications and heed how their decisions fit the desired ethical properties.

Finally, the design context matters because trustworthiness is a human concern. Although technology can be labeled as trustworthy, it is the context in which this technology is placed that determines whether the technology is trustworthy and for the benefit of the human actors concerned.

We therefore argue for integrating the ethical key requirements into the process of developing AI systems, making a translation of the general but abstract AI-HLEG ethics guidelines into specific but custom product requirements that shape or constrain the design (ethical drivers), taking into account concerns from direct and indirect human actors, and advocating that ethical design should take place in an applied context.

2 AN ETHICS DESIGN APPROACH

There are different approaches to software design, ranging from the Waterfall model, where the design is determined up front, to Agile approaches, where there is continuous planning, designing and learning during development [2]. As pragmatics dictate, some larger projects start with iterations involving more design work, gradually transitioning to iterations that involve more implementation work.

Ethical drivers are special in two ways. First, when designing a product, ethical considerations are usually not directly represented in the form of a stakeholder. Therefore, it will typically take special focus to guard that the guidelines are given proper attention during design activities. Second, the ethical perspective (cf. [16]) typically affects the product as a whole, which implies that decisions based on ethical drivers are best made in the early stages of software design, when more global decisions are made. Both the extra design effort and the global impact of ethical drivers create potential tension with Agile development approaches.

Existing approaches to ethical development (e.g., Value-sensitive design (VSD) [9] or Design for Values [20]) focus primarily on the Waterfall model, where the design is known before engineering starts. High-level requirements are consequently translated into design requirements, for instance through the use of value hierarchies [19]. These abstract design requirements are the starting point of the engineering development cycle, and are then typically translated into product features. It should be noted that VSD focuses on high-level value conflicts, and tries to determine the possible (technical) design space allowed by the combination of the stakeholders' values. Tensions between the value conceptions at later stages of the development should be evaluated according to VSD, but it is not clearly described how this should be done; as [11] already noticed, VSD is lacking a clear ethical perspective.

Also, ethical considerations made in the abstract design can be (unintentionally) overturned by the design decisions made by the development team during the development sprints. Sufficient support for ethical design has to be provided in the design phase as well. This can be tackled by keeping sufficient focus on the ethical guidelines during the development process.

Moreover, current engineering practice focuses on functional and non-functional aspects of the system from a system point of view. Creating human-centered AI systems requires keeping a clear focus on the points of view of the human actors concerned. Apart from the already mentioned Value-sensitive design method, there are few, if any, development methodologies that focus on this form of ethical engineering. In principle, Agile methodologies such as Scrum allow for ethical engineering processes, as they are flexible enough to include steps regarding value discovery or the weighing of stakeholder concerns. Although the discovery of non-functional requirements in Agile software processes has received attention from other researchers [7], it is still unclear how to prevent the neglect of the ethical concerns of some stakeholders.

2.1 Agile Development

Agile development focuses on short and iterative cycles to improve agility and flexibility in the development process [2]. Scrum [18] is the dominant Agile methodology, especially in the segments served by smaller development teams [3]. Development in Scrum is done in short cycles (Sprints) in which concrete parts of the design are implemented into a usable product, while keeping a backlog of work that remains to be done. The main drivers of development within Sprints are so-called User Stories.

User Stories express desired system functionality from the perspective of a particular user, expressing a particular desire in a given context. Two examples are shown in figure 1. A User Story is typically formulated in the following template:

"As a ... [actor] I want ... [functionality] in order to ... [desire] given ... [context]."

User Stories have several characteristics:
• they allow large projects to be divided into smaller parts that can be developed independently;
• they are typically short and contain only those development steps that can be made in a short amount of time (i.e. days rather than weeks);
• they are especially useful for projects where requirements and desires change rapidly, or where these are misunderstood; and
• they facilitate time estimation of tasks.

The high-level design is created and formulated in Epics. An Epic is a User Story that is too big to fit in a single development cycle [17], and therefore has to be broken down (either at the start of the project, or in a later iteration of development) into smaller, more concrete User Stories. This breakdown is done through a process of User Story Mapping [14]. In later phases of development, User Stories can be broken down further into even more contextualized features, if deemed necessary. Finally, at the start of each Sprint the priority of the User Stories is determined, for instance by means of planning poker [13].
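The User Story template and its Epic breakdown can be captured in a small data structure. The sketch below is a hypothetical illustration, not part of Scrum or the guidelines (class and field names are ours); it uses one of the example stories from figure 1.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class UserStory:
    """A User Story following the template from section 2.1."""
    actor: str          # "As a ... [actor]"
    functionality: str  # "I want ... [functionality]"
    desire: str         # "in order to ... [desire]"
    context: str        # "given ... [context]"

    def render(self) -> str:
        """Render the story in the canonical template form."""
        return (f"As a {self.actor} I want {self.functionality} "
                f"in order to {self.desire} given {self.context}.")

@dataclass
class Epic:
    """A User Story too large for one Sprint; broken down via story mapping."""
    title: str
    stories: List[UserStory] = field(default_factory=list)

# One of the example stories from figure 1:
story = UserStory(
    actor="truck driver",
    functionality="my semi-autonomous vehicle to halt as soon as possible",
    desire="provide safety",
    context="a malfunction",
)
print(story.render())
```

Note how the 'actor' slot carries the stakeholder perspective and the 'desire' and 'context' slots are where, as argued below in section 2.2, the ethical requirements can be made explicit.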
Thereby the developers resolve the importance of each User Story, determining in which order they will be implemented in the project. This priority determines which User Stories are implemented in the current Sprint, and which User Stories will have to wait for a future Sprint.

Figure 1. Example layout of a backlog for the development of a semi-autonomous truck, with a high-level Epic ('Malfunction Behaviour') and prioritized User Stories grouped into releases. The two prioritized User Stories read: "As a truck driver I want my semi-autonomous vehicle to halt as soon as possible in order to provide safety given a malfunction" and "As an insurer I want semi-autonomous vehicles to optimize where to halt in order to minimize financial loss given a malfunction".

At any stage during a project, the relation between the Epics and the User Stories is kept visible on the Scrum board (see figure 1). The top row shows the Epics that have been identified for the current project; below each Epic are the related User Stories, indicating which User Stories are deemed necessary to create a releasable version of the project.

2.2 An Agile Approach to Trustworthy AI

It is a key agile principle to postpone design decisions until as late as possible, allowing for just-in-time but well-informed decisions. However, some overarching requirements, such as those sourced from the AI-HLEG guidelines, are better formulated before all major design decisions are made. This requires careful consideration of where and how to apply these requirements.

Figure 2 shows the translation of high-level Epics to contextualized User Stories in action, involving the AI-HLEG ethics guidelines at each step in the process.

Figure 2. At several steps during the development of AI systems, the Agile design process should be informed by the AI-HLEG ethics guidelines: (1) creating Epics; (2) User Story mapping; (3) prioritization of User Stories into the Sprint backlog.

User Stories take a central place as a design artifact in Scrum. The structure that is commonly used for User Stories in Scrum closely aligns with the requirement for a specific and contextualized design element, needed to make the (comparative) value of the ethical requirements explicit; this is provided by the 'given ... [context]' part of the User Story template. In User Stories, the roles of the direct stakeholders (e.g. users, subjects) and indirect stakeholders (e.g. society, future generations) can be naturally represented in the 'As a ... [actor]' part of a User Story. The ethical requirement is naturally referred to, like all considerations that add value in User Stories, in the goal part of the User Story, described by the 'in order to ... [desire]' phrase.

Scrum also offers methods for mapping trade-offs between guidelines. We propose to investigate the use of planning poker [13] as a method to weigh, compare and prioritize User Stories (as in step 3 of figure 2): rather than an estimate of the size of a User Story, the ethical impact could be considered to award points and open a discussion between developers. This approach also allows for technical knowledge to inform the ethical discussion, i.e. a developer may be aware of a recently developed technology capable of resolving two (or more) conflicting values, or the reverse could happen, where the technological roadmap is driven by the need to resolve two conflicting values.

As a next step, the Scrum approach to designing trustworthy AI systems according to the ethics guidelines needs to be investigated further, empirically tested and adapted based on the results. In the next section we show a preliminary example of how the ethics guidelines could be operationalized in Scrum.

2.3 Example of Ethics Guidelines in Agile Design

One of the design concerns of AI systems is their behavior in exceptional circumstances, such as when component failures or adverse external factors cause malfunctions. Consider the context of a semi-autonomous truck that is capable of assisting its human driver by making decisions in emergency situations. In our scenario, the truck must make a decision about bringing the vehicle from a state of moving to a full stop in the event of a system malfunction. The risk of an adverse event regarding the safety of the driver is smaller for a halted truck than for a truck that is moving along with traffic. Therefore, the default strategy of the AI may be to stop the truck as soon as possible in case of malfunction. However, it may be in the interest of some stakeholders to override this default strategy. For example, an insurer may want the truck to stop at a point where the risk of economic loss of the truck is minimized, whereas it may be in the best interest of the driver to stop as soon as possible, regardless of the hazard this creates for drivers of other vehicles or the costs of repairing the truck.

As step 1 in figure 2 shows, the AI-HLEG guidelines can be used to evaluate the ethical impact for each of the stakeholders when specifying the Epics. This may yield new insights into conflicting values and requirements, or new stakeholders (such as insurers or other drivers) that the design should take into account. During User Story mapping, one User Story may put forward the driver's perspective, and another may take the insurer's perspective; the ethical considerations for each individual User Story can be aligned with the guidelines as soon as it is created, cf. step 2. Conflicting interests emerge most clearly at this stage, prompting a resolution mechanism in order to prioritize the items, cf. step 3. One such resolution mechanism is Scrum planning poker, where the impact of the key requirements is comparatively evaluated and discussed by the development team.

In the emergent design approach that underlies agile software development, general requirements are stepwise translated into scenarios of increasing specificity in the development cycles. The default strategy (stop as soon as possible) will be revisited and reconsidered when User Stories are created from Epics. This leads to a continuous need to reapply the AI-HLEG guidelines. Conflicting values will emerge as the set of User Stories grows, exemplified by the sidebar position of the AI-HLEG ethics guidelines in figure 2. The iterative application of guidelines stands in contrast with the typical large up-front designs of VSD and other waterfall strategies, which are considered an anti-pattern for Scrum [6].
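The ethical-impact variant of planning poker proposed above could be sketched as follows: each developer awards an impact score to every User Story, and the backlog is ordered by the aggregated score. All names and the scoring scheme here are illustrative assumptions on our part, not a prescribed method.

```python
from statistics import median

def prioritize(backlog, votes):
    """Order User Stories by the median ethical-impact score awarded
    by the team, highest impact first (planning-poker style)."""
    scored = {story: median(votes[story]) for story in backlog}
    return sorted(backlog, key=lambda s: scored[s], reverse=True)

# Backlog items from the semi-autonomous truck example (abbreviated).
backlog = ["driver: halt as soon as possible",
           "insurer: optimize where to halt",
           "fleet owner: log every malfunction"]

# Each developer votes on the ethical impact of a story (e.g. a 1-13 scale).
votes = {
    "driver: halt as soon as possible": [13, 8, 13],
    "insurer: optimize where to halt": [5, 8, 5],
    "fleet owner: log every malfunction": [3, 2, 3],
}
print(prioritize(backlog, votes))
```

As with ordinary planning poker, the discussion triggered by divergent votes matters more than the resulting number; the ordering merely records the team's consensus on which conflicting values to address first.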
The arrows in Fig. 2 exemplify that the ethical guidelines play a role at different levels of design abstraction: at the highest level, when Epics are produced (arrow 1); during subsequent detailing, when User Stories are produced (arrow 2); and at the lower levels, when User Stories are prioritized and the Sprint Backlog is organized (arrow 3).

3 DISCUSSION

The analysis provided and the methods proposed in this position paper are part of ongoing applied research towards operationalizing ethical guidelines for AI in the practice of developing AI systems. Additional (empirical) research is planned in order to validate and extend the Agile framework for trustworthy AI presented here.

We propose that describing requirements as User Stories has the advantage of placing the human actor central in designing the how. This approach bridges the gap from the moral and regulatory functions of ethics guidelines to the daily practice of implementing software, by carrying over the key ethics requirements to the fine-grained context where the what, why, and for whom can take on a meaning that is not evident at the abstract level of an AI system's comprehensive design. The introduction of a separate Scrum ceremony to consider ethics may have the additional effect that teams seek out the full breadth of the guidelines, just as planning poker stimulates teams to investigate all factors that influence planning.

Compliance with the AI-HLEG recommendations requires a specific process to identify ethical drivers. In a waterfall process, abstract and system-wide concerns are generally considered in the earlier abstract designs. In agile processes, with their cyclical nature and incremental development, attention is generally focused on the requirements of a specific iteration. Ethical drivers are cross-cutting, with a scope beyond individual increments. This warrants a protected status in the design process, so that the influence of ethical drivers on the design of the system is periodically evaluated, ensuring continuous attention to the requirements derived from the AI-HLEG guidelines.

We argue that some conflicting ethical concerns only become visible at the lower levels of design abstraction. Because Scrum cyclically combines high-level design with low-level design activities in each sprint, the relevance of the AI-HLEG guidelines remains constant throughout the development process. It is also at this implementation level that the trade-offs between different key requirements can be weighed against each other, and sometimes resolved, informed by technological possibilities.

Scrum ceremonies (such as planning poker) transform AI-HLEG guidelines into User Stories. That process may be repeatable and reusable, so that the same translation can be applied when the same ethical issues arise in different contexts. The structured process, when documented and archived, is a resource that can be used to identify any emerging design patterns.

REFERENCES

[1] Asilomar AI Principles, Future of Life Institute, 2018.
[2] Kent Beck, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler, James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, et al., 'The Agile manifesto', 2001.
[3] A. Begel and N. Nagappan, 'Usage and perceptions of agile software development in an industrial context: An exploratory study', in First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), pp. 255–264, (Sep. 2007).
[4] European Commission, 'Whitepaper on artificial intelligence – a European approach to excellence and trust', B-1049 Brussels, (2020).
[5] Charles J. Dunlap Jr, 'Accountability and autonomous weapons: Much ado about nothing', Temp. Int'l & Comp. LJ, 30, 63, (2016).
[6] V. Eloranta, K. Koskimies, T. Mikkonen, and J. Vuorinen, 'Scrum anti-patterns – an empirical study', in 2013 20th Asia-Pacific Software Engineering Conference (APSEC), volume 1, pp. 503–510, (Dec 2013).
[7] Weam M. Farid, 'The NORMAP methodology: Lightweight engineering of non-functional requirements for agile processes', in 2012 19th Asia-Pacific Software Engineering Conference, volume 1, pp. 322–325. IEEE, (2012).
[8] Jessica Fjeld, Nele Achten, Hannah Hilligoss, Adam Nagy, and Madhulika Srikumar, 'Principled artificial intelligence: Mapping consensus in ethical and rights-based approaches to principles for AI', Berkman Klein Center Research Publication, (2020-1), (2020).
[9] Batya Friedman and David G. Hendry, Value Sensitive Design: Shaping Technology with Moral Imagination, MIT Press, 2019.
[10] AI High-Level Expert Group, 'Ethics guidelines for trustworthy AI', B-1049 Brussels, (2019).
[11] Naomi Jacobs and Alina Huldtgren, 'Why value sensitive design needs ethical commitments', Ethics and Information Technology, 1–4, (2018).
[12] Anna Jobin, Marcello Ienca, and Effy Vayena, 'Artificial intelligence: the global landscape of ethics guidelines', arXiv preprint arXiv:1906.11668, (2019).
[13] Viljan Mahnič and Tomaž Hovelja, 'On using planning poker for estimating user stories', Journal of Systems and Software, 85(9), 2086–2095, (2012).
[14] Jeff Patton and Peter Economy, User Story Mapping: Discover the Whole Story, Build the Right Product, O'Reilly Media, Inc., 2014.
[15] Sundar Pichai, 'AI at Google: our principles', Google blog, (2018).
[16] Nick Rozanski and Eóin Woods, Software Systems Architecture, Addison-Wesley, Upper Saddle River, NJ, 2nd edn., 2011.
[17] Kenneth S. Rubin, Essential Scrum: A Practical Guide to the Most Popular Agile Process, Addison-Wesley, 2012.
[18] Ken Schwaber and Mike Beedle, Agile Software Development with Scrum, volume 1, Prentice Hall, Upper Saddle River, 2002.
[19] Ibo van de Poel, 'Translating values into design requirements', in Philosophy and Engineering: Reflections on Practice, Principles and Process, 253–266, Springer, (2013).
[20] Jeroen van den Hoven, Pieter E. Vermaas, and Ibo van de Poel, Handbook of Ethics, Values, and Technological Design: Sources, Theory, Values and Application Domains, Springer, 2015.
[21] Cédric Villani, Yann Bonnet, Bertrand Rondepierre, et al., For a Meaningful Artificial Intelligence: Towards a French and European Strategy, Conseil national du numérique, 2018.
[22] Ricardo Vinuesa, Hossein Azizpour, Iolanda Leite, Madeline Balaam, Virginia Dignum, Sami Domisch, Anna Felländer, Simone Daniela Langhans, Max Tegmark, and Francesco Fuso Nerini, 'The role of artificial intelligence in achieving the sustainable development goals', Nature Communications, 11(1), 1–10, (2020).