Eleventh International Workshop Modelling and Reasoning in Context (MRC) @ECAI 2020 27

Linking Actions to Value Categories - a first Step in Categorization for Easier Value Elicitation

Djoshua D. M. Moonen and Myrthe L. Tielman¹

Abstract. Computer systems are increasingly involved in making decisions. Therefore, it is increasingly important that they understand our values. To make values usable, context is important: both the context of the individual and that of the actions the values underlie. This work studies whether it is possible to make it easier to elicit an individual's values by using the context of the action. Practically, we first held an expert survey (n = 7) to see whether some values are more likely than others to underlie particular actions. The results were positive on this score, so a second study (a user study, n = 135) was done, showing that restricting the number of values made it easier to elicit values from users without unnecessarily limiting their expression. This work shows that when linking actions to values, it is possible to make elicitation easier by only showing the applicable options. This is an important step towards being able to incorporate values in computerized decision making.

1 Introduction

Computer systems are increasingly helping us to make, and stick to, important decisions in life. Reminder systems, health apps and social-media blockers all function to help us change behavior in some way [5, 7]. However, such systems often blindly stick to a single goal, and do not truly understand the motivations behind our actions, nor the context in which we make our decisions. To help technology understand these motivations, values have been proposed [1]. Values represent the things we find important in life, and which guide our decisions [8]. Therefore, they have long been taken into account in system design [3]. However, to flexibly adapt to individual values, systems require values in their reasoning as well as in their design. In recent years, a number of systems have attempted to model this reasoning by linking values to our choices in some way [2, 10]. Ideally, such work will lead to systems that can more flexibly adapt their decision making and take values into account in their reasoning [1].

Values are general, abstract concepts. However, for a system to use them, they need to be made concrete. They need to be linked to actions [10] or to choices [2]. Often, this is also done by transforming values into norms [3]. This concretization of values means that information needs to be added about the context in which they are applied. We identify two main types of context. Firstly, the individual needs to be taken into account, as people have different values, as well as different views on what a value means for them. Secondly, the type of choices or actions the value is applied to is relevant, as values take on different meanings in different domains.

The first type of context is the individual, which means that information about values should ideally come from them. The most obvious source for this information is the users themselves, but people have often not explicitly thought about values, or do not even fully understand the concept. Moreover, the conversational capabilities of many automated systems are not yet sufficient for this type of conversation. This information is therefore difficult for a system to obtain [6]. As a result, most existing value-elicitation methods are based on human-human interaction [11], or are aimed at what values are important in general [9].

In order to make the elicitation of an individual's values easier, it is helpful to consider the second form of context, namely the action. Most systems have attempted to elicit values in general. But values can take on different meanings in different domains. For instance, safety might mean something different when choosing a car than when choosing a doctor. Similarly, the choice to go to work is motivated by different types of values than the choice to go to a party. This also means that we could use this type of context to narrow the conversation about values between a system and a human.

If we want to know what value underlies a certain action for a specific individual, we could pose this as a question in which the user can pick from all possible values. However, this would mean a very large answer space. And, as mentioned, the action probably also limits which values are most likely to underlie that choice. So it might be possible to use this context to limit the number of values an individual has to pick from, for instance in the form of a pre-selection of the list of values. However, as we are interested in the individual's values, not just the most likely ones underlying a general action, it is also important not to limit the individual too much in what they can express by making this pre-selection too small. In this paper, we explore whether it is possible to make elicitation easier in this way without limiting expression.

Thus, in this work we explore two things. Firstly, whether it is possible to make a pre-selection of values which are more likely to underlie a choice for a specific action. And secondly, whether a pre-selection like this makes it easier for users to select a value from a list while not limiting them in their expressive ability. In Section 2, the first question is explored by means of an expert study. Section 3 explores the second question by means of a user study, and Section 4 presents the results. A discussion and conclusion based on the findings can be found in Section 5.

2 Value Categorization

In order to make value selection easier, we propose to make a pre-selection based on the type of activity the value promotes. Our hypothesis is that different actions have different value types which often underlie them. For instance, the values which underlie people's choice to go to work are probably different from those underlying the choice to watch a movie.
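The proposed pre-selection can be illustrated with a minimal Python sketch. This is a hypothetical illustration, not the authors' implementation. It assumes that each expert's answer for an action is a ranked top-3 list of the ten Schwartz value categories (abbreviated as in Table 1), applies the 4/2/1 place weights and the relevance threshold of 9 described in Section 2.3, and falls back to the full list of categories when no pre-selection exists for an action. All function and constant names (`score_action`, `preselect`, `mean_to_highest`, `answer_options`, `THRESHOLD`) are illustrative assumptions.

```python
# Hypothetical sketch of an action-based value pre-selection; not the
# authors' code. Category abbreviations follow Table 1.
from typing import Dict, List, Sequence

ALL_CATEGORIES: List[str] = ["AC", "BE", "CO", "HE", "PO",
                             "SE", "SD", "ST", "TR", "UN"]

PLACE_WEIGHTS = (4, 2, 1)  # points for 1st, 2nd and 3rd place (Section 2.3)
THRESHOLD = 9              # categories scoring >= 9 count as relevant


def score_action(rankings: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Sum the weighted placements of every expert's top-3 for one action."""
    scores = {category: 0 for category in ALL_CATEGORIES}
    for top3 in rankings:
        # zip stops after three places, so longer rankings are truncated
        for weight, category in zip(PLACE_WEIGHTS, top3):
            scores[category] += weight
    return scores


def preselect(scores: Dict[str, int], threshold: int = THRESHOLD) -> List[str]:
    """Keep only the categories at or above the relevance threshold."""
    return [c for c in ALL_CATEGORIES if scores[c] >= threshold]


def mean_to_highest(scores: Dict[str, int]) -> float:
    """Agreement measure: highest score minus the mean score."""
    values = list(scores.values())
    return max(values) - sum(values) / len(values)


def answer_options(action: str, preselections: Dict[str, List[str]]) -> List[str]:
    """Options shown to a user; falls back to all ten categories."""
    return list(preselections.get(action, ALL_CATEGORIES))


if __name__ == "__main__":
    # Scores for "Cook" copied from Table 1.
    cook = dict(zip(ALL_CATEGORIES, [9, 0, 0, 13, 0, 10, 4, 2, 3, 1]))
    preselections = {"Cook": preselect(cook)}
    print(answer_options("Cook", preselections))       # ['AC', 'HE', 'SE']
    print(round(mean_to_highest(cook), 1))             # 8.8
    print(len(answer_options("Pray", preselections)))  # 10 (no entry)
```

With the scores in Table 1, a threshold of 9 is the highest value that still leaves every action with at least one category, since the top score for "Spend time with friends" is exactly 9.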
In order to study whether such a pre-selection can be made and what it would be, an expert study was performed. The goal of this study was two-fold. Firstly, to see whether there is agreement amongst experts on what categories of values are most likely to underlie the choice to perform a specific action. And secondly, if there is such agreement, what categories of values are most likely for which actions.

2.1 Participants

The study was conducted with 7 participants (71.4% male), recruited from research staff and PhD students of Delft University of Technology. All participants were familiar with or had worked on value-based topics. Average age was 33.4 (sd 7.2) and they had an average of 3.83 years (sd 4.41) of experience with value-based research.

2.2 Procedure

The participants were sent a survey along with instructions. The instructions defined value as used by Schwartz (1992), including a detailed description of each of the 10 value categories [8]. Participants were asked to consider 40 actions, and for each to indicate the top three value categories that would be most likely to underlie a person's choice to perform that action. The full list of actions can be seen in Table 1. These actions were selected such that the list represented a diverse set of daily activities and, in the authors' judgement, every value category was likely to be covered at least once.

2.3 Measures

After the surveys were filled in, the anonymized data was aggregated. This was done by counting the frequency of each value category in the 1st, 2nd and 3rd places for each action. Then, a category was awarded a score of 4 for each time it appeared in first place, 2 for each time it appeared in second place and 1 for each time it appeared in third place. The scores were summed such that every value category received an overall score per action. This formula was chosen such that a first place was worth a little more than a second and a third place combined (4 > 2 + 1), and the same as two second places combined (4 = 2 + 2). After this score was created, a threshold of 9 was chosen in order to determine which categories were most relevant for each action. All categories scoring 9 or over were marked as relevant. This threshold was chosen such that each action had at least one value category at or above the threshold.

2.4 Results

Table 1 shows the full results, listing each value category's score for each of the included actions. The rightmost column shows the difference between the mean score and the maximum score per action. This number indicates how much agreement existed between experts, with higher numbers indicating more agreement. Furthermore, the table shows which value categories were marked by the experts as relevant (at or above the threshold of 9).

From Table 1, the average distance from the highest score to the mean was computed, which is 11.4. This indicates that for many actions a value category exists which scores visibly better than the rest. After all, to get an overall score of 11, at least 3 of the 7 participants needed to have scored one particular category in at least 2nd place. Since 11.4 is the difference from the mean score, the highest score itself is larger still, which means that the majority of the 7 experts agreed on the highest-scoring category. This consensus indicates that we might, indeed, use the value categories in Table 1 to pre-select what values a user can choose from. However, more work is necessary to study whether this pre-selection truly does not limit users in the expression of their values, as well as whether it actually achieves its goal of making value selection easier.

Table 1. Weighted numerical representation of action per value category. Achievement (AC), Benevolence (BE), Conformity (CO), Hedonism (HE), Power (PO), Security (SE), Self-Direction (SD), Stimulation (ST), Tradition (TR), Universalism (UN). First place is worth 4 points, second 2 and third 1. "Mean to highest" is the difference between the highest and the average score. Scores greater than or equal to 9, marked as relevant for that action, are indicated with an asterisk (*).

Promoted activity                 AC   BE   CO   HE   PO   SE   SD   ST   TR   UN   Mean to highest
Act politely                      2    12*  11*  0    1    1    4    0    4    7    7.8
Buy something                     8    4    4    11*  1    4    5    4    1    0    6.8
Care for someone                  2    20*  1    2    2    4    2    4    0    6    15.7
Celebrate holiday                 0    0    2    11*  4    5    2    5    13*  0    8.8
Communicate                       5    4    3    0    4    2    3    8    1    12*  7.8
Compete                           10*  0    2    1    7    1    2    8    4    0    6.5
Cook                              9*   0    0    13*  0    10*  4    2    3    1    8.8
Create something (e.g. painting)  10*  0    0    6    1    0    11*  12*  1    1    7.8
Decide what to do                 4    0    0    1    11*  0    20*  4    0    2    15.8
Do something exciting             6    0    2    16*  1    1    2    14*  0    0    11.8
Drink                             4    0    2    20*  0    4    3    4    4    1    15.8
Eat                               0    2    6    12*  0    14*  3    0    5    0    9.8
Enjoy art                         0    0    0    15*  2    4    3    10*  2    6    10.8
Exercise                          13*  0    0    2    2    11*  9*   5    0    0    8.8
Exercise influence                4    7    0    1    18*  0    4    8    0    0    13.8
Follow a ceremony                 0    0    11*  4    0    3    1    1    20*  2    15.8
Follow the law                    0    1    22*  4    0    7    0    1    4    3    17.8
Help someone                      1    18*  2    0    4    2    4    1    0    10*  13.8
Learn                             8    0    2    1    2    2    16*  6    0    5    11.8
Make decisions for others         8    5    0    1    18*  0    5    0    3    2    13.8
Make money                        13*  0    0    11*  8    8    5    0    0    0    8.5
Meditate                          2    7    1    5    1    2    16*  4    4    0    11.8
Perform (e.g. a play)             11*  4    1    2    2    0    6    15*  0    1    10.8
Plan your day                     10*  0    2    4    3    1    18*  0    2    0    14.0
Play games                        2    0    3    11*  0    0    6    14*  5    1    9.8
Pray                              2    2    1    0    0    8    6    4    17*  2    12.8
Protect others                    0    18*  4    0    5    8    0    0    1    6    13.8
Protect your belongings           1    0    4    2    5    24*  2    0    1    2    19.9
Protect yourself                  2    1    2    0    7    20*  4    0    5    1    15.8
Read                              0    4    1    4    2    1    9*   11*  0    10*  6.8
Relax                             0    6    1    18*  0    8    6    1    0    2    13.8
Repair something (e.g. car)       18*  2    0    5    4    1    1    9*   0    2    13.8
Sleep                             0    1    0    11*  3    16*  8    2    0    0    11.9
Spend time with family            0    6    4    8    1    3    0    6    10*  4    5.8
Spend time with friends           0    5    2    9*   4    5    6    9*   1    1    4.8
Study                             10*  0    0    5    2    0    12*  6    1    6    7.8
Take responsibility               2    11*  0    2    16*  2    5    0    1    3    11.8
Travel                            1    0    2    12*  0    0    11*  15*  0    1    10.8
Watch movies                      2    0    1    18*  0    0    2    13*  4    2    13.8
Work                              11*  0    1    0    4    8    11*  6    1    0    6.8

3 User Study

The results from the expert study show the potential of using a pre-selection of possible values based on the action. The goal of this pre-selection would be to make it easier for users to indicate what values underlie decisions to perform actions. However, it is important that people do not feel this pre-selection limits their freedom of expression, as the pre-selection is not meant to push users into giving certain answers. To study these two aspects, an online between-subject user study was performed. Participants were asked what value would most likely underlie an action. Half were only shown the pre-selection to pick from, while participants in the other condition were shown the full list of values from Schwartz [8].

3.1 Participants

For this study, participants were recruited via Amazon Mechanical Turk. 297 started the survey, and 231 completed it. Of these 231, 64 did not answer the control question correctly and were therefore excluded. Of the 167 remaining, 8 filled in the survey twice; the data from their second attempt was deleted, leaving 159. One final participant was excluded because they did not collect their payment, leaving 158 participants included in the initial analysis.

When looking at this initial data, we noticed that some of the participants had only clicked once on the pages with the questions, namely to go to the next page. This can be taken as evidence that they did not look at the full drop-down list of values, and simply left the first, default answer in place. In some cases, this might just indicate that the default answer seemed correct, but some participants did this for every question. In the end, it was decided to remove participants who had answered 10 or more questions within a second of seeing the page, as it would have been nearly impossible for them to have fully read a question in that time. The threshold of 10 was chosen because it is over half of the 19 value questions. This way, 23 participants were removed, making the final number of participants included in the analysis 135.

3.2 Procedure

The participants were asked to fill in a survey. The survey started with some general information, followed by a request for the participants' informed consent. After obtaining consent, the participants were placed in 1 of 2 conditions, after which 19 questions were asked, where the number of answer options depended on the condition the participant was in. The 19 questions were asked in random order, and for each question the answer options were also shown in random order. The survey concluded by asking the participants 5 questions on their experience completing the survey.

3.3 Measures

We measured the total time spent to complete the survey and, per question, the time of the first click, the time of the last click, the total number of clicks and the time at which the question was submitted. The difference between the time of the first and the last click was used to measure the time actually spent on each of the questions. This metric proved to be useful as some of the participants had taken breaks over 10 minutes long before the first click on a question was made, so we could not look at the total time spent on the page.

The first 19 questions concerned values, whereas the last 5 questions were about the participants' experience taking the survey. These 5 consisted of 4 questions about the difficulty of the survey, followed by 1 question asking whether the participant was missing the option for the answer they wanted to give. The first 4 questions, regarding difficulty, used a 5-point Likert scale ranging from -2 (Extremely difficult) via 0 (Neither easy nor difficult) to 2 (Extremely easy). The last question, regarding missing answer options, used a 4-point Likert scale ranging from 1 (Only some of the questions) to 4 (All of the questions).

4 Results

The data was analyzed with R version 3.6.1, and the analysis was split into 3 parts. The first part analyzes the time spent on the questions about values. The second part concerns the questions regarding the difficulty of the survey. The third and last part concerns the perceived lack of answer options for the questions of the survey.

The time spent on the questions about values was analyzed using the mean time spent per question. The Shapiro-Wilk normality test indicated that the data was not normally distributed (W = 0.77, p < 0.01). Therefore, the Wilcoxon rank sum test with continuity correction was used, indicating that a significant difference exists between conditions in the time it took people to answer which value was most relevant (W = 3068, p < 0.01).

Difficulty was tested with four questions. In order to create a single difficulty score, the internal consistency of the questions was tested using Cronbach's alpha, and they were found to be internally consistent (α = .83). The Shapiro-Wilk normality test showed that the data was not normally distributed (W = 0.95, p < 0.01). Therefore, the Wilcoxon rank sum test with continuity correction was used, showing a significant difference between the two conditions in the answers to the questions regarding the difficulty of the survey (W = 1394.5, p < 0.01).

The question regarding freedom of answers was analyzed separately. On average, people in both conditions indicated that they could answer as they wished for 'most of the questions' (3) (all answers: M = 2.95, SD = 0.73; pre-selection: M = 3.01, SD = 0.86). As the data was not normally distributed (Shapiro-Wilk W = 0.79, p < 0.01), the Wilcoxon rank sum test with continuity correction was used, showing no significant difference between the two groups regarding their experience of missing answers (W = 2112.5, p = 0.455).

5 Discussion and Conclusion

The results show that participants who received the pre-selection spent significantly less time on average per value question, implying that it was easier to select an answer from the pre-selection. This was probably partly because there are fewer answers to consider, but it could also be because people already had an answer in mind and it took less time to find it. Overall, this means that the survey with pre-selected answers was less of a time investment, and that it was potentially easier to complete. This implication is supported by the results from the questionnaire, which also show that the participants who received the pre-selection found the survey significantly easier to complete. One concern with only presenting people with a pre-selection would be that it limits their freedom of expression. However, our results show no significant difference in how often people wanted to pick a value which was missing from the list. Note that the average score in both conditions indicated that participants were able to find their value for 'most of the questions'. Therefore, we found no evidence that making a pre-selection led to people feeling restricted in their expression.

5.1 Contributions

Values are abstract concepts, but when a system needs to use them, they need to be seen in the context of both the individual and the actions they are applied to. In this work, we use the context of these actions to inform us about which values are most likely, in order to more easily elicit values from an individual. More specifically, this study shows that it is possible to present a pre-selected list of values to participants based on the context of the action it is applied to. This pre-selected list makes the process of picking underlying values faster and easier, without affecting the freedom of expression perceived by participants. This is important, as this technique can be used by systems to learn which values underlie an individual's choice to perform an action. In this way, values can be used by systems to adjust their advice and decision-making processes, and to align better with their users. Values form a large part of the moral context in which people make decisions, so it is important that we take steps to allow systems to understand them better [1].

5.2 Limitations

Firstly, our pre-selection was based on a limited number of expert participants. Although our results indicate that this was a good pre-selection, we do not assume full consensus on what it should look like. For a fully validated pre-selection of which value type corresponds to which action, more work would need to be done. However, our main intention was to study whether such a pre-selection was possible in the first place, and we believe this smaller sample was enough to show that this is indeed the case. Secondly, the questions about difficulty and the perceived number of missing answers relied on self-reported data. We do not fully know to what extent people truly found it more difficult because of the long list, or because the selection made values easier to think about. Moreover, the results with respect to freedom of answers were all relatively high, which might indicate a ceiling effect. Although we did not find that a pre-selection limited people's perceived freedom of choice, this might be because they simply could not think of anything else. However, when presented with a full list, some people might still pick values which were not in the pre-selection. As we did not show the same people both the full and the pre-selection lists, a direct comparison like this was not possible.

5.3 Future Work

Firstly, this paper focused on a pre-selection of values for ease of use. At the moment, a separate pre-selection is needed for each specific action. To be able to scale up to any arbitrary set of actions, it would be worthwhile to explore whether there are groupings of actions that share the same values. The possibility exists that values can be extrapolated, making it easier for the system to scale in the number of actions. Secondly, this paper only looks at actions to narrow down a pre-selection of possible underlying values. However, in indicating what value underlies an action, more contextual factors might play a role. Things like time of day, weather and surrounding actions might be relevant. A good starting point for taking more context into account might also be the social situation. Social norms are highly dependent on our values, so whether we perform an action with friends or with colleagues might change what value underlies it [4]. More work is necessary to see whether such additional context factors would allow for better pre-selections of values. Finally, this paper assumes that the answers filled in by the participants in the surveys are representative of their beliefs. However, talking about values is difficult, and so is verifying whether what people say about their values matches what they actually value in practice. Therefore, it would be interesting to see to what extent the answers given in the survey coincide with the values that the participants actually hold.

5.4 Conclusion

Values are increasingly being incorporated in technology, but their elicitation remains difficult. In this work, we explored whether it is possible to make value elicitation for specific actions easier by presenting people with a pre-selection containing only those values most relevant to that action context. In an expert study, we found that there is indeed some consensus on which value categories are most likely to correspond to an action. This indicates that it is indeed possible to make a pre-selection of the most relevant values based on the actions under consideration. Additionally, in a user study with such a pre-selection, we found that it made it easier for people to choose the most likely underlying value for an action, without diminishing their perceived freedom of choice. These results are important for the process of value elicitation and, through that, for value-based reasoning, which is becoming more important in today's society, where we increasingly interact with technology on a personal level.

Acknowledgement: This work is part of the research programme CoreSAEP, with project number 639.022.416, which is financed by the Netherlands Organisation for Scientific Research (NWO).

REFERENCES

[1] Ethically Aligned Design - A Vision for Prioritizing Human Well-being with Autonomous and Intelligent Systems, Version 2, The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems, 2017.
[2] S. Cranefield, M. Winikoff, V. Dignum, and F. Dignum, ‘No pizza for you: Value-based plan selection in BDI agents’, in International Joint Conference on Artificial Intelligence, (2017).
[3] Batya Friedman, Peter H. Kahn Jr., and Alan Borning, ‘Value Sensitive Design and Information Systems’, in Human-Computer Interaction and Management Information Systems: Foundations, Advances in Management Information Systems, Volume 5, 348–372, M.E. Sharpe, 2006.
[4] Ilir Kola, Catholijn M. Jonker, and M. Birna van Riemsdijk, ‘Modelling the social environment: Towards socially adaptive electronic partners’, in International Workshop Modelling and Reasoning in Context (MRC), Held at FAIM, (2018). AAMAS/IJCAI Workshop on Modeling and Reasoning in Context.
[5] Eleonora Milić, Dragan Janković, and Aleksandar Milenković, ‘Health care domain mobile reminder for taking prescribed medications’, in ICT Innovations 2016, eds., Georgi Stojanov and Andrea Kulakov, pp. 173–181, Cham, (2018). Springer International Publishing.
[6] Alina Pommeranz, Designing Human-Centered Systems for Reflective Decision Making, Ph.D. dissertation, Delft University of Technology, 2012.
[7] Danielle E. Schoffman, Gabrielle Turner-McGrievy, Sonya J. Jones, and Sara Wilcox, ‘Mobile apps for pediatric obesity prevention and treatment, healthy eating, and physical activity promotion: just fun and games?’, Translational Behavioral Medicine, 3(3), 320–325, (2013).
[8] Shalom H. Schwartz, ‘Universals in the content and structure of values: Theoretical advances and empirical tests in 20 countries’, in Advances in Experimental Social Psychology, volume 25, 1–65, Elsevier, (1992).
[9] Shalom H. Schwartz, Gila Melech, Arielle Lehmann, Steven Burgess, Mari Harris, and Vicki Owens, ‘Extending the cross-cultural validity of the theory of basic human values with a different method of measurement’, Journal of Cross-Cultural Psychology, (2001).
[10] M.L. Tielman, C.M. Jonker, and M.B. van Riemsdijk, ‘What should I do? Deriving norms from actions, values and context’, in International Workshop Modelling and Reasoning in Context (MRC), Held at FAIM, (2018). Under revision at the AAMAS/IJCAI Workshop on Modeling and Reasoning in Context.
[11] Ibo van de Poel, ‘Translating Values into Design Requirements’, in Philosophy and Engineering: Reflections on Practice, Principles and Process, Springer, 2013.

¹ Delft University of Technology

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).