Proceedings of CMNA 2016 – Floris Bex, Floriana Grasso, Nancy Green (eds)

Strength calculation of rewards

Mariela Morveli-Espinoza, Ayslan T. Possebom and Cesar A. Tacla
CPGEI – Federal University of Technology – Paraná, Brazil
{morveli.espinoza, possebom, cesar.tacla}@gmail.com

Abstract

Persuasive negotiation involves negotiating using rhetorical arguments (such as threats, rewards, and appeals), which act as persuasive elements that aim to force or convince an opponent to accept a given proposal. Rewards have a positive nature, as they use the argument that something positive will happen to the opponent if he accepts to do the requirement sent by the proponent. A proponent agent can generate more than one reward, depending on the information he has modelled about his opponent. The problem appears when the agent has to choose, among a set of rewards, the one to send to his opponent. One measure that can guide this choice is the strength of each reward. Thus, the goal of this work is to analyse the components of rewards and to propose a model for calculating their strength. We propose two ways of calculating the strength of rewards, depending on the kind of negotiation the agent is participating in. The first is to be used when the agent negotiates with only one opponent, and the second when the agent negotiates with more than one opponent.

1 Introduction

Persuasive negotiation involves negotiating using rhetorical arguments, which act as persuasive elements that aim to force or convince an opponent to accept a given proposal [9].

Although some authors argue that threats are the strongest rhetorical arguments ([11], [8]), the choice of which kind of argument a proponent agent will use depends on the information he has modelled about his opponents. According to Ramchurn et al. [9], it also depends on the convenience of the proposal for the proponent and the degree of trust that exists between the two agents. Therefore, depending on the situation, a reward can be more effective than a threat. In this work we study rewards, which have a positive nature: a proponent agent can entice his opponent to perform a certain action by offering to perform another action as a reward [1].

Consider the following persuasive negotiation scenario, where boss is the proponent agent, employee the opponent agent, and the goal of boss is that employee works every weekend¹. Taking into account the knowledge base of agent boss, the following rewards can be generated:

• boss: if you work every weekend, you will receive an interim payment.
• boss: if you work every weekend, you will have more holidays.

¹ This scenario is inspired by the example presented in [1].

The question is: which of these rewards will boss choose to persuade employee to work every weekend? One way of answering it is by calculating the strength of the generated rewards. According to Ramchurn et al. [9], a strong argument (in this case, a reward) is one that quickly convinces an opponent to accept a proposal, while a weak argument is less persuasive. Therefore, calculating the strength of rewards is important in persuasive negotiation dialogues, since the quickness of persuasion depends on it.

Rewards are constructed using both the proponent's and the opponent's goals (for example, earning more money). Some research on this topic takes into account the importance of the opponent's goal for the opponent and the certainty level of the beliefs used for the argument generation [1][3]. However, additional criteria are necessary, as the following situations show:

1. Agent boss knows that "visiting his parents" (g1) is a more important goal for employee than "fixing his car" (g2).
Considering only the importance, boss would use g1 for generating a reward. However, what happens if boss knows that g1 is less achievable than g2, since visiting his parents is not possible for the moment because employee has a spine disorder that does not let him travel long distances? In cases like this, importance is no longer the best, or the only, criterion related to the goal of the opponent.

2. Agent boss has offered rewards before and has rarely fulfilled them, and employee obviously knows it. In this case, the strength of a reward is also influenced by the execution credibility that the proponent has from the point of view of his opponent. Thus, even when the goal of an opponent is very important and/or achievable, a low level of credibility can diminish the strength of a reward.

In the first case, notice that besides importance there exists another criterion for evaluating the quality of the goal of an opponent, because it does not matter how important a goal is if it cannot be achieved. In the second case, the execution credibility level of the proponent (from the point of view of the opponent, i.e. the execution level the proponent believes the opponent attributes to him) should also be considered.

With respect to the first criterion, to determine how achievable a goal is, we use the belief-based goal processing model proposed by Castelfranchi and Paglieri [5]. It can be considered an extension of the belief-desire-intention (BDI) model [4]; unlike BDI, in Castelfranchi and Paglieri's model the processing of goals is divided into four stages: (i) activation, (ii) evaluation, (iii) deliberation, and (iv) checking; and the states a goal can adopt are: (i) active (= desire), (ii) pursuable, (iii) chosen, and (iv) executable (= intention). The state of a goal changes when it passes from one stage to the next. Thus, when it passes the activation stage it becomes active, when it passes the evaluation stage it becomes pursuable, and so on. A goal is closer to being achieved when it is closer to passing the last stage.

Part of our proposal for calculating the strength of a reward considers the state of the goals of the opponent. Depending on this state, a goal can be considered more or less rewardable: a goal is more rewardable when it is closer to the executable state and less rewardable when its state is active. Thus, the aim of this article is to propose a model for calculating the strength of rewards that takes into account new criteria, leading to a more accurate calculation.

The paper is organized as follows. Section 2 presents the goal processing model on which our strength calculation model is primarily based. Section 3 defines a negotiating agent architecture with the mental states, structures, and functions that support our proposal. A formal definition of reward and the mechanism for its generation are presented in Section 4. Section 5 presents an analysis of the elements of a reward and our proposed strength calculation model. In Section 6, the main related works are compared with our proposal. Finally, Section 7 is devoted to conclusions and future work.
2 Belief-based goal processing model

In this section, the four stages of the goal processing model of Castelfranchi and Paglieri are presented². The aim is not to present in detail the beliefs used in each stage; we focus on the goal states and make clear when a goal is considered active, pursuable, chosen, and executable, because these states are used in the strength calculation model proposed in this work. A brief description of each stage follows:

² A more detailed version of this model is presented in [5].

1. Activation stage: In this stage, goals are activated by means of motivating beliefs. For example, if the agent has the belief that today is Thursday, it activates the goal of going to the French class, or the motivating belief that today is sunny activates the goal of playing football. When a motivating belief is satisfied, the supported goal becomes active. An active goal can also be seen as a desire.

2. Evaluation stage: In this stage, goals are evaluated using assessment beliefs. When there are no assessment beliefs for a certain goal, it becomes pursuable. Three types of assessment beliefs were defined: (i) those that check that there is no impossibility for a goal to be pursued; (ii) those that concern goals that are realized in the world autonomously, without the direct intervention of the agent; and (iii) those that concern goals that have already been realized and will remain as such.

3. Deliberation stage: The aim of this stage is to act as a filter on the basis of incompatibilities and preferences among pursuable goals. Goals that pass this stage are called chosen goals. The beliefs of this stage are concerned with the different forms of incompatibility among goals that lead an agent to choose among them. For dealing with incompatibilities, an agent uses preference beliefs.

4. Checking stage: The aim of this stage is to evaluate whether the agent knows how to achieve a goal and whether it is capable of performing the required actions, in other words whether the agent has a plan and is capable of executing it. Goals that pass this stage are called executable goals and have the same characteristics as intentions.
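To make the stage/state progression concrete, the following minimal Python sketch (our own illustration; the names GoalState and advance are not part of Castelfranchi and Paglieri's model) encodes the opponent-goal states with the integer codes used later by the State function of Section 3:

```python
from enum import IntEnum

class GoalState(IntEnum):
    """Opponent-goal states; the codes match State() in Section 3."""
    SLEEPING = 0   # not yet activated (state proposed in [6])
    ACTIVE = 1     # passed the activation stage (= desire)
    PURSUABLE = 2  # passed the evaluation stage
    CHOSEN = 3     # passed the deliberation stage

def advance(state: GoalState) -> GoalState:
    """A goal moves to the next state when it passes its current stage.
    A goal that also passes the checking stage becomes executable
    (= intention); Section 5 does not score that state, so we cap at CHOSEN."""
    return GoalState(min(state + 1, GoalState.CHOSEN))
```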
This function lets the proponent choose an five basic structures: adequate reward action, that is within the possibilities - K is the knowledge base of the agent; of the proponent, thereby, he can fulfill the o↵ered - O p stores the opponents of the agent; reward. - G = Ga [ Gp [ Gc [ Ge is the set of goals of the agent, such that Ga the set of active goals, Gp the set 4 Construction of rewards of pursuable goals, Gc the set of chosen goals and Ge A reward is constructed based on two goals: the set of executable goals. It holds that Gx \ Gy = ;, 1. An outsourced goal of the proponent: for x, y 2 {a, p, c, e} and x 6= y; This kind of goal needs the opponent involvement for - GO = GOs [ GOa [ GOp [ GOc is the set of goals being achieved. For example, the goal of agent boss of the opponent, such that GOs is the set of sleeping is that employee works every weekend, for this goal goals of the opponent4 , GOa is the set of active goals of be achieved it is necessary that employee executes the opponent, GOp the set of pursuable ones, and GOc the required action. Considering the goal processing the set of chosen ones. It holds that Gx \ Gy = ; for stages defined in Section 2 the state of this goal is x, y 2 {s, a, p, c} and x 6= y. Finally, let State(goi ) = z executable. be a function that returns the state of a given goal; for z 2 {0, 1, 2, 3} where 0 means that the goal is sleeping, Definition 4.1. (Outsourced goal) An outsourced 1 active, 2 pursuable and 3 that it is chosen; goal gi is an expression of the form gi (opk , gi0 ), such - Rws stores the rewards constructed by the agent. that, opk 2 O p and gi0 is an action that opk has to The logical definition of a reward is given in Section 4. execute. Let f irst(gi ) = opk and second(gi ) = gi0 be the functions that return each component of gi . Definition 3.2. (Compound structures) These store characteristics of the basic structures. 2. The goal of an opponent: goi 2 GO is a - O pdet = {(opi , )} such that opi 2 O p and is the goal that proponent agent knows its opponent wants execution credibility level of rewards the proponent to achieve. For example, boss knows that employee has from the point of view of opponent opi . Hereafter, wants to visit his parents. Besides knowing the goal of we denote that 2 [0, 1] such that is a real from his opponent, the proponent has to know the state of the given interval. Let Level Execrw (opi ) = be a that goal and its importance. function that returns the execution credibility level for The construction of a reward begins when (i) an a given opponent agent; outsourced goal gi passes all the goal processing stages - GOdet = {(goi , , opj )} such that goi 2 GO is a and becomes executable and, (ii) after a failed first goal of opponent opj 2 O p whose importance is given attempt of proponent agent to make his opponent to do the requested action gi0 . The process of construction 3 In this work, we assume that the agent has in advance of a reward is the following: the necessary information (importance and state of the goals, and the value of the credibility level of execution of 1. Function Op goals returns the set of rewardable rewards) for generating and calculating the strength of rewards. Some interesting works about opponent modelling related to goals the proponent knows about the opponent argumentation are [7, 10]. opj . Let Sgo = Op goals(opj ) be the returned 4 Sleeping is one of the states a goal may take that is proposed set. in [6]. A goal is in sleeping state when it has not been activated yet. 
4 Construction of rewards

A reward is constructed based on two goals:

1. An outsourced goal of the proponent: This kind of goal needs the opponent's involvement in order to be achieved. For example, the goal of agent boss is that employee works every weekend; for this goal to be achieved, it is necessary that employee executes the required action. Considering the goal processing stages defined in Section 2, the state of this goal is executable.

Definition 4.1. (Outsourced goal) An outsourced goal gi is an expression of the form gi(opk, gi'), such that opk ∈ Op and gi' is an action that opk has to execute. Let first(gi) = opk and second(gi) = gi' be the functions that return each component of gi.

2. The goal of an opponent: goi ∈ GO is a goal that the proponent agent knows its opponent wants to achieve. For example, boss knows that employee wants to visit his parents. Besides knowing the goal of his opponent, the proponent has to know the state of that goal and its importance.

The construction of a reward begins when (i) an outsourced goal gi passes all the goal processing stages and becomes executable, and (ii) a first attempt of the proponent agent to make his opponent perform the requested action gi' has failed. The construction process is the following (a runnable sketch is given at the end of this section):

1. Function Op_Goals returns the set of rewardable goals the proponent knows about the opponent opj. Let Sgo = Op_Goals(opj) be the returned set.
2. If Sgo ≠ ∅:
   (a) For each goj ∈ Sgo:
       i. Obtain rop = EvalRecomp(goj) to know a reward option for goal goj.
       ii. Generate the two rules necessary for constructing a reward. The first is an expression of the form rrw1 = gi' → rop and the second is rrw2 = rop → goj.
       iii. Construct a reward and save it in Rws.

Rewards in Rws are called candidates. After the strength calculation, the strongest one is sent to the opponent to try to persuade him.

Next, we present the formal definition of a reward. It is based on the definition given in [1], with some modifications that reflect the agent architecture proposed in the previous section.

Definition 4.2. (Reward) A reward is a triple rw = ⟨Rrw, gi', goj⟩, where:
- Rrw = {rrw1, rrw2} contains both reward rules,
- gi' = second(gi), such that gi ∈ Ge,
- Rrw ∪ {gi'} ⊢ goj, such that goj ∈ GO.
We call Rrw and gi' the support of the reward and goj its conclusion.

Example 4.1. Let us define the mental state of boss:
Ge = {g1}, where g1 = make(employee, 'work(weekend)') is an outsourced goal, and therefore g1' = work(weekend);
GOa = {go1}, where go1 = visit(parents);
GOc = {go2}, where go2 = fix(car);
GOdet = {(go1, 0.8, employee), (go2, 0.6, employee)};
Op = {employee}; Opdet = {(employee, 1)};
Rws = {}.

Let us suppose that agent employee refused to do action g1' = work(weekend). Therefore, boss begins the construction of candidate rewards:

1. Sgo = Op_Goals(employee), so Sgo = {go1, go2}.
2. Sgo ≠ ∅, then:
   (a) EvalRecomp(go1) = have(holidays)
       Generate Rrw1 = {g1' → have(holidays), have(holidays) → go1}
       Construct rw1 = ⟨Rrw1, g1', go1⟩
   (b) EvalRecomp(go2) = give(inter_paym)
       Generate Rrw2 = {g1' → give(inter_paym), give(inter_paym) → go2}
       Construct rw2 = ⟨Rrw2, g1', go2⟩

Finally, Rws = {rw1, rw2}.
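The construction process above can be written in a few lines of Python on top of the Agent class sketched after Definition 3.3 (again our own illustration; rewards are stored as plain tuples ⟨rule set, action, goal⟩, and the iteration order over the goal set is irrelevant):

```python
def construct_rewards(agent, opponent, required_action):
    """Section 4 process: build one candidate reward per rewardable goal."""
    s_go = agent.op_goals(opponent)                    # step 1
    for go_j in s_go:                                  # step 2(a)
        rop = agent.eval_recomp(go_j)                  # step 2(a)i: reward option
        r_rw1 = (required_action, rop)                 # step 2(a)ii: g_i' -> rop
        r_rw2 = (rop, go_j)                            #              rop -> go_j
        agent.rws.append(({r_rw1, r_rw2}, required_action, go_j))  # step 2(a)iii
    return agent.rws

# Reproducing Example 4.1:
boss = Agent(op={"employee"},
             op_det={"employee": 1.0},
             go_det={("visit(parents)", "employee"): 0.8,
                     ("fix(car)", "employee"): 0.6},
             go_state={"visit(parents)": 1, "fix(car)": 3},
             reward_actions={"visit(parents)": "have(holidays)",
                             "fix(car)": "give(inter_paym)"})
construct_rewards(boss, "employee", "work(weekend)")   # Rws now holds rw1 and rw2
```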
5 Strength calculation

The strength of a reward is mainly based on the "value" that the rewardable goal has for the opponent. In addition, the credibility the proponent has in the eyes of his opponent(s) regarding his ability to fulfil his rewards also influences the strength calculation.

The strength calculation of a reward depends on:

1. The goal of the opponent goi (or rewardable goal), of which two aspects are considered:
(a) Its importance: As in some related works ([1], [3]), we take into account the importance the rewardable goal has for the opponent.
(b) Its state: Recall that we use 0 to denote that a goal is sleeping, 1 that it is active, 2 that it is pursuable, and 3 that it is chosen.

2. The execution credibility level: It is also important that the proponent agent be able to execute his rewards, from the point of view of his opponent. This value (stored by the proponent) reflects what the proponent believes the opponent thinks about the proponent's credibility of execution.

Considering these aspects, the formalization of our proposal is defined as follows.

Definition 5.1. (Basic strength of a reward) The basic strength of a reward depends on the importance and the state of the rewardable goal. Let rw = ⟨Rrw, gi', goj⟩ be a reward; its basic strength is obtained by applying:

    ST_basic(rw) = ( State(goj) / (num_states − 1) + Importance(goj) ) / 2    (1)

where num_states = 4 is the total number of states that an opponent goal can have and the function State returns the state of the opponent's goal (0 = sleeping, 1 = active, 2 = pursuable, 3 = chosen).

A direct consequence of the above definition is that the value of the basic strength of a reward is a real value between 0 and 1. Formally:

Property 5.1. Let rw = ⟨Rrw, gi', goj⟩ be a reward. ST_basic(rw) ∈ [0, 1], where 0 represents the minimum and 1 the maximum value the basic strength can have.

Proof. Since the basic strength is the average of the importance and the normalized state value of goj, and both values lie between 0 and 1, the basic strength is also bounded by 0 and 1.

When the proponent agent constructs a set of rewards for only one opponent, the value of the basic strength is enough to choose the reward that will be sent. Nevertheless, a more exact value can be obtained if the execution credibility level is also considered. This aspect is even more important when the proponent agent generates rewards for more than one opponent, as it lets him know which opponent may be convinced more quickly when faced with one of his rewards.

Definition 5.2. (Combined strength of a reward) The combined strength of a reward depends on the basic strength of the reward and the execution credibility level of the proponent. Let rw = ⟨Rrw, gi', goj⟩ be a reward and opn ∈ Op the opponent whose rewardable goal is goj. The combined strength of rw is obtained by applying:

    ST_comb(rw) = ST_basic(rw) × Level_Execrw(opn)    (2)

Property 5.2. The maximum value of the combined strength of a reward is at most the value of its basic strength: ST_comb(rw) ∈ [0, ST_basic(rw)].

Example 5.1. Let us continue with Example 4.1:
State(go1) = 1, Importance(go1) = 0.8;
State(go2) = 3, Importance(go2) = 0.6;
Level_Execrw(employee) = 1 is the execution credibility level of agent boss from the point of view of agent employee.

Applying equation (1), the basic strengths of the rewards in Rws are ST_basic(rw1) = 0.57 and ST_basic(rw2) = 0.8. Since agent boss generated rewards for only one opponent, he can choose the strongest one without calculating the combined strengths, all the more so because in this case the combined strengths equal the basic strengths. Therefore, boss would send rw2 because it is the strongest reward.

Example 5.2. Let us suppose that boss has another opponent, employee2, and that he has generated two rewards for employee2 with the following basic strengths: ST_basic(rw3) = 0.85 and ST_basic(rw4) = 0.55, with Level_Execrw(employee2) = 0.7.

Taking into consideration only the basic strengths, the strongest reward is rw3, with 0.85. This means that boss could send this reward to employee2, as it seems that it would be more effective than sending a reward to employee. However, the execution credibility level of boss from the point of view of employee2 is lower than from the point of view of employee. Thus, the combined strengths (equation (2)) equal the basic strengths for employee but differ for employee2:

ST_comb(rw1) = 0.57 × 1 = 0.57
ST_comb(rw2) = 0.8 × 1 = 0.8
ST_comb(rw3) = 0.85 × 0.7 ≈ 0.6
ST_comb(rw4) = 0.55 × 0.7 ≈ 0.39

Therefore, the best option for boss is to send reward rw2 to his opponent employee.
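Equations (1) and (2) translate directly into code. The following sketch, continuing the hypothetical Agent/boss example above, reproduces the numbers of Example 5.1:

```python
NUM_STATES = 4  # sleeping, active, pursuable, chosen

def st_basic(agent, opponent, go_j):
    """Equation (1): mean of the normalised state and the importance."""
    return (agent.state(go_j) / (NUM_STATES - 1)
            + agent.importance(go_j, opponent)) / 2

def st_comb(agent, opponent, go_j):
    """Equation (2): basic strength scaled by execution credibility."""
    return st_basic(agent, opponent, go_j) * agent.level_execrw(opponent)

print(round(st_basic(boss, "employee", "visit(parents)"), 2))  # 0.57 (rw1)
print(round(st_basic(boss, "employee", "fix(car)"), 2))        # 0.8  (rw2)
```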
6 Related works

Ramchurn et al. [9] propose a model where the rhetorical strength of rewards varies during the negotiation depending on the environmental conditions. For calculating the strength value, they take into account the set of world states to which an agent can be carried by using a certain reward. The intensity of the strength depends on the desirability of each of these states; for a fair calculation, an average over all possible states is used.

In [1], a formal definition of rewards and an evaluation system are presented. For evaluating the strength of rewards, the certainty of the beliefs used for the generation of the reward and the importance of the goal of the opponent are considered. The same authors have later articles about rhetorical arguments ([2], [3]); in these works, the calculation of the strength of rewards always takes into account the two criteria previously mentioned. In our proposal, we make a deeper analysis of the components of a reward and define new criteria for calculating its strength. Another difference concerns the reward rules: while in Amgoud and Prade [1] they are part of the knowledge base from the beginning, in our work they are constructed from an own goal of the proponent agent and the goals of the opponent, giving more flexibility to the generation of rewards.

7 Conclusions and future works

This work makes a deeper analysis of the components of a reward and considers new criteria for the calculation of its strength. Using these criteria, two forms of calculating the strength of a reward were proposed: the basic strength and the combined strength. Having these two different calculations is one of the advantages of our proposal, as, depending on the situation, the proponent can use either the basic or the combined equation.

We also presented a process for reward construction and an agent architecture based on the goal processing model of Castelfranchi and Paglieri. We believe that our proposed process gives more flexibility to the generation of rewards, as the reward rules are generated dynamically from the set of goals of the opponent by applying the EvalRecomp function, which evaluates the rewardable goal to return an adequate reward action.

As future work, we want to analyse the calculation of the strength of rewards from the point of view of the opponent. Besides the importance and the state of the rewardable goal, the utility of the reward for the opponent may be considered. We also want to work on experience-based calculation; this would be done after the proponent receives the answer of the opponent, which can be positive or negative. When it is negative, a recalculation of the initial strength should be done.
References

[1] Leila Amgoud and Henri Prade. Threat, reward and explanatory arguments: generation and evaluation. In Proceedings of the ECAI Workshop on Computational Models of Natural Argument, pages 73–76, 2004.

[2] Leila Amgoud and Henri Prade. Handling threats, rewards, and explanatory arguments in a unified setting. International Journal of Intelligent Systems, 20(12):1195–1218, 2005.

[3] Leila Amgoud and Henri Prade. Formal handling of threats and rewards in a negotiation dialogue. In Argumentation in Multi-Agent Systems, pages 88–103. Springer, 2006.

[4] Michael Bratman. Intention, Plans, and Practical Reason. Harvard University Press, 1987.

[5] Cristiano Castelfranchi and Fabio Paglieri. The role of beliefs in goal dynamics: Prolegomena to a constructive theory of intentions. Synthese, 155(2):237–263, 2007.

[6] Cristiano Castelfranchi. Reasons: belief support and goal dynamics. Mathware & Soft Computing, 3(1-2):233–247, 2008.

[7] Christos Hadjinikolis, Yiannis Siantos, Sanjay Modgil, Elizabeth Black, and Peter McBurney. Opponent modelling in persuasion dialogues. In IJCAI, 2013.

[8] Sarit Kraus, Katia Sycara, and Amir Evenchik. Reaching agreements through argumentation: a logical model and implementation. Artificial Intelligence, 104(1-2):1–69, 1998.

[9] Sarvapali D. Ramchurn, Nicholas R. Jennings, and Carles Sierra. Persuasive negotiation for autonomous agents: A rhetorical approach. 2003.

[10] Tjitze Rienstra, Matthias Thimm, and Nir Oren. Opponent models with uncertainty for strategic argumentation. In IJCAI, 2013.

[11] Katia P. Sycara. Persuasive argumentation in negotiation. Theory and Decision, 28(3):203–242, 1990.