Proceedings of CMNA 2016 – Floris Bex, Floriana Grasso, Nancy Green (eds)

Strength calculation of rewards

Mariela Morveli-Espinoza, Ayslan T. Possebom and Cesar A. Tacla
CPGEI – Federal University of Technology – Paraná, Brazil
{morveli.espinoza, possebom, cesar.tacla}@gmail.com

Abstract

Persuasive negotiation involves negotiating using rhetorical arguments (such as threats, rewards, and appeals), which act as persuasive elements that aim to force or convince an opponent to accept a given proposal. Rewards have a positive nature, as they use the argument that something positive will happen to the opponent if he accepts to do the requirement sent by the proponent. A proponent agent can generate more than one reward, depending on the information he has modelled about his opponent. The problem appears when the agent has to choose, among a set of rewards, the one to send to his opponent. One measure that can guide this choice is the strength of each reward. Thus, the goal of this work is to analyse the components of rewards and to propose a model for calculating their strength. We propose two ways of calculating the strength of rewards, depending on the kind of negotiation the agent is participating in. The first is to be used when the agent negotiates with only one opponent, and the second when the agent negotiates with more than one opponent.

1 Introduction

Persuasive negotiation involves negotiating using rhetorical arguments, which act as persuasive elements that aim to force or convince an opponent to accept a given proposal [9].

Although some authors argue that threats are the strongest rhetorical arguments ([11], [8]), the choice of which kind of argument a proponent agent will use depends on the information he has modelled about his opponents. According to Ramchurn et al. [9], it also depends on the convenience of the proposal for the proponent and the degree of trust that exists between the two agents. Therefore, depending on the situation, a reward can be more effective than a threat. In this work we study rewards, which have a positive nature: a proponent agent can entice his opponent to perform a certain action by offering to perform another action as a reward [1].

Consider the following persuasive negotiation scenario, where boss is the proponent agent, employee the opponent agent, and the goal of boss is that employee works every weekend¹. Taking into account the knowledge base of agent boss, the following rewards can be generated:

• boss: if you work every weekend, you will receive an interim payment.
• boss: if you work every weekend, you will have more holidays.

¹ This scenario is inspired by the example presented in [1].

The question is: which of these rewards will boss choose to persuade employee to work every weekend? One way of answering it is by calculating the strength of the generated rewards. According to Ramchurn et al. [9], a strong argument (in this case, a reward) is one that quickly convinces an opponent to accept a proposal, while a weak argument is less persuasive. Therefore, calculating the strength of rewards is important in persuasive negotiation dialogues, since the quickness of persuasion depends on it.

Rewards are constructed using both the proponent's and the opponent's goals (for example, earning more money). Some research on this topic takes into account the importance of the opponent's goal for the opponent and the certainty level of the beliefs used for the argument generation [1][3]. However, additional criteria are necessary, as the following situations show:

1. Agent boss knows that "visiting his parents" (g1) is a more important goal for employee than "fixing his car" (g2).
Considering only the importance, boss would use g1 for generating a reward. However, what happens if boss knows that g1 is less achievable than g2, since visiting his parents is not possible for the moment because employee has a spine disorder that does not let him travel long distances? In cases like this, importance is no longer the best, or the only, criterion related to the goal of the opponent.

2. Agent boss has offered rewards before and has rarely fulfilled them, and employee obviously knows it. In this case, the strength of a reward is also influenced by the execution credibility that the proponent has from the point of view of his opponent. Thus, even when the goal of an opponent is very important and/or achievable, a low level of credibility can diminish the strength of a reward.

In the first case, notice that besides importance there exists another criterion for evaluating the quality of the goal of an opponent, because it does not matter how important a goal is if it cannot be achieved. In the second case, the execution credibility level of the proponent (from the point of view of the opponent, i.e. the execution level the proponent believes the opponent attributes to him) should also be considered.

With respect to the first criterion, to determine how achievable a goal is, we use the belief-based goal processing model proposed by Castelfranchi and Paglieri [5]. It can be considered an extension of the belief-desire-intention (BDI) model [4]; unlike BDI, in Castelfranchi and Paglieri's model the processing of goals is divided into four stages: (i) activation, (ii) evaluation, (iii) deliberation, and (iv) checking; and the states a goal can adopt are: (i) active (= desire), (ii) pursuable, (iii) chosen, and (iv) executable (= intention). The state of a goal changes when it passes from one stage to the next. Thus, when it passes the activation stage it becomes active, when it passes the evaluation stage it becomes pursuable, and so on. A goal is closer to being achieved when it is closer to passing the last stage.

Part of our proposal for calculating the strength of a reward considers the state of the goals of the opponent. Depending on this state, a goal can be considered more or less rewardable: a goal is more rewardable when it is closer to the executable state and less rewardable when its state is active. Thus, the aim of this article is to propose a model for calculating the strength of rewards that takes into account new criteria, leading to a more accurate calculation.

The paper is organized as follows. Section 2 presents the goal processing model on which our strength calculation model is primarily based. Section 3 defines a negotiating agent architecture with the mental states, structures, and functions that support our proposal. A formal definition of reward and the mechanism for its generation are presented in Section 4. Section 5 presents an analysis of the elements of a reward and our proposed strength calculation model. In Section 6, the main related works are compared with our proposal. Finally, Section 7 is devoted to conclusions and future work.
2 Belief-based goal processing model

In this section, the four stages of the goal processing model of Castelfranchi and Paglieri are presented². The aim is not to present in detail the beliefs used in each stage; we focus on the goal states and make clear when a goal is considered active, pursuable, chosen, and executable, because these states are used in the strength calculation model proposed in this work. A brief description of each stage follows:

² A more detailed version of this model is presented in [5].

1. Activation stage: In this stage, goals are activated by means of motivating beliefs. For example, if the agent has the belief that today is Thursday, it activates the goal of going to the French class, or the motivating belief that today is sunny activates the goal of playing football. When a motivating belief is satisfied, the supported goal becomes active. An active goal can also be seen as a desire.

2. Evaluation stage: In this stage, goals are evaluated using assessment beliefs. When there are no assessment beliefs for a certain goal, it becomes pursuable. Three types of assessment beliefs were defined: (i) those that check that there is no impossibility for a goal to be pursued; (ii) those that concern goals that are realized in the world autonomously, without the direct intervention of the agent; and (iii) those that concern goals that have already been realized and will remain as such.

3. Deliberation stage: The aim of this stage is to act as a filter on the basis of incompatibilities and preferences among pursuable goals. Goals that pass this stage are called chosen goals. The beliefs of this stage are concerned with the different forms of incompatibility among goals that lead an agent to choose among them. For dealing with incompatibilities, an agent uses preference beliefs.

4. Checking stage: The aim of this stage is to evaluate whether the agent knows how to achieve a goal and whether it is capable of performing the required actions, in other words whether the agent has a plan and is capable of executing it. Goals that pass this stage are called executable goals and have the same characteristics as intentions.
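To make the stage/state progression concrete, the following minimal Python sketch (our own illustration; the names GoalState and advance are not part of Castelfranchi and Paglieri's model) encodes the opponent-goal states with the integer codes used later by the State function of Section 3:

```python
from enum import IntEnum

class GoalState(IntEnum):
    """Opponent-goal states; the codes match State() in Section 3."""
    SLEEPING = 0   # not yet activated (state proposed in [6])
    ACTIVE = 1     # passed the activation stage (= desire)
    PURSUABLE = 2  # passed the evaluation stage
    CHOSEN = 3     # passed the deliberation stage

def advance(state: GoalState) -> GoalState:
    """A goal moves to the next state when it passes its current stage.
    A goal that also passes the checking stage becomes executable
    (= intention); Section 5 does not score that state, so we cap at CHOSEN."""
    return GoalState(min(state + 1, GoalState.CHOSEN))
```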
This function lets the proponent choose an five basic structures: adequate reward action, that is within the possibilities - K is the knowledge base of the agent; of the proponent, thereby, he can fulfill the o↵ered - O p stores the opponents of the agent; reward. - G = Ga [ Gp [ Gc [ Ge is the set of goals of the agent, such that Ga the set of active goals, Gp the set 4 Construction of rewards of pursuable goals, Gc the set of chosen goals and Ge A reward is constructed based on two goals: the set of executable goals. It holds that Gx \ Gy = ;, 1. An outsourced goal of the proponent: for x, y 2 {a, p, c, e} and x 6= y; This kind of goal needs the opponent involvement for - GO = GOs [ GOa [ GOp [ GOc is the set of goals being achieved. For example, the goal of agent boss of the opponent, such that GOs is the set of sleeping is that employee works every weekend, for this goal goals of the opponent4 , GOa is the set of active goals of be achieved it is necessary that employee executes the opponent, GOp the set of pursuable ones, and GOc the required action. Considering the goal processing the set of chosen ones. It holds that Gx \ Gy = ; for stages defined in Section 2 the state of this goal is x, y 2 {s, a, p, c} and x 6= y. Finally, let State(goi ) = z executable. be a function that returns the state of a given goal; for z 2 {0, 1, 2, 3} where 0 means that the goal is sleeping, Definition 4.1. (Outsourced goal) An outsourced 1 active, 2 pursuable and 3 that it is chosen; goal gi is an expression of the form gi (opk , gi0 ), such - Rws stores the rewards constructed by the agent. that, opk 2 O p and gi0 is an action that opk has to The logical definition of a reward is given in Section 4. execute. Let f irst(gi ) = opk and second(gi ) = gi0 be the functions that return each component of gi . Definition 3.2. (Compound structures) These store characteristics of the basic structures. 2. The goal of an opponent: goi 2 GO is a - O pdet = {(opi , )} such that opi 2 O p and is the goal that proponent agent knows its opponent wants execution credibility level of rewards the proponent to achieve. For example, boss knows that employee has from the point of view of opponent opi . Hereafter, wants to visit his parents. Besides knowing the goal of we denote that 2 [0, 1] such that is a real from his opponent, the proponent has to know the state of the given interval. Let Level Execrw (opi ) = be a that goal and its importance. function that returns the execution credibility level for The construction of a reward begins when (i) an a given opponent agent; outsourced goal gi passes all the goal processing stages - GOdet = {(goi , , opj )} such that goi 2 GO is a and becomes executable and, (ii) after a failed first goal of opponent opj 2 O p whose importance is given attempt of proponent agent to make his opponent to do the requested action gi0 . The process of construction 3 In this work, we assume that the agent has in advance of a reward is the following: the necessary information (importance and state of the goals, and the value of the credibility level of execution of 1. Function Op goals returns the set of rewardable rewards) for generating and calculating the strength of rewards. Some interesting works about opponent modelling related to goals the proponent knows about the opponent argumentation are [7, 10]. opj . Let Sgo = Op goals(opj ) be the returned 4 Sleeping is one of the states a goal may take that is proposed set. in [6]. A goal is in sleeping state when it has not been activated yet. 
4 Construction of rewards

A reward is constructed based on two goals:

1. An outsourced goal of the proponent: This kind of goal needs the opponent's involvement in order to be achieved. For example, the goal of agent boss is that employee works every weekend; for this goal to be achieved, it is necessary that employee executes the required action. Considering the goal processing stages defined in Section 2, the state of this goal is executable.

Definition 4.1. (Outsourced goal) An outsourced goal gi is an expression of the form gi(opk, gi'), such that opk ∈ Op and gi' is an action that opk has to execute. Let first(gi) = opk and second(gi) = gi' be the functions that return each component of gi.

2. The goal of an opponent: goi ∈ GO is a goal that the proponent agent knows its opponent wants to achieve. For example, boss knows that employee wants to visit his parents. Besides knowing the goal of his opponent, the proponent has to know the state of that goal and its importance.

The construction of a reward begins when (i) an outsourced goal gi passes all the goal processing stages and becomes executable, and (ii) a first attempt of the proponent agent to make his opponent perform the requested action gi' has failed. The construction process is the following (a runnable sketch is given at the end of this section):

1. Function Op_Goals returns the set of rewardable goals the proponent knows about the opponent opj. Let Sgo = Op_Goals(opj) be the returned set.
2. If Sgo ≠ ∅:
   (a) For each goj ∈ Sgo:
       i. Obtain rop = EvalRecomp(goj) to know a reward option for goal goj.
       ii. Generate the two rules necessary for constructing a reward. The first is an expression of the form rrw1 = gi' → rop and the second is rrw2 = rop → goj.
       iii. Construct a reward and save it in Rws.

Rewards in Rws are called candidates. After the strength calculation, the strongest one is sent to the opponent to try to persuade him.

Next, we present the formal definition of a reward. It is based on the definition given in [1], with some modifications that reflect the agent architecture proposed in the previous section.

Definition 4.2. (Reward) A reward is a triple rw = ⟨Rrw, gi', goj⟩, where:
- Rrw = {rrw1, rrw2} contains both reward rules,
- gi' = second(gi), such that gi ∈ Ge,
- Rrw ∪ {gi'} ⊢ goj, such that goj ∈ GO.
We call Rrw and gi' the support of the reward and goj its conclusion.

Example 4.1. Let us define the mental state of boss:
Ge = {g1}, where g1 = make(employee, 'work(weekend)') is an outsourced goal, and therefore g1' = work(weekend);
GOa = {go1}, where go1 = visit(parents);
GOc = {go2}, where go2 = fix(car);
GOdet = {(go1, 0.8, employee), (go2, 0.6, employee)};
Op = {employee}; Opdet = {(employee, 1)};
Rws = {}.

Let us suppose that agent employee refused to do action g1' = work(weekend). Therefore, boss begins the construction of candidate rewards:

1. Sgo = Op_Goals(employee), so Sgo = {go1, go2}.
2. Sgo ≠ ∅, then:
   (a) EvalRecomp(go1) = have(holidays)
       Generate Rrw1 = {g1' → have(holidays), have(holidays) → go1}
       Construct rw1 = ⟨Rrw1, g1', go1⟩
   (b) EvalRecomp(go2) = give(inter_paym)
       Generate Rrw2 = {g1' → give(inter_paym), give(inter_paym) → go2}
       Construct rw2 = ⟨Rrw2, g1', go2⟩

Finally, Rws = {rw1, rw2}.
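The construction process above can be written in a few lines of Python on top of the Agent class sketched after Definition 3.3 (again our own illustration; rewards are stored as plain tuples ⟨rule set, action, goal⟩, and the iteration order over the goal set is irrelevant):

```python
def construct_rewards(agent, opponent, required_action):
    """Section 4 process: build one candidate reward per rewardable goal."""
    s_go = agent.op_goals(opponent)                    # step 1
    for go_j in s_go:                                  # step 2(a)
        rop = agent.eval_recomp(go_j)                  # step 2(a)i: reward option
        r_rw1 = (required_action, rop)                 # step 2(a)ii: g_i' -> rop
        r_rw2 = (rop, go_j)                            #              rop -> go_j
        agent.rws.append(({r_rw1, r_rw2}, required_action, go_j))  # step 2(a)iii
    return agent.rws

# Reproducing Example 4.1:
boss = Agent(op={"employee"},
             op_det={"employee": 1.0},
             go_det={("visit(parents)", "employee"): 0.8,
                     ("fix(car)", "employee"): 0.6},
             go_state={"visit(parents)": 1, "fix(car)": 3},
             reward_actions={"visit(parents)": "have(holidays)",
                             "fix(car)": "give(inter_paym)"})
construct_rewards(boss, "employee", "work(weekend)")   # Rws now holds rw1 and rw2
```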
5 Strength calculation

The strength of a reward is mainly based on the "value" that the rewardable goal has for the opponent. In addition, the credibility the proponent has in the eyes of his opponent(s) regarding his ability to fulfil his rewards also influences the strength calculation.

The strength calculation of a reward depends on:

1. The goal of the opponent goi (or rewardable goal), of which two aspects are considered:
(a) Its importance: As in some related works ([1], [3]), we take into account the importance the rewardable goal has for the opponent.
(b) Its state: Recall that we use 0 to denote that a goal is sleeping, 1 that it is active, 2 that it is pursuable, and 3 that it is chosen.

2. The execution credibility level: It is also important that the proponent agent be able to execute his rewards, from the point of view of his opponent. This value (stored by the proponent) reflects what the proponent believes the opponent thinks about the proponent's credibility of execution.

Considering these aspects, the formalization of our proposal is defined as follows.

Definition 5.1. (Basic strength of a reward) The basic strength of a reward depends on the importance and the state of the rewardable goal. Let rw = ⟨Rrw, gi', goj⟩ be a reward; its basic strength is obtained by applying:

    ST_basic(rw) = ( State(goj) / (num_states − 1) + Importance(goj) ) / 2    (1)

where num_states = 4 is the total number of states that an opponent goal can have and the function State returns the state of the opponent's goal (0 = sleeping, 1 = active, 2 = pursuable, 3 = chosen).

A direct consequence of the above definition is that the value of the basic strength of a reward is a real value between 0 and 1. Formally:

Property 5.1. Let rw = ⟨Rrw, gi', goj⟩ be a reward. ST_basic(rw) ∈ [0, 1], where 0 represents the minimum and 1 the maximum value the basic strength can have.

Proof. Since the basic strength is the average of the importance and the normalized state value of goj, and both values lie between 0 and 1, the basic strength is also bounded by 0 and 1.

When the proponent agent constructs a set of rewards for only one opponent, the value of the basic strength is enough to choose the reward that will be sent. Nevertheless, a more exact value can be obtained if the execution credibility level is also considered. This aspect is even more important when the proponent agent generates rewards for more than one opponent, as it lets him know which opponent may be convinced more quickly when faced with one of his rewards.

Definition 5.2. (Combined strength of a reward) The combined strength of a reward depends on the basic strength of the reward and the execution credibility level of the proponent. Let rw = ⟨Rrw, gi', goj⟩ be a reward and opn ∈ Op the opponent whose rewardable goal is goj. The combined strength of rw is obtained by applying:

    ST_comb(rw) = ST_basic(rw) × Level_Execrw(opn)    (2)

Property 5.2. The maximum value of the combined strength of a reward is at most the value of its basic strength: ST_comb(rw) ∈ [0, ST_basic(rw)].

Example 5.1. Let us continue with Example 4.1:
State(go1) = 1, Importance(go1) = 0.8;
State(go2) = 3, Importance(go2) = 0.6;
Level_Execrw(employee) = 1 is the execution credibility level of agent boss from the point of view of agent employee.

Applying equation (1), the basic strengths of the rewards in Rws are ST_basic(rw1) = 0.57 and ST_basic(rw2) = 0.8. Since agent boss generated rewards for only one opponent, he can choose the strongest one without calculating the combined strengths, all the more so because in this case the combined strengths equal the basic strengths. Therefore, boss would send rw2 because it is the strongest reward.

Example 5.2. Let us suppose that boss has another opponent, employee2, and that he has generated two rewards for employee2 with the following basic strengths: ST_basic(rw3) = 0.85 and ST_basic(rw4) = 0.55, with Level_Execrw(employee2) = 0.7.

Taking into consideration only the basic strengths, the strongest reward is rw3, with 0.85. This means that boss could send this reward to employee2, as it seems that it would be more effective than sending a reward to employee. However, the execution credibility level of boss from the point of view of employee2 is lower than from the point of view of employee. Thus, the combined strengths (equation (2)) equal the basic strengths for employee but differ for employee2:

ST_comb(rw1) = 0.57 × 1 = 0.57
ST_comb(rw2) = 0.8 × 1 = 0.8
ST_comb(rw3) = 0.85 × 0.7 ≈ 0.6
ST_comb(rw4) = 0.55 × 0.7 ≈ 0.39

Therefore, the best option for boss is to send reward rw2 to his opponent employee.
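Equations (1) and (2) translate directly into code. The following sketch, continuing the hypothetical Agent/boss example above, reproduces the numbers of Example 5.1:

```python
NUM_STATES = 4  # sleeping, active, pursuable, chosen

def st_basic(agent, opponent, go_j):
    """Equation (1): mean of the normalised state and the importance."""
    return (agent.state(go_j) / (NUM_STATES - 1)
            + agent.importance(go_j, opponent)) / 2

def st_comb(agent, opponent, go_j):
    """Equation (2): basic strength scaled by execution credibility."""
    return st_basic(agent, opponent, go_j) * agent.level_execrw(opponent)

print(round(st_basic(boss, "employee", "visit(parents)"), 2))  # 0.57 (rw1)
print(round(st_basic(boss, "employee", "fix(car)"), 2))        # 0.8  (rw2)
```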
6 Related works

Ramchurn et al. [9] propose a model where the rhetorical strength of rewards varies during the negotiation depending on the environmental conditions. For calculating the strength value, they take into account the set of world states to which an agent can be carried by using a certain reward. The intensity of the strength depends on the desirability of each of these states; for a fair calculation, an average over all possible states is used.

In [1], a formal definition of rewards and an evaluation system are presented. For evaluating the strength of rewards, the certainty of the beliefs used for the generation of the reward and the importance of the goal of the opponent are considered. The same authors have later articles about rhetorical arguments ([2], [3]); in these works, the calculation of the strength of rewards always takes into account the two criteria previously mentioned. In our proposal, we make a deeper analysis of the components of a reward and define new criteria for calculating its strength. Another difference concerns the reward rules: while in Amgoud and Prade [1] they are part of the knowledge base from the beginning, in our work they are constructed from an own goal of the proponent agent and the goals of the opponent, giving more flexibility to the generation of rewards.

7 Conclusions and future works

This work makes a deeper analysis of the components of a reward and considers new criteria for the calculation of its strength. Using these criteria, two forms of calculating the strength of a reward were proposed: the basic strength and the combined strength. Having these two different calculations is one of the advantages of our proposal, as, depending on the situation, the proponent can use either the basic or the combined equation.

We also presented a process for reward construction and an agent architecture based on the goal processing model of Castelfranchi and Paglieri. We believe that our proposed process gives more flexibility to the generation of rewards, as the reward rules are generated dynamically from the set of goals of the opponent by applying the EvalRecomp function, which evaluates the rewardable goal to return an adequate reward action.

As future work, we want to analyse the calculation of the strength of rewards from the point of view of the opponent. Besides the importance and the state of the rewardable goal, the utility of the reward for the opponent may be considered. We also want to work on experience-based calculation; this would be done after the proponent receives the answer of the opponent, which can be positive or negative. When it is negative, a recalculation of the initial strength should be done.
References

[1] Leila Amgoud and Henri Prade. Threat, reward and explanatory arguments: generation and evaluation. In Proceedings of the ECAI Workshop on Computational Models of Natural Argument, pages 73–76, 2004.

[2] Leila Amgoud and Henri Prade. Handling threats, rewards, and explanatory arguments in a unified setting. International Journal of Intelligent Systems, 20(12):1195–1218, 2005.

[3] Leila Amgoud and Henri Prade. Formal handling of threats and rewards in a negotiation dialogue. In Argumentation in Multi-Agent Systems, pages 88–103. Springer, 2006.

[4] Michael Bratman. Intention, Plans, and Practical Reason. Harvard University Press, 1987.

[5] Cristiano Castelfranchi and Fabio Paglieri. The role of beliefs in goal dynamics: Prolegomena to a constructive theory of intentions. Synthese, 155(2):237–263, 2007.

[6] Cristiano Castelfranchi. Reasons: belief support and goal dynamics. Mathware & Soft Computing, 3(1-2):233–247, 2008.

[7] Christos Hadjinikolis, Yiannis Siantos, Sanjay Modgil, Elizabeth Black, and Peter McBurney. Opponent modelling in persuasion dialogues. In IJCAI, 2013.

[8] Sarit Kraus, Katia Sycara, and Amir Evenchik. Reaching agreements through argumentation: a logical model and implementation. Artificial Intelligence, 104(1-2):1–69, 1998.

[9] Sarvapali D. Ramchurn, Nicholas R. Jennings, and Carles Sierra. Persuasive negotiation for autonomous agents: A rhetorical approach. 2003.

[10] Tjitze Rienstra, Matthias Thimm, and Nir Oren. Opponent models with uncertainty for strategic argumentation. In IJCAI, 2013.

[11] Katia P. Sycara. Persuasive argumentation in negotiation. Theory and Decision, 28(3):203–242, 1990.