=Paper= {{Paper |id=Vol-2659/persiani |storemode=property |title=Mediating joint intention with a dialogue management system |pdfUrl=https://ceur-ws.org/Vol-2659/persiani.pdf |volume=Vol-2659 |authors=Michele Persiani,Maitreyee Tewari |dblpUrl=https://dblp.org/rec/conf/ecai/PersianiT20 }} ==Mediating joint intention with a dialogue management system== https://ceur-ws.org/Vol-2659/persiani.pdf
Mediating Joint Intentions with a Dialogue Management System
                         Michele Persiani                                                      Maitreyee Tewari
                         Umeå University                                                        Umeå University
                          Umeå, Sweden                                                           Umeå, Sweden
                        michelep@cs.umu.se                                                     maittewa@cs.umu.se

ABSTRACT
A necessary skill which enables machines to take part in decision making processes with their users is the ability to participate in the mediation of joint intentions. This paper presents a formalisation of an architecture to create and mediate joint intentions with an artificial agent. The proposed system is loosely based on the framework of we-intentions and is embodied in a combination of plan recognition techniques to identify the user intention, and a reinforcement learning network which learns how to best interact with the inferred intention.

KEYWORDS
Joint Intentions, Robotics, Goal Recognition, Reinforcement Learning

Figure 1: Creation of a joint intention with a robot. During its turn each participant adds and removes tasks (or primitive actions) from the shared intention π. Participants specify what they or the other will do until a common agreement is met.

1   INTRODUCTION
The socio-technological evolution of human society has motivated the integration of robots into social and personal spaces. Hence, it is becoming a pressing requirement for social robotics to understand human intentions and to adapt to social values and needs.

Among other reasons, humans interact to understand and mediate intentions with other human participants [16]. A successful mediation of intentions enables participants to establish profitable collaborations, to manage expectations, or to decide whether to trust the other participant. Natural language dialogues are among the primitive modes [4] of human-human interaction, and are also consistently used to mediate intentions. Dialogue management strategies have exploited joint intention theory for building team dialogues [15]. However, this work views joint intentions with an accent on joint task planning [13] for a human and a robot participant, rather than on the communicative protocols involved.

The objective of this work is to model the mediation of intentions for Human-Robot Interaction (HRI) in a household scenario, and it is loosely based on the framework of we-intentions [17]. Within this scenario, we explore cases where a person could need assistance from a robot, such as cooking, finding different objects in the house, or preparing for a visit to the supermarket, a doctor or a friend. For instance, the person might say "I want to prepare a salad" to a robot, possibly intending for the robot to help her cook dinner.

Hence, we explore the following research question: how can joint intentions be created with machines? The motivation behind this research question stems from desired specifications of AI systems, including the need for an integrated cognition and collaboration mechanism, and for natural interaction between humans and AI systems. Ultimately, it is about investigating the boundaries between the eco-system of AI and that of human beings.

To answer this research question, we formalise joint intentions in the context of shared task planning and define dialogue act functions [6]. This formalism offers a turn-based interaction scheme that allows two participants (a human and a robot) to mediate an intention regarding a shared task.

2   METHODOLOGY
Some of the previous work [7, 11] proposed team rationality for building collaborative multi-agent systems. For example, in [11] the authors used SharedPlans [5] and Propose Trees to model collaboration as a multi-agent planning problem, where a rational team will perform an action only if its expected benefit outweighs its cost. In [2] the authors formalised communication protocols using joint intention theory: they used joint persistent goals and persistent weak achievement goals to build joint intentions, and speech acts such as request, offer, inform, confirm, refuse, acknowledge, and standing-offer for their mediation.

Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


As later described, we propose certain assumptions to lift some of the complexity that previous research utilizes in the context of joint intention theory. We believe that such complexities, while theoretically sound, make implementations on real systems difficult and brittle; for this reason, we utilize a simplification of previous work's formalizations for our needs. The rest of the section provides our simplified formalisation of mediated joint intentions and briefly reasons about the constraints posed.

Our proposed approach is based on predicate logic combined with planning, and is influenced by the logic-based semantics proposed in [1, 2, 14]. Agents are represented by x, y, . . . , x₁, x₂, . . . , y₁, y₂, . . . and their actions by a₁, a₂, . . . , aₙ. An intention of a single agent x is a plan π = {a₀, a₁, . . . , aₙ} of actions together with a goal g the agent is committed to [16], and the intention is partially observed through O ⊆ π.

Know(x, p) ≡ p ∧ Bel(x, p) represents the knowledge of agents, and MutBel(x, y, p) that x and y share a mutual belief about p. In our formulation an agent's intention is represented by the predicate Intend(x, g, O), and a joint intention by JointIntend(x, y, g, O). An agent has an intention if the following holds:

    Intend(x, g, O) ≡ Know(x, ∃π O ⊆ π ∧
                              Goal(π) = g ∧                    (1)
                              Commit(x, π))

i.e. not only is it true that the agent has an intention and is committed to it, but the agent also holds a belief about it. The set O is an explicit subset of π to which the agent is known to have already committed, and it contains past observations or declarations about future commitments regarding π.

This formulation expresses that providing agents with an intention does not require them to make their full intention π explicit, but only a part of it (see Figure 1), with the full intention being instead inferred by grounding the observed commitments in the task space.

A joint intention is an intention shared by the agents x and y with the same goal g. Therefore, a joint intention is a plan π = {a_{x0}, a_{y0}, a_{x1}, . . . , a_{yn}} together with a goal g, where the actions in π can be allocated to either participant x or y. Furthermore, the involved agents hold a mutual belief MutBel about each other's commitments. Hence, two agents hold a joint intention if the following holds:

    JointIntend(x, y, g, O) ≡ Intend(x, g, O) ∧ Intend(y, g, O) ∧
                  MutBel(x, y, JointIntend(x, y, g, O))        (2)

By this formulation x and y are allowed to have separate beliefs and inference mechanisms through which they find π, but are bound to have the same goal and observed commitments. Notice that this is a simplification of how joint intentions have previously been formalized in the literature, to which we refer the reader. Nevertheless, this formalization is sufficient for our purpose of creating a dialogue manager that allows the mediation of joint intentions. In the context of a dialogue between two agents x and y we further make the following assumption:

    |= ∃O ∃g JointIntend(x, y, g, O)                           (3)

which translates as: there is always a joint intention between x and y. This assumption, while quite strong, is reasonable for our context, as the proposed DM is specifically tailored to mediate joint intentions. During every dialogue a joint intention is always obtained, and when the user leaves the conversation there is always an intention that was formed and is shared with the DM. Furthermore, it is always the case that the user utilises the DM to instantiate joint intentions. We do not take into consideration the cases in which the joint intention is bootstrapped or terminated, as for example shown in [1].

Following the given definitions, we propose an interaction mechanism that allows the two participants to collaboratively build O by adding or removing actions from it. Currently, we make the following assumptions: 1) for every trial two participants are present, a human user and a Dialogue Manager (DM), which can be integrated for example in a house robot; 2) the DM is modelled to be user-initiated, i.e. the user always proposes the first action that enters the set O.

Having an observed set O in the form of a partial plan, the DM can infer the most likely full intention π by utilizing plan recognition techniques, as later described. This inference is based on the current state of the world, which we assume to be available to the DM in the form of truth predicates. A possible architecture for maintaining an updated world description is not provided by this paper, but can for example be implemented as in [3].

2.1     Goal Recognition and Plan Generation
At every turn of the dialogue the agent is required to infer the joint plan π to be able to participate in its mediation. For this purpose, we utilize plan recognition techniques based on the Planning Domain Definition Language (PDDL) [8]. PDDL belongs to the group of planning techniques known as classical planning, and allows non-hierarchical task domains to be created easily.

For a given task domain we select the set G of possible goals a user can pursue. Examples of possible goals for Figure 1 are to prepare dinner, lunch or breakfast. Plan recognition is achieved by a modified version of the method proposed in [10], with the following differences: 1) we allow the PDDL planner to plan using partially instantiated actions¹, and 2) the observations O are treated as a set rather than a sequence. Given a possibly empty set of observations O, goal recognition is performed as:

    ĝ = argmax_{g ∈ G} C(∅, g) / C(O, g)                       (4)

where C(O, g) is the cost of a plan achieving g and constrained to contain O, and C(∅, g) is the cost of an optimal plan achieving g without being constrained by O. Hence, 0 ≤ C(∅, g)/C(O, g) ≤ 1 indicates how costly it is to deviate from an optimal plan achieving g for compliance with O. Finally, for an inferred goal ĝ we obtain π as the optimal plan achieving ĝ while being constrained to contain the observations O.

¹ We define a PDDL action as partially instantiated if not all of its arguments are grounded in the task domain. An action is fully instantiated when all arguments are grounded.
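The ranking in Eq. 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: `plan_cost`, the cost table, and the goal and action names are hypothetical stand-ins for calls to an optimal PDDL planner constrained to include the observed actions.

```python
COSTS = {
    # (frozenset of observed actions, goal) -> optimal plan cost.
    # Stubbed values standing in for real planner output.
    (frozenset(), "prepare_dinner"): 4.0,
    (frozenset({"wash_vegetables"}), "prepare_dinner"): 4.0,
    (frozenset(), "prepare_breakfast"): 3.0,
    (frozenset({"wash_vegetables"}), "prepare_breakfast"): 6.0,
}

def plan_cost(observations, goal):
    """C(O, g): cost of an optimal plan for `goal` constrained to
    contain `observations` (stubbed; a real system calls a planner)."""
    return COSTS[(frozenset(observations), goal)]

def recognize_goal(observations, goals):
    """Return the goal maximising C(∅, g) / C(O, g), as in Eq. 4."""
    def score(g):
        return plan_cost(frozenset(), g) / plan_cost(observations, g)
    return max(goals, key=score)

goals = ["prepare_dinner", "prepare_breakfast"]
print(recognize_goal({"wash_vegetables"}, goals))  # -> prepare_dinner
```

With these stub costs, observing `wash_vegetables` does not change the optimal dinner plan (ratio 1.0) but forces a detour for breakfast (ratio 0.5), so dinner is the inferred goal.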


Figure 2: A traditional Spoken Dialogue Management System.

2.2     Mediation of Joint Intention
The agent and the user have to use a medium to communicate their joint intention, and to negotiate goals g and commitments O. In order to do so, we formalise a finite-state negotiation dialogue strategy with the following dialogue acts: offer, counter-offer, accept, and reject. The dialogue strategies will be implemented with a spoken dialogue management system (SDS)².

Traditionally, an SDS (Figure 2) consists of components for speech recognition and synthesis; a natural language understanding (NLU) component that transforms human-generated natural language into knowledge for the machine; a dialogue manager (DM) that makes decisions based on the NLU output and on other components such as the dialogue history, databases, etc.; and a natural language generation (NLG) component that receives the decision from the DM, transforms it into a human-understandable format, and sends it to the speech synthesizer.

When the user produces their first utterance, it is transformed from speech to text and arrives at the NLU component. The NLU transforms the text into knowledge (semantic roles) and assigns the dialogue act offer². An offer from the user instantiates the plan π by triggering plan recognition, and creates a joint intention as described by JointIntend.

² At this stage we define a finite-state SDS in which only the user can initiate a dialogue, using an offer together with a goal or an action.

We define five dialogue acts, Offer_a, Offer_g, Counter-offer, Accept and Reject, with which both user and DM can mediate the intention's goal g and commitments O. Table 1 contains the preconditions and effects of these dialogue acts with respect to three sets: θ is the set of current offers, while R and O are respectively the sets of rejected and accepted commitments. An Offer_a represents an offer about an action, an Offer_g an offer about a goal, and a Counter-offer means that an action a₁ is not accepted and an alternative a₂ is proposed instead. Accept and Reject are used to accept or reject proposed commitments.

    Dialogue Act               Precondition                 Effect
    Offer_a, x, a              a ∉ θ ∧ a ∉ R ∧ a ∉ O        a ∈ θ
    Offer_g, x, ĝ              ∅                            g = ĝ
    Counter-offer, x, a₁, a₂   a₁ ∈ θ ∧ a₂ ∉ R ∧ a₂ ∉ O     a₁ ∉ θ ∧ a₂ ∈ θ
    Accept, x, a               a ∈ θ                        a ∉ θ ∧ a ∈ O
    Reject, x, a               a ∈ θ                        a ∉ θ ∧ a ∈ R

Table 1: Speech acts for the SDS that allow actions to be mediated with respect to the sets of offered, rejected and accepted commitments.

Since a dialogue policy based on finite-state machines is not realizable, as it would need to consider all possible intentions as well as the state of the task, we propose to learn the DM dialogue policy with reinforcement learning methods. This approach is not new in the context of dialogue management; in this method the user is simulated by an Agenda [12].

2.3     Learning the Agent Strategy with Reinforcement Learning (RL)
At every turn, a Q-Network [18] evaluates the currently inferred π together with the actions in the sets θ, R and O and the current PDDL state, selecting which dialogue act to perform through an ϵ-greedy policy computed on the dialogue acts' expected returns. In RL, agents learn which policy to adopt by maximising the reward they receive during each episode. The current version of the reward function is:

    R = −αT + β |π ∩ π̄| / |π ∪ π̄| + γ C(π̄, ḡ) / C(π, g)      (5)

where π̄ and ḡ form the user's original desired joint intention (held by the Agenda). The first term penalises every turn that the interaction takes, hence making interactions as short as possible. The second term evaluates how similar the final mediated intention is to the one the user had as objective for the interaction. The third term instead evaluates the cost of the resulting mediated intention compared to the user's original one. α, β and γ determine how the three components of the reward function are weighted. Notice that the system cannot access π̄ and ḡ, which are instead only used at the end of every interaction for evaluation. Thus, the DM learns to mediate and improve towards the unobservable user intention π̄, ḡ.

3     FUTURE WORK
This research is still in its early stages and we are currently implementing the described system. We have developed the goal recognition and reinforcement learning components together with a simple user Agenda. The Agenda is based on PDDL and simulates how the user would modify the joint intention during their turn, while having as objective a randomly generated joint intention.

Initial experiments gave positive results, in the sense that the RL agent is able to learn the structure of the problem for simple scenarios and successfully maximises the possible rewards. Several investigations remain open: what is the Q-Network learning? Does our current setting allow any generalisation? The current implementation requires hundreds of episodes to converge. Can
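The transition semantics of Table 1 can be sketched as a small state object over the three sets θ, R and O. This is an illustrative sketch under the paper's preconditions and effects, not the authors' implementation; the class and action names are hypothetical, and Offer_g (which replaces the goal ĝ) is omitted for brevity.

```python
class Mediation:
    """State of the mediated commitments: offered (theta), rejected
    and accepted sets, updated by the dialogue acts of Table 1."""

    def __init__(self):
        self.theta, self.rejected, self.accepted = set(), set(), set()

    def offer(self, a):
        # Precondition: a ∉ θ ∧ a ∉ R ∧ a ∉ O. Effect: a ∈ θ.
        if a not in self.theta | self.rejected | self.accepted:
            self.theta.add(a)

    def counter_offer(self, a1, a2):
        # Precondition: a1 ∈ θ ∧ a2 ∉ R ∧ a2 ∉ O.
        # Effect: a1 ∉ θ ∧ a2 ∈ θ (a2 replaces a1 on the table).
        if a1 in self.theta and a2 not in self.rejected | self.accepted:
            self.theta.remove(a1)
            self.theta.add(a2)

    def accept(self, a):
        # Precondition: a ∈ θ. Effect: a ∉ θ ∧ a ∈ O.
        if a in self.theta:
            self.theta.remove(a)
            self.accepted.add(a)

    def reject(self, a):
        # Precondition: a ∈ θ. Effect: a ∉ θ ∧ a ∈ R.
        if a in self.theta:
            self.theta.remove(a)
            self.rejected.add(a)

# Replaying the exchange of Figure 1 with illustrative action names:
m = Mediation()
m.offer("prepare_salad")                      # user: "I want to prepare a salad"
m.accept("prepare_salad")
m.offer("fry_eggs")                           # DM: "Ok, I can fry some eggs"
m.counter_offer("fry_eggs", "set_table")      # user: "No, set the table instead"
m.accept("set_table")                         # DM: "OK"
print(sorted(m.accepted))  # -> ['prepare_salad', 'set_table']
```

Guarding each act with its Table 1 precondition keeps the three sets disjoint, so a commitment is always in exactly one of θ, R or O.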


the process be made faster or simpler? How can we facilitate online adaptation with real users?

Encapsulation of the joint intention model within the SDS is still to be implemented. For early prototypes of the system we plan to implement the dialogue manager as described in Section 2.2. Later versions could see the implementation of a more complete SDS, through for example a POMDP model [9]. This would allow dialogues that are not strictly related to the mediation of the joint intention, but are more flexible and intuitive for the user. Finally, an investigation of the soundness of this approach in real scenarios, for example in user studies, is still to be performed.

ACKNOWLEDGMENTS
This work has received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 721619 for the SOCRATES project.

REFERENCES
 [1] Philip R. Cohen. 2019. Foundations of Collaborative Task-Oriented Dialogue: What's in a Slot?. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue. 198–209.
 [2] Philip R. Cohen and Hector J. Levesque. 1990. Intention is choice with commitment. Artificial Intelligence 42, 2-3 (1990), 213–261.
 [3] Sandra Devin and Rachid Alami. 2016. An implemented theory of mind to improve human-robot shared plans execution. In 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 319–326.
 [4] New World Encyclopedia. 2017. Dialogue. www.newworldencyclopedia.org/p/index.php?title=Dialogue&oldid=1007366 [Online; accessed 24-January-2020].
 [5] Barbara Grosz and Sarit Kraus. 1996. Collaborative plans for complex group action. Artificial Intelligence (1996).
 [6] Harry Bunt, Volha Petukhova, David Traum, and Jan Alexandersson. 2017. Dialogue Act Annotation with the ISO 24617-2 Standard. Springer International Publishing, 109–135.
 [7] Ece Kamar, Ya'akov Gal, and Barbara J. Grosz. 2009. Incorporating helpful behavior into collaborative planning. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). Springer Verlag.
 [8] Drew McDermott. 2003. The Formal Semantics of Processes in PDDL. In Proc. ICAPS Workshop on PDDL. Trento, Italy.
 [9] Michael Frederick McTear, Zoraida Callejas, and David Griol. 2016. The Conversational Interface. Vol. 6. Springer.
[10] Miguel Ramírez and Hector Geffner. 2010. Probabilistic plan recognition using off-the-shelf classical planners. In Twenty-Fourth AAAI Conference on Artificial Intelligence.
[11] Timothy W. Rauenbusch and Barbara J. Grosz. 2003. A Decision Making Procedure for Collaborative Planning. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '03). Association for Computing Machinery, 1106–1107.
[12] Jost Schatzmann, Blaise Thomson, and Steve Young. 2007. Statistical user simulation with a hidden agenda. In Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue. 273–282.
[13] David P. Schweikard and Hans Bernhard Schmid. 2013. Collective Intentionality. In The Stanford Encyclopedia of Philosophy (Summer 2013 ed.), Edward N. Zalta (Ed.). Metaphysics Research Lab, Stanford University.
[14] Ira A. Smith, Philip R. Cohen, Jeffrey M. Bradshaw, Mark Greaves, and Heather Holmback. 1998. Designing conversation policies using joint intention theory. In Proceedings International Conference on Multi Agent Systems. IEEE, 269–276.
[15] Rajah Subramanian, Sanjeev Kumar, and Philip Cohen. 2006. Integrating Joint Intention Theory, Belief Reasoning, and Communicative Action for Generating Team-Oriented Dialogue. In Proceedings of the National Conference on Artificial Intelligence, Vol. 21.
[16] Michael Tomasello, Malinda Carpenter, Josep Call, Tanya Behne, and Henrike Moll. 2005. Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences 28, 5 (2005), 675–691.
[17] Raimo Tuomela. 2005. We-intentions revisited. Philosophical Studies 125, 3 (2005), 327–369.
[18] Hado Van Hasselt, Arthur Guez, and David Silver. 2016. Deep reinforcement learning with double Q-learning. In Thirtieth AAAI Conference on Artificial Intelligence.