<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reinforcement learning of dialogue coherence and relevance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sultan Alahmari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tommy Yuan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Kudenko</string-name>
          <email>daniel.kudenko@york.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of York, Department of Computer Science</institution>
          ,
          <addr-line>Deramore Lane, York YO10 5GH, UK</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In multi-agent systems, agents communicate with each other, and argumentation is one type of communication. Agents argue to resolve conflicts between them. In our previous work, an agent learnt how to argue in an abstract argumentation system and in dialogue-game-based argumentation. This research looks at improving the agent's performance as well as the coherence and relevance of the dialogue. We use a reinforcement learning method to encourage our agent to improve its performance and the coherence of the dialogue. We propose a new formula that motivates the agent to achieve a higher reward based on three attributes: number of moves, number of contradictions and number of focus switches. The results are promising: the agent is able to learn how to argue and to reach a good level of performance with regard to winning the argument and making high-quality dialogue contributions.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-agent systems</kwd>
        <kwd>Argumentation</kwd>
        <kwd>Dialogue game</kwd>
        <kwd>Persuasion dialogue</kwd>
        <kwd>Quality measures</kwd>
        <kwd>Reinforcement learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        In the past few decades, argumentation has played an important role and has been
widely studied in artificial intelligence (AI) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. A review of further significant research
in the field can be found in Prakken [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. One significant development was Dung’s [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
introduction of the abstract argumentation framework, which uses a directed graph
to represent arguments as nodes and attack relations as arcs.
      </p>
      <p>
        In our previous work [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1–3</xref>
        ] we used Dung’s framework and allowed different agents
to play an argument game [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. One of the agents was based on reinforcement learning
(RL) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], and the aim of our research was to allow agents to learn how to argue against
different baseline agents. Some limitations were revealed in our earlier work [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1–3</xref>
        ]. One
of these was the inability to generalise a policy across different abstract argumentation
graphs. The reason for this is difficult to discern, but patterns that could help our
learning agent transfer experience from one domain to another are hard to learn without
reference to the internal structure of the arguments.
      </p>
      <p>
        This motivated us to move to a propositional-logic-based representation and
a richer dialogue model [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. We assert that argument patterns, i.e. argument schemes
and sources of evidence, could help our RL agent to learn [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In that work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
we used an influential logic-based dialogue model, the “DE” model [
        <xref ref-type="bibr" rid="ref25 ref26 ref29">25, 26, 29</xref>
        ]. Adopting
the DE model has several advantages over other models. For example,
it leaves enough room for strategy formation, and a strategy is essential for an agent
to make high-quality dialogue contributions. DE computational agents have been built
with hard-coded heuristic strategies [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], so they can be used directly as baseline agents.
In addition, the DE model has simple dialogue rules that control the evolving dialogue [
        <xref ref-type="bibr" rid="ref25 ref26">25,26</xref>
        ].
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the performance of our RL agent showed promising results, improving over
hard-coded agents. We also demonstrated that our RL agent was able to learn to win an
argument game against baseline agents with the minimum number of moves. It is
worthwhile to investigate whether the dialogue contributions made by the RL agent
are also of high quality in terms of coherence and relevance. Doing so should
improve the agent’s performance further: it should learn to win with the minimum number
of moves in a fluent and coherent manner.
      </p>
      <p>The rest of this paper is organised as follows: Section 2 introduces reinforcement
learning for the DE dialogue game, Section 3 discusses the measurement of dialogue
coherence and relevance, Section 4 introduces the experiment and discusses the results, and
Section 5 gives conclusions and our planned future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Reinforcement learning for a DE dialogue game</title>
      <p>
        The DE dialogue model [
        <xref ref-type="bibr" rid="ref24 ref26 ref29">24, 26, 29</xref>
        ] was developed by Yuan [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] based on Mackenzie’s
DC system [
        <xref ref-type="bibr" rid="ref10 ref14">10, 14</xref>
        ]. The DE dialogue game defines the rules for participants making
moves [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. There are five move types in the DE game, namely Assertions, Questions,
Challenges, Withdrawals and Resolution demands [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. The DE model allows each
agent to have its own public commitment store which contains statements that have
been stated or accepted by a speaker [
        <xref ref-type="bibr" rid="ref26 ref4">4, 26</xref>
        ]. The commitment store has two lists, an
assertion list, which contains the statements that have been explicitly asserted by the
speaker and a concession list, which contains the statements that have been implicitly
accepted by the speaker [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Commitment rules are used to update the
commitment store; they are quoted from [
        <xref ref-type="bibr" rid="ref24 ref26 ref29">24, 26, 29</xref>
        ] as follows:
        <list list-type="order">
          <list-item><p>Initial commitment, CR<sub>0</sub>: The initial commitment of each participant is null.</p></list-item>
          <list-item><p>Withdrawals, CR<sub>W</sub>: After the withdrawal of P, the statement P is not included in the move.</p></list-item>
          <list-item><p>Statements, CR<sub>S</sub>: After a statement P, unless the preceding event was a challenge, P is included in the move maker’s assertion list and the dialogue partner’s concession list, and ¬P will be removed from the move maker’s concession list if it is there.</p></list-item>
          <list-item><p>Defence, CR<sub>YS</sub>: After a statement P, if the preceding event was Why Q?, P and If P then Q are included in the move maker’s assertion list and the dialogue partner’s concession list, and ¬P and ¬(If P then Q) are removed from the move maker’s concession list if they are there.</p></list-item>
          <list-item><p>Challenges, CR<sub>Y</sub>: A challenge of P results in P being removed from the move maker’s store if it is there.</p></list-item>
        </list>
      </p>
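      <p>The commitment rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: statements are plain strings, negation is written with a leading “¬”, the names CommitmentStore, negate, apply_statement, apply_withdrawal and apply_challenge are hypothetical, and rule CR<sub>W</sub> is read as removing P from the move maker’s own store.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field


def negate(statement):
    """Return the negation of a statement held as text (toy string handling)."""
    if statement.startswith("¬(") and statement.endswith(")"):
        return statement[2:-1]
    if statement.startswith("¬"):
        return statement[1:]
    if " " in statement:
        return f"¬({statement})"
    return "¬" + statement


@dataclass
class CommitmentStore:
    """One participant's public store: an assertion list and a concession list."""
    assertions: set = field(default_factory=set)   # CR0: initially empty
    concessions: set = field(default_factory=set)  # CR0: initially empty

    def remove(self, statement):
        self.assertions.discard(statement)
        self.concessions.discard(statement)


def apply_statement(mover, partner, p, challenged_q=None):
    """CRS, or CRYS when the statement answers the challenge 'Why Q?'."""
    added = [p] if challenged_q is None else [p, f"If {p} then {challenged_q}"]
    for s in added:
        mover.assertions.add(s)
        partner.concessions.add(s)
        mover.concessions.discard(negate(s))


def apply_withdrawal(mover, p):
    """CRW, read here as removing P from the mover's store (an assumption)."""
    mover.remove(p)


def apply_challenge(mover, p):
    """CRY: challenging P removes P from the challenger's own store."""
    mover.remove(p)
```
]]></preformat>
      <p>For example, after apply_statement(proponent, opponent, "A"), the statement A sits on the proponent’s assertion list and on the opponent’s concession list, and ¬A is no longer on the proponent’s concession list.</p>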
      <p>
        Dialogue rules that an agent must follow during the dialogue are taken from [
        <xref ref-type="bibr" rid="ref24 ref26 ref27 ref29">24, 26, 27, 29</xref>
        ] as follows:
        <list list-type="order">
          <list-item><p>R<sub>FROM</sub>: Each participant or agent can make one of the permitted types of move in turn.</p></list-item>
          <list-item><p>R<sub>REPSTAT</sub>: Mutual commitment may not be asserted until answering the question or challenge.</p></list-item>
          <list-item><p>R<sub>QUEST</sub>: The possible answers to question P can be “P”, “¬P” or “No commitment”.</p></list-item>
          <list-item><p>R<sub>CHALL</sub>: “Why P?” can be answered by withdrawal of P, a statement to the challenger or resolution demand for any commitments of the challenger which imply P.</p></list-item>
          <list-item><p>R<sub>RESOLVE</sub>: A resolution demand can happen only if the opponent has inconsistent statements in the commitment store.</p></list-item>
          <list-item><p>R<sub>RESOLUTION</sub>: A resolution demand has to be followed by withdrawal of one of the offending conjuncts or affirmation of the disputed consequent.</p></list-item>
          <list-item><p>R<sub>LEGALCHAL</sub>: The agent can challenge the opponent “Why P?”, unless P is on the assertion list of the opponent’s dialogue.</p></list-item>
        </list>
      </p>
      <p>
        There are several reasons for adopting the DE dialogue model in this paper. One
is that the model leaves enough room for the agent’s strategy
formation [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. A strategy is central to an agent making high-quality
dialogue contributions. In addition, computational agents for the DE dialogue
model have been built with hard-coded heuristic strategies, and the model has shown
benefits over other models because of its computational tractability and simple dialogue
rules [
        <xref ref-type="bibr" rid="ref25 ref26 ref27 ref29">25–27, 29</xref>
        ]. The DE model was also built with propositional logic, which we
consider to be a move forward from the abstract level of argumentation to the internal
level of the argument [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. Thus, the DE model is more sophisticated and richer, since
the dialogue state can be represented using different aspects, such as the commitment
store, and different move types, for example, questions. We would expect the DE game
to facilitate an effective learning experience for a computational agent, improving both
dialogue coherence and relevance.
      </p>
      <p>
        Before engaging our RL agent in the DE dialogue model, we will briefly review
reinforcement learning and the structure of our agent. Reinforcement learning is one of
the most common areas of machine learning [
        <xref ref-type="bibr" rid="ref18 ref22">18, 22</xref>
        ]. Agents interact with an
environment, learning to map states to actions from the rewards they receive, as illustrated
in Figure 1 [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The agent learns a policy, i.e. a mapping from states to
actions, by trial and error [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In the DE dialogue model, our RL agent needs to
engage with the model in order to play the dialogue game with other computational
agents. In this game, agents try to persuade each other of what they believe.
      </p>
      <p>
        To design the RL agent it is necessary to identify the state, action and reward for our
agent. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we identify these properties in detail. The state is (previousmove ∪
CS1 ∪ CS2), where CS1 is the commitment store of the proponent and CS2 is the
commitment store of the opponent [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Actions are defined as the available move
types in the DE model [
        <xref ref-type="bibr" rid="ref25 ref29">25, 29</xref>
        ], which are assert, question, challenge, withdraw and
resolution demand, together with the move content, which is a proposition or a
conjunction of propositions [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The RL
agent therefore aims to learn a policy that maps each state to an action [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The RL
agent receives a reward based on the action that it takes, so that positive actions
receive positive reward and vice versa. The RL agent focuses on maximising its utility
based on the long-term reward gained through repeated episodes of the game. The
reward function in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is defined so that the RL agent seeks to win with the minimum
number of moves, as in Equation (1):
      </p>
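      <p>
        <disp-formula><tex-math>R = 100 + \frac{W}{L} \qquad (1)</tex-math></disp-formula>
      </p>
      <p>
        such that W is the number of moves in the first winning episode (the benchmark) and L is the
number of moves in the current episode. Hence the fewer moves the current episode takes
(the smaller L), the larger the reward. In addition, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] used the Q-learning algorithm (Equation 2):
      </p>
      <p>
        <disp-formula><tex-math>Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \qquad (2)</tex-math></disp-formula>
      </p>
      <p>where α is the learning rate and γ is the discount factor.</p>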
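      <p>As an illustration of how the reward in Equation 1 and the Q-learning update in Equation 2 fit together, the following tabular sketch may help. It is written under stated assumptions rather than taken from the experimental code: the names QLearner and reward_on_win are hypothetical, the values of alpha, gamma and epsilon are placeholders, epsilon-greedy exploration stands in for the unspecified trial-and-error scheme, and a state is assumed to be a hashable tuple such as (previous move, frozenset of CS1, frozenset of CS2).</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random
from collections import defaultdict


def reward_on_win(first_win_moves, current_moves):
    """Equation 1: R = 100 + W / L."""
    return 100 + first_win_moves / current_moves


class QLearner:
    """Tabular Q-learning over (state, action) pairs."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        # Placeholder hyperparameters; the paper does not report its values.
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def choose(self, state, legal_actions, rng=random):
        """Epsilon-greedy choice among the moves the dialogue rules allow."""
        if rng.random() < self.epsilon:
            return rng.choice(legal_actions)
        return max(legal_actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_legal_actions):
        """One Q-learning step; a terminal state has no next actions."""
        best_next = max(
            (self.q[(next_state, a)] for a in next_legal_actions), default=0.0
        )
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```
]]></preformat>
      <p>In use, the agent would call choose on every turn, apply the chosen move to the dialogue, and call update with the reward, which under Equation 1 is only non-zero at the end of a winning episode.</p>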
    </sec>
    <sec id="sec-3">
      <title>Measuring dialogue coherence and relevance</title>
      <p>
        In our previous work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the reinforcement learning
(RL) agent played against different baseline agents in the DE dialogue game.
The results were promising: the performance of the RL agent increased rapidly
against the baseline agents, as shown in Figures 2 and 3.
      </p>
      <p>
        In addition, the reward shaping let the agent win the game with the minimum
number of moves. We therefore thought it worthwhile to test whether the agent maintained
coherence and relevance in a dialogue. The literature offers a number of different
approaches to measuring dialogue, for example measuring persuasion dialogue [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], measuring an
agent’s uncertainty in negotiation [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], measuring the argument strength through applying
the concept of value of a game, defined in game theory [
        <xref ref-type="bibr" rid="ref13 ref15">13, 15</xref>
        ] and measuring dialogue
games based on the external agent’s point of view [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        In particular, Amgoud et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed three kinds of measure: the quality of
the exchanged arguments, the agent’s behaviour, such as coherence and aggressiveness,
and the quality of the dialogue itself, for example its relevance. Amgoud et
al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] argue that these measures are of great importance, because they can be used as
guidelines for protocols between participants in order to make high quality dialogue.
Weide [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] supports these measures being used as benchmarks for the agent to decide
which dialogue move they should choose. Based on our work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] we consider two
criteria from [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]: coherence and relevance. We believe that coherence and relevance can
be used to assess the RL agent learning to argue in the minimum number of moves, and
to contribute high quality dialogue against different baseline agents. In addition, our
system [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is based on persuasion dialogue and requires measuring and evaluating the
quality of the dialogue in different aspects. Coherence [
        <xref ref-type="bibr" rid="ref16 ref5 ref8">5,8,16</xref>
        ] concerns a persuasion
dialogue in which an agent attempts to defend what it believes and does not contradict
itself. In this paper we therefore introduce a formula to measure the proportion of incoherence
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], to evaluate how incoherent the agent was in a dialogue. This formula (Equation 3)
depends on how many times the agent has contradicted itself during the dialogue, relative
to the number of moves made by the agent:
      </p>
      <p>
        <disp-formula><tex-math>\mathrm{AgentIncoherence} = \frac{\mathrm{NumberOfContradictions}}{\mathrm{NumberOfMoves}} \qquad (3)</tex-math></disp-formula>
      </p>
      <sec id="sec-3-1">
        <p>
          Relevance in dialogue concerns an agent making a move that does not deviate from
the subject of the dialogue [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. However, in the DE dialogue game all moves are related
to the subject [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]; therefore, it was necessary to find a way to measure the relevance
of both agents in our system in another way. Since one of the strategies in the DE
dialogue game [
          <xref ref-type="bibr" rid="ref27 ref29">27, 29</xref>
          ] allowed agents to change their current focus, it was considered
of interest to minimise changing the current focus of both agents. It is argued that if the
agent switches focus a large number of times, it would make the dialogue less fluent.
Therefore, we used a new formula (Equation 4) to measure the relevance of each agent,
based on how many times that agent switched focus during the dialogue, with respect
to the number of moves:
        </p>
        <p>
          <disp-formula><tex-math>\mathrm{AgentIrrelevance} = \frac{\mathrm{NumberOfFocusSwitches}}{\mathrm{NumberOfMoves}} \qquad (4)</tex-math></disp-formula>
        </p>
      </sec>
      <sec id="sec-3-2">
        <p>
          Therefore, we made the RL agent learn to argue by considering coherence and
relevance, as well as winning in the minimum number of moves [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>
          In this section we discuss the experiments conducted between the RL agent and the
baseline agent. The baseline agent that we considered was hard-coded with strategic
heuristics [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. Agents adopting the DE dialogue game can use different strategies,
e.g. heuristic strategies or random moves [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. The heuristic agent was developed based
on heuristics in [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] and the random agent makes random moves with respect to the
DE dialogue game rules. The RL agent plays the dialogue game against both heuristic
and random agents. This allowed us to evaluate the coherence and relevance, as well
as observing whether the RL agent could win with the minimum number of moves. It
also allowed us to measure the quality of the dialogue with respect to coherence and
relevance.
        </p>
        <p>To measure the coherence of the agent we looked at the number of contradictions
each agent had in its commitment store. Equation (3) thus measures incoherence for
the agent: the fewer contradictions the agent makes, the more coherent
its dialogue. Equation (4), on the other hand, measures the number
of occasions on which an agent did not address the previous move, which in effect is a focus
switch. The fewer the focus switches, the more focused the agent’s
dialogue.</p>
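        <p>The two measures in Equations 3 and 4 are simple ratios, as the following sketch shows. The function names are hypothetical, and count_contradictions makes a simplifying assumption that is not stated in the text: it only detects a statement held together with its direct negation, written with a leading “¬”.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def ratio(count, moves):
    """Share of an agent's moves that exhibit the counted event."""
    if moves <= 0:
        raise ValueError("an agent must have made at least one move")
    return count / moves


def agent_incoherence(contradictions, moves):
    """Equation 3: contradictions relative to the agent's number of moves."""
    return ratio(contradictions, moves)


def agent_irrelevance(focus_switches, moves):
    """Equation 4: focus switches relative to the agent's number of moves."""
    return ratio(focus_switches, moves)


def count_contradictions(statements):
    """Count pairs P / ¬P held together (direct negation only; an assumption)."""
    held = set(statements)
    return sum(1 for s in held if not s.startswith("¬") and "¬" + s in held)
```
]]></preformat>
        <p>Both measures are 0 for an agent that never contradicts itself or switches focus, and they grow towards 1 as such events become more frequent, so lower values indicate better dialogue quality.</p>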
        <p>The RL agent first played against the heuristic agent. The reward function in
Equation 1 was used initially. The game was played 4000 times, each game being a
debate episode between the two agents. The RL agent tested the learned policy after
every 100 episodes, and each test was repeated 10 times to reduce the effect of randomness. The dialogue
quality measures defined in Equations 3 and 4 are used to visualise the results,
averaged over every 500 episodes for presentation purposes, in Figure 4 for
incoherence and Figure 5 for irrelevance.</p>
        <p>In Figure 4, the x axis represents the number of episodes and the y axis represents
the incoherence measure as specified in Equation 3. For the relevance measurement in
Figure 5, the x axis represents the number of episodes and the y axis represents the
irrelevance measure as specified in Equation 4. The results show that the RL agent did not
improve in either coherence or relevance. One conclusion
can nevertheless be drawn from this, namely that the proposed measures for coherence and
relevance are independent of the number of moves: when the number of
moves in a winning episode is minimised, the coherence and relevance measures remain
the same.</p>
        <p>This encouraged us to incorporate coherence and relevance measures into the
reward function in Equation 1 so that our RL agent was able to improve coherence and
relevance as well as minimise the number of moves. Therefore, we have reshaped the
reward function as shown in the following equation:</p>
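        <p>R = 100 + … (5)</p>
        <p>such that:
          <list list-type="bullet">
            <list-item><p>R is the reward function.</p></list-item>
            <list-item><p>M<sub>1</sub> is the number of moves in the first episode.</p></list-item>
            <list-item><p>M<sub>n</sub> is the number of moves in the current episode.</p></list-item>
            <list-item><p>C<sub>1</sub> is the number of contradictions in the first episode.</p></list-item>
            <list-item><p>C<sub>n</sub> is the number of contradictions in the current episode.</p></list-item>
            <list-item><p>SF<sub>1</sub> is the number of focus switches in the first episode.</p></list-item>
            <list-item><p>SF<sub>n</sub> is the number of focus switches in the current episode.</p></list-item>
          </list>
        </p>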
        <p>The reward function was designed to motivate the RL agent to choose moves
that minimise the number of moves, contradictions and focus switches. The agent
was awarded 100 for winning the game. The results of running the experiment between the RL
agent and the heuristic agent can be seen in Figures 6 and 7.</p>
        <p>For the coherence measurement in Figure 6, the learning agent’s incoherence
decreases gradually, i.e. the agent learns to improve coherence. This
means the new reward function, as in Equation 5, encourages the RL agent to maximise
coherence in the dialogue. The heuristic agent, by contrast, maintained coherence
throughout the dialogue. For the relevance measurement in Figure 7, it was at first surprising to see
that the RL agent maintained better relevance than the heuristic agent.
Investigating the dialogue transcripts showed that the heuristic
agent asked a large number of questions, which kept the learning agent
on focus as it passively responded to them. It is therefore worthwhile to
run the RL agent against a random agent and study the outcome.</p>
        <p>We ran a similar experiment between the RL agent and a random agent;
the results are shown in Figures 8 and 9. Figure 8 confirms the result shown in Figure 6.
Figure 9 shows the RL agent improving its performance (i.e. decreasing irrelevance)
to a level well above that of the random agent.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and future direction</title>
      <p>We have proposed two quality measures for argumentative dialogue, namely fluency
and coherence. We have incorporated the quality measures into the reward function of
our RL agent and carried out a number of experiments. We conclude that the RL agent
can learn to improve its performance with regard to coherence and fluency against both
the heuristic and the random agent. In future work we will experiment with different
weights for the features in Equation 5.</p>
      <p>
        We are also planning to generalise our approach to different argument domains. We
are building a new argument domain on Brexit and investigating transfer learning
techniques [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. We will test whether our RL agent can apply what has been learned in
one domain, e.g. capital punishment, to a new domain such as Brexit.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Sultan</given-names>
            <surname>Alahmari</surname>
          </string-name>
          , Tommy Yuan, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Kudenko</surname>
          </string-name>
          .
          <article-title>Reinforcement learning for abstract argumentation: Q-learning approach</article-title>
          .
          <source>In Adaptive and Learning Agents Workshop (at AAMAS 2017)</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Sultan</given-names>
            <surname>Alahmari</surname>
          </string-name>
          , Tommy Yuan, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Kudenko</surname>
          </string-name>
          .
          <article-title>Reinforcement learning for argumentation: Describing a PhD research</article-title>
          .
          <source>In Proceedings of the 17th Workshop on Computational Models of Natural Argument (CMNA17)</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Sultan</given-names>
            <surname>Alahmari</surname>
          </string-name>
          , Tommy Yuan, and Daniel Kudenko.
          <article-title>Policy generalisation in reinforcement learning for abstract argumentation</article-title>
          .
          <source>In Proceedings of the 18th Workshop on Computational Models of Natural Argument (CMNA18)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Sultan</given-names>
            <surname>Alahmari</surname>
          </string-name>
          , Tommy Yuan, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Kudenko</surname>
          </string-name>
          .
          <article-title>Reinforcement learning for dialogue game based argumentation</article-title>
          .
          <source>Accepted at the 19th Workshop on Computational Models of Natural Argument (CMNA19)</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Leila</given-names>
            <surname>Amgoud</surname>
          </string-name>
          and
          <string-name>
            <given-names>Florence</given-names>
            <surname>Dupin de Saint-Cyr</surname>
          </string-name>
          .
          <article-title>Measures for persuasion dialogs: A preliminary investigation</article-title>
          .
          <source>Frontiers in Artificial Intelligence and Applications</source>
          ,
          <volume>172</volume>
          :
          <fpage>13</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Leila</given-names>
            <surname>Amgoud</surname>
          </string-name>
          and
          <string-name>
            <given-names>Florence</given-names>
            <surname>Dupin de Saint-Cyr</surname>
          </string-name>
          .
          <article-title>On the quality of persuasion dialogs</article-title>
          .
          <source>Studies in Logic, Grammar and Rhetoric</source>
          ,
          <volume>12</volume>
          (
          <issue>36</issue>
          ):
          <fpage>69</fpage>
          -
          <lpage>98</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Trevor JM</given-names>
            <surname>Bench-Capon</surname>
          </string-name>
          and
          <string-name>
            <given-names>Paul E</given-names>
            <surname>Dunne</surname>
          </string-name>
          .
          <article-title>Argumentation in artificial intelligence</article-title>
          .
          <source>Artificial intelligence</source>
          ,
          <volume>171</volume>
          (
          <issue>10-15</issue>
          ):
          <fpage>619</fpage>
          -
          <lpage>641</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Lauri</given-names>
            <surname>Carlson</surname>
          </string-name>
          .
          <article-title>Dialogue games: An approach to discourse anaphora</article-title>
          . Dordrecht: Reidel,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Phan Minh</given-names>
            <surname>Dung</surname>
          </string-name>
          .
          <article-title>On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games</article-title>
          .
          <source>Artificial intelligence</source>
          ,
          <volume>77</volume>
          (
          <issue>2</issue>
          ):
          <fpage>321</fpage>
          -
          <lpage>357</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Jim D</given-names>
            <surname>Mackenzie</surname>
          </string-name>
          .
          <article-title>Question-begging in non-cumulative systems</article-title>
          .
          <source>Journal of philosophical logic</source>
          ,
          <volume>8</volume>
          (
          <issue>1</issue>
          ):
          <fpage>117</fpage>
          -
          <lpage>133</lpage>
          ,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Omar</given-names>
            <surname>Marey</surname>
          </string-name>
          , Jamal Bentahar, Rachida Dssouli, and
          <string-name>
            <given-names>Mohamed</given-names>
            <surname>Mbarki</surname>
          </string-name>
          .
          <article-title>Measuring and analyzing agents' uncertainty in argumentation-based negotiation dialogue games</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>41</volume>
          (
          <issue>2</issue>
          ):
          <fpage>306</fpage>
          -
          <lpage>320</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Omar</given-names>
            <surname>Marey</surname>
          </string-name>
          , Jamal Bentahar, and
          <string-name>
            <given-names>Abdeslam</given-names>
            <surname>En-Nouaary</surname>
          </string-name>
          .
          <article-title>On the measurement of negotiation dialogue games</article-title>
          .
          In
          <source>SoMeT</source>
          , pages
          <fpage>223</fpage>
          -
          <lpage>244</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Paul-Amaury</given-names>
            <surname>Matt</surname>
          </string-name>
          and
          <string-name>
            <given-names>Francesca</given-names>
            <surname>Toni</surname>
          </string-name>
          .
          <article-title>A game-theoretic measure of argument strength for abstract argumentation</article-title>
          .
          In
          <source>European Workshop on Logics in Artificial Intelligence</source>
          , pages
          <fpage>285</fpage>
          -
          <lpage>297</lpage>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>David John</given-names>
            <surname>Moore</surname>
          </string-name>
          .
          <article-title>Dialogue game theory for intelligent tutoring systems</article-title>
          .
          <source>PhD thesis</source>
          , Leeds Metropolitan University,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. J v Neumann.
          <article-title>Zur Theorie der Gesellschaftsspiele</article-title>
          .
          <source>Mathematische Annalen</source>
          ,
          <volume>100</volume>
          (
          <issue>1</issue>
          ):
          <fpage>295</fpage>
          -
          <lpage>320</lpage>
          ,
          <year>1928</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Henry</given-names>
            <surname>Prakken</surname>
          </string-name>
          .
          <article-title>Coherence and flexibility in dialogue games for argumentation</article-title>
          .
          <source>J. Log. Comput.</source>
          ,
          <volume>15</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1009</fpage>
          -
          <lpage>1040</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Henry</given-names>
            <surname>Prakken</surname>
          </string-name>
          .
          <article-title>Historical overview of formal argumentation</article-title>
          .
          <source>IfCoLog Journal of Logics and their Applications</source>
          ,
          <volume>4</volume>
          (
          <issue>8</issue>
          ):
          <fpage>2183</fpage>
          -
          <lpage>2262</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Richard S</given-names>
            <surname>Sutton</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew G</given-names>
            <surname>Barto</surname>
          </string-name>
          .
          <article-title>Reinforcement learning: An introduction</article-title>
          , volume
          <volume>1</volume>
          . MIT Press, Cambridge,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Richard S</given-names>
            <surname>Sutton</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew G</given-names>
            <surname>Barto</surname>
          </string-name>
          .
          <article-title>Reinforcement learning: An introduction</article-title>
          . MIT Press, Cambridge,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Matthew E</given-names>
            <surname>Taylor</surname>
          </string-name>
          and
          <string-name>
            <given-names>Peter</given-names>
            <surname>Stone</surname>
          </string-name>
          .
          <article-title>Transfer learning for reinforcement learning domains: A survey</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>10</volume>
          (Jul):
          <fpage>1633</fpage>
          -
          <lpage>1685</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Thomas L</given-names>
            <surname>van der Weide</surname>
          </string-name>
          .
          <article-title>Arguing to motivate decisions</article-title>
          .
          <source>PhD thesis</source>
          , Utrecht University,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>Marco</given-names>
            <surname>Wiering</surname>
          </string-name>
          and
          <string-name>
            <given-names>Martijn</given-names>
            <surname>van Otterlo</surname>
          </string-name>
          .
          <article-title>Reinforcement learning</article-title>
          .
          <source>Adaptation, Learning, and Optimization</source>
          ,
          <volume>12</volume>
          :
          <fpage>51</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>Michael</given-names>
            <surname>Wooldridge</surname>
          </string-name>
          .
          <article-title>An introduction to multiagent systems</article-title>
          . John Wiley &amp; Sons,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>Tangming</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Moore</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alec</given-names>
            <surname>Grierson</surname>
          </string-name>
          .
          <article-title>Computational agents as a test-bed to study the philosophical dialogue model “DE”: A development of Mackenzie's DC</article-title>
          .
          <source>Informal Logic</source>
          ,
          <volume>23</volume>
          (
          <issue>3</issue>
          ),
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>Tangming</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Moore</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alec</given-names>
            <surname>Grierson</surname>
          </string-name>
          .
          <article-title>A human-computer debating system prototype and its dialogue strategies</article-title>
          .
          <source>International Journal of Intelligent Systems</source>
          ,
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <fpage>133</fpage>
          -
          <lpage>156</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>Tangming</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Moore</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alec</given-names>
            <surname>Grierson</surname>
          </string-name>
          .
          <article-title>A human-computer dialogue system for educational debate: A computational dialectics approach</article-title>
          .
          <source>International Journal of Artificial Intelligence in Education</source>
          ,
          <volume>18</volume>
          (
          <issue>1</issue>
          ):
          <fpage>3</fpage>
          -
          <lpage>26</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>Tangming</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Moore</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alec</given-names>
            <surname>Grierson</surname>
          </string-name>
          .
          <article-title>Assessing debate strategies via computational agents</article-title>
          .
          <source>Argument and Computation</source>
          ,
          <volume>1</volume>
          (
          <issue>3</issue>
          ):
          <fpage>215</fpage>
          -
          <lpage>248</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <given-names>Tangming</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chris</given-names>
            <surname>Reed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Ravenscroft</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Nicolas</given-names>
            <surname>Maudet</surname>
          </string-name>
          .
          <article-title>Informal logic dialogue games in human-computer dialogue</article-title>
          .
          <source>The Knowledge Engineering Review</source>
          ,
          <volume>26</volume>
          (
          <issue>2</issue>
          ):
          <fpage>159</fpage>
          -
          <lpage>174</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <given-names>Tommy</given-names>
            <surname>Yuan</surname>
          </string-name>
          .
          <article-title>Human-Computer Debate, a Computational Dialectics Approach</article-title>
          .
          <source>Unpublished PhD thesis</source>
          , Leeds Metropolitan University,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>