<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reinforcement Learning for Argumentation: Describing a PhD research</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sultan Alahmari</string-name>
          <email>smsa500@york.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tommy Yuan</string-name>
          <email>tommy.yuan@york.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Kudenko</string-name>
          <email>daniel.kudenko@york.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of York, Department of Computer Science</institution>,
          <addr-line>Deramore Lane, Heslington, York, YO10 5GH</addr-line>,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>76</fpage>
      <lpage>78</lpage>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>OVERVIEW</title>
      <p>
        Artificial intelligence (AI) is increasingly studied in many
fields such as philosophy, law and decision making. One of
the approaches to AI is the use of agent and multi-agent
systems. Agents are a key element for building complex
large-scale distributed systems [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In multi-agent systems, each
agent interacts with the environment and communicates
with other agents in order to achieve the designated goal.
Communication means sharing and exchanging information,
and cooperating and coordinating with one another in order
to achieve a common goal.
      </p>
      <p>
        Argumentation is a type of communication between agents
and a process of attempting to form an agreement about what
to believe. There has been increasing research in
argumentation and dialogue systems in the past decade [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. The agent,
as a dialogue participant, needs sophisticated dialogue
strategies in order to make high-quality dialogue contributions.
A review of the state-of-the-art literature on computerised
dialogue systems (e.g. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]; [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]) shows that their dialogue
strategies (i.e. strategic heuristics) are hardwired into the
computational agent. One of the main issues with this is that
an agent might be incapable of dealing with new dialogue
situations that have not been coded for, and indeed anticipating
them all is an impossible task given the dynamic nature of
argumentation. It would be ideal to make an agent search for
an optimal strategy by itself, e.g. via trial and error, so that
the agent with the best strategy wins the argument [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Machine learning has an important role to play in
meeting these challenges. Rather than having dialogue
strategies hand-coded, it is more flexible for agents to learn
them through exploration (trial and error). It is believed
that learning can make agents more flexible in adapting to new
environments and new dialogue situations. One of the popular
machine learning approaches for learning agents
is known as reinforcement learning (RL).</p>
      <p>
        Reinforcement learning focuses on how to map an action
to each state by interacting with the environment and
observing the state change [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Sutton and Barto [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] define
reinforcement learning as an agent learning what to do and
how to connect each situation with an action so as to maximise
the cumulative reward. The learner or agent is not told what
action should be taken; rather, the learner needs to explore for a
policy that yields the maximum cumulative reward by
trying actions out. In reinforcement learning, the agent interacts
with the environment by taking an action and receiving a
reward for the action taken, as seen in figure 1. To make an
agent learn to argue, there is a need to identify the states, actions,
environment and rewards. In this research, abstract
argumentation systems (AAS) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] are initially used to represent
the argumentation, for the following reasons:
(1) They have the ability to represent informal human
reasoning in a way that a computer can perform calculations on.
In this way, argumentation bridges the gap between
human and machine reasoning [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
(2) They make it easier to compute acceptable arguments in
order to evaluate various argument semantics, e.g. the
grounded extension.
(3) They provide a great opportunity for the agent to
explore the relationships between arguments.
(4) They are a powerful method for solving problems, since
they can be easily implemented in logic programming [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
The classical state representation of agents in the literature (e.g.
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]; [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) involves states being represented as nodes in the
argumentation graph and actions by the attack relations between
arguments, as sketched below.
      </p>
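      <p>
        To make this representation concrete, the following is a minimal
sketch of such an attack graph, written in Java like our testbed; arguments
serve as states and attacks as the actions available to a player. The class
and method names are our own illustration, not code from Argumento+.
      </p>
      <preformat>
// Sketch of a Dung-style abstract argumentation framework: arguments are
// identified by index, and the attack relation doubles as the action space.
public class ArgumentGraph {
    private final boolean[][] attacks; // attacks[i][j]: argument i attacks argument j

    public ArgumentGraph(int numArguments) {
        attacks = new boolean[numArguments][numArguments];
    }

    public void addAttack(int attacker, int target) {
        attacks[attacker][target] = true;
    }

    // From a state (the argument last put forward by the opponent),
    // the legal moves are exactly the arguments that attack it.
    public int[] legalMoves(int state) {
        int count = 0;
        for (int a = 0; a &lt; attacks.length; a++) {
            if (attacks[a][state]) count++;
        }
        int[] moves = new int[count];
        int k = 0;
        for (int a = 0; a &lt; attacks.length; a++) {
            if (attacks[a][state]) moves[k++] = a;
        }
        return moves;
    }
}
      </preformat>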
      <p>
        The main objective of our research is to investigate whether
a reinforcement learning agent can be used to create an
argumentation AI with improved performance and efficiency,
comparable to state-of-the-art systems. Performance is related
to how well the agent learns over time; it can be measured,
for instance, by whether an argument game is won or lost, or by
how many of the learning agent's arguments are accepted against
other heuristic-strategy agents. Efficiency is related to whether the
agent can learn within a limited or insufficient time, so the
aim is to find out whether the agent can learn rapidly. It
should also be ensured that the agent obtains full knowledge of
the environment, so as to be able to use an efficient method
to find an optimal decision for each state [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>In light of this hypothesis, the following steps will be
taken:
(1) Initially, a basic abstract argument game model is
used, due to its simplicity in implementing arguments.
This in turn makes it possible to investigate how
reinforcement learning can be applied to a simple
dialogue scenario.
(2) Evaluating the agent in an argumentation setting against a human
or another AI agent, by observing learning
performance over time.
(3) Investigating suitable means for reinforcement
learning in a complicated dialogue scenario and studying
the results in order to generalise the RL method. A
complicated dialogue scenario involves more move
types, e.g. questions, challenges, assertions and withdrawals,
and moves from the abstract argument level to the
propositional level.</p>
      <p>
        This work will also investigate other, different scenarios,
such as backtracking ([
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]; [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]), argument content, and the weight
of individual arguments, amongst others [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Additionally,
challenging issues such as state representation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as well as the reward
function will also be explored.
      </p>
      <p>
        To test the hypothesis, we have built argumentation
software to facilitate experiments in which a reinforcement learning
agent argues against different agents. A software testbed,
Argumento+, named after its predecessor Argumento as
reported in [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], has been built using the Java programming
language. Argumento+ contains the RL agent as well as
three other agents, namely random, maximum probability
utility and minimum probability utility agents, for the sake
of the evaluation. The agents play abstract argument games.
The RL agent plays games against them and aims to maximise the
cumulative reward by winning more games. If the RL agent
wins a game, it receives a reward based on the number
of acceptable arguments, i.e. the grounded extension. We
considered the grounded extension because it contains arguments
that are beyond doubt in comparison with other arguments [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ],
and that are consequently more acceptable. A sketch of how such a
reward could be derived follows.
      </p>
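      <p>
        As an illustration, the following sketch derives the grounded
extension as the least fixed point of Dung's characteristic function [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]: starting from the unattacked arguments, it keeps adding every
argument whose attackers are all counter-attacked by the set built so far.
The method signature is hypothetical and reuses the attacks matrix from the
earlier sketch; it is not the reward code of Argumento+ itself.
      </p>
      <preformat>
// Returns a membership flag per argument for the grounded extension;
// a win reward could then be based on how many flags are set.
public static boolean[] groundedExtension(boolean[][] attacks) {
    int n = attacks.length;
    boolean[] in = new boolean[n]; // in[a]: argument a is accepted so far
    boolean changed = true;
    while (changed) {
        changed = false;
        for (int a = 0; a &lt; n; a++) {
            if (in[a]) continue;
            boolean defended = true; // vacuously true for unattacked arguments
            for (int b = 0; b &lt; n &amp;&amp; defended; b++) {
                if (!attacks[b][a]) continue; // consider each attacker b of a
                boolean counterAttacked = false;
                for (int c = 0; c &lt; n; c++) {
                    if (in[c] &amp;&amp; attacks[c][b]) { counterAttacked = true; break; }
                }
                if (!counterAttacked) defended = false;
            }
            if (defended) { in[a] = true; changed = true; }
        }
    }
    return in;
}
      </preformat>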
      <p>
        We have performed an initial experiment to investigate
whether the RL agent learns to argue against baseline agents
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The RL agent adopts a commonly used RL method, the
Q-learning algorithm. The aim of Q-learning is to allow
an agent to learn through experience and to map each state
to an action by choosing the maximum value from the
Q-table, which is updated after each episode. The initial
experiment and evaluation generally encourage the adoption
of a reinforcement learning agent in argumentation with a
long-term delayed reward based on grounded
extensions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
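      <p>
        For reference, the following is a minimal sketch of the tabular
Q-learning machinery just described; the state and action indexing and the
alpha, gamma and epsilon values are illustrative assumptions, not the
parameters used in our experiment.
      </p>
      <preformat>
// Tabular Q-learning: q[s][a] estimates the cumulative reward of taking
// action a in state s, and is updated after each observed transition.
public class QLearner {
    private final double[][] q;
    private final double alpha = 0.1;  // learning rate (assumed value)
    private final double gamma = 0.9;  // discount factor (assumed value)
    private final java.util.Random rng = new java.util.Random();

    public QLearner(int numStates, int numActions) {
        q = new double[numStates][numActions];
    }

    // Q(s,a) := Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    public void update(int s, int a, double reward, int sNext) {
        double maxNext = q[sNext][0];
        for (double v : q[sNext]) maxNext = Math.max(maxNext, v);
        q[s][a] += alpha * (reward + gamma * maxNext - q[s][a]);
    }

    // Epsilon-greedy selection over the (non-empty) legal moves in state s:
    // explore with probability epsilon, otherwise exploit the Q-table.
    public int choose(int s, int[] legalMoves, double epsilon) {
        if (rng.nextDouble() &lt; epsilon) {
            return legalMoves[rng.nextInt(legalMoves.length)];
        }
        int best = legalMoves[0];
        for (int m : legalMoves) {
            if (q[s][m] > q[s][best]) best = m;
        }
        return best;
    }
}
      </preformat>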
      <p>
        In the future, this work will attempt to suggest ways to
improve the RL agent's performance by carrying out further
work on the initial experiment results. The state
representation of the arguments still needs to be more sophisticated
in order to make each state unique [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which would make
it easy for the agent to distinguish
between states. Even though an initial suggestion was to make
the state a combination of the current state and the
previous state, it is still difficult to uniquely identify each
state. To sort out this issue, it will be worth
investigating whether this can be resolved by representing each state as the tuple
(levelOfTree, agentID, currentState, previousState),
sketched at the end of this section.
Backtracking ([
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]; [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]) will also be considered, to improve
the simple argument game by developing the game rules
in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Moreover, to make the game more competitive and
effective, it is important to make the agent consider the
opponent's strategy [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Hence, the learning agent needs to consider
how to learn to argue with the opponent by expanding its
knowledge base with new arguments. In addition, in
complex argumentation scenarios, we need to consider moving
from the high abstraction level to the argument contents by
using propositional logic. Weighted arguments will also be
considered in this research, since some arguments are more
important than others. We will consider choosing a suitable
argument model for the complicated scenario. There are many
models, for instance Prakken's dialogue game of persuasion
with dispute [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], Bench-Capon's TDG dialogue game ([
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]; [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]),
DC by Mackenzie [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and its utilisation by Moore in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and the DE
system (Yuan et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]); all of these models will be critically
reviewed.
      </p>
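      <p>
        A minimal sketch of this composite state, again assuming the indexed
argument states of the earlier sketches, is:
      </p>
      <preformat>
// Immutable composite state; the equals/hashCode pair derived from all four
// fields lets a Q-table distinguish two occurrences of the same argument at
// different tree depths, turns or histories. (Records require Java 16+.)
public record DialogueState(int levelOfTree,
                            String agentID,
                            int currentState,
                            int previousState) { }
      </preformat>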
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Sultan</given-names>
            <surname>Alahmari</surname>
          </string-name>
          , Tommy Yuan, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Kudenko</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Reinforcement learning for abstract argumentation: Q-learning approach</article-title>
          .
          <source>In Adaptive and Learning Agents workshop (at AAMAS</source>
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Trevor</surname>
            <given-names>JM</given-names>
          </string-name>
          <string-name>
            <surname>Bench-Capon</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Speci cation and implementation of Toulmin dialogue game</article-title>
          .
          <source>In Proceedings of JURIX</source>
          , Vol.
          <volume>98</volume>
          . 5{
          <fpage>20</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Trevor</surname>
            <given-names>J. M.</given-names>
          </string-name>
          <string-name>
            <surname>Bench-Capon</surname>
            ,
            <given-names>T</given-names>
          </string-name>
          <string-name>
            <surname>Geldard</surname>
          </string-name>
          , and Paul H Leng.
          <year>2000</year>
          .
          <article-title>A method for the computational modelling of dialectical argument with dialogue games</article-title>
          .
          <source>Arti cial Intelligence and Law</source>
          <volume>8</volume>
          ,
          <issue>2</issue>
          (
          <year>2000</year>
          ),
          <volume>233</volume>
          {
          <fpage>254</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Heriberto</given-names>
            <surname>Cuayahuitl</surname>
          </string-name>
          , Simon Keizer, and
          <string-name>
            <given-names>Oliver</given-names>
            <surname>Lemon</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Strategic dialogue management via deep reinforcement learning</article-title>
          .
          <source>arXiv preprint arXiv:1511.08099</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Phan</given-names>
            <surname>Minh Dung</surname>
          </string-name>
          .
          <year>1995</year>
          .
          <article-title>On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games</article-title>
          .
          <source>Arti cial intelligence 77</source>
          ,
          <issue>2</issue>
          (
          <year>1995</year>
          ),
          <volume>321</volume>
          {
          <fpage>357</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Paul E Dunne</given-names>
            , Anthony Hunter,
            <surname>Peter McBurney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Simon</given-names>
            <surname>Parsons</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Wooldridge</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Weighted argument systems: Basic de nitions, algorithms, and complexity results</article-title>
          .
          <source>Arti cial Intelligence</source>
          <volume>175</volume>
          ,
          <issue>2</issue>
          (
          <year>2011</year>
          ),
          <volume>457</volume>
          {
          <fpage>486</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Katie</given-names>
            <surname>Long</surname>
          </string-name>
          <string-name>
            <surname>Genter</surname>
          </string-name>
          , Santiago Ontan~on, and
          <string-name>
            <given-names>Ashwin</given-names>
            <surname>Ram</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Learning Opponent Strategies through First Order Induction.</article-title>
          .
          <source>In FLAIRS Conference</source>
          .
          <volume>1</volume>
          {
          <fpage>2</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Piotr</surname>
            <given-names>S</given-names>
          </string-name>
          <string-name>
            <surname>Kosmicki</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>A platform for the evaluation of automated argumentation strategies</article-title>
          .
          <source>In International Conference on Rough Sets and Current Trends in Computing</source>
          . Springer,
          <volume>494</volume>
          {
          <fpage>503</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Ryszard</given-names>
            <surname>Kowalczyk</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Intelligent Agent Technology Research</article-title>
          . https://www.swinburne.edu.au/ict/success/ research-projects-and
          <article-title>-grants/intelligent-agent/</article-title>
          . (
          <year>2014</year>
          ). [Online; accessed 06-April-2017].
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Jim</surname>
            <given-names>D</given-names>
          </string-name>
          <string-name>
            <surname>Mackenzie</surname>
          </string-name>
          .
          <year>1979</year>
          .
          <article-title>Question-begging in non-cumulative systems</article-title>
          .
          <source>Journal of philosophical logic 8</source>
          ,
          <issue>1</issue>
          (
          <year>1979</year>
          ),
          <volume>117</volume>
          {
          <fpage>133</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Sanjay</surname>
            <given-names>Modgil</given-names>
          </string-name>
          , Francesca Toni, Floris Bex, Ivan Bratko, Carlos I Chesnevar,
          <article-title>Wolfgang Dvorak, Marcelo A Falappa, Xiuyi Fan, Sarah Alice Gaggl, Alejandro J Garc a, and others</article-title>
          .
          <year>2013</year>
          .
          <article-title>The added value of argumentation</article-title>
          . In Agreement technologies. Springer,
          <volume>357</volume>
          {
          <fpage>403</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>David John Moore.</surname>
          </string-name>
          <year>1993</year>
          .
          <article-title>Dialogue game theory for intelligent tutoring systems</article-title>
          .
          <source>Ph.D. Dissertation</source>
          . Leeds Metropolitan University.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Henry</given-names>
            <surname>Prakken</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Relating protocols for dynamic dispute with logics for defeasible argumentation</article-title>
          .
          <source>Synthese</source>
          <volume>127</volume>
          ,
          <issue>1</issue>
          (
          <year>2001</year>
          ),
          <volume>187</volume>
          {
          <fpage>219</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Henry</given-names>
            <surname>Prakken</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Argumentation Logics: Games for abstract argumentation</article-title>
          . http://www.sta .science.uu.nl/ prakk101/al/ chongqing10.html. (
          <year>2010</year>
          ). [Online; accessed 01-April-2017].
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Richard</surname>
            <given-names>S</given-names>
          </string-name>
          <string-name>
            <surname>Sutton and Andrew G Barto</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Reinforcement learning: An introduction</article-title>
          . Vol.
          <volume>1</volume>
          . MIT press Cambridge.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Gerard</surname>
            <given-names>AW</given-names>
          </string-name>
          <string-name>
            <surname>Vreeswik and Henry Prakken</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Credulous and sceptical argument games for preferred semantics</article-title>
          .
          <source>In European Workshop on Logics in Arti cial Intelligence</source>
          . Springer,
          <volume>239</volume>
          {
          <fpage>253</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Eric</given-names>
            <surname>Wiewiora</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>E cient Exploration for Reinforcement Learning</article-title>
          .
          <source>Ph.D. Dissertation</source>
          . Citeseer.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Wooldridge</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>An introduction to multiagent systems</article-title>
          . John Wiley &amp; Sons.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Wooldridge</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>An introduction to multiagent systems</article-title>
          . John Wiley &amp; Sons.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Tangming</surname>
            <given-names>Yuan</given-names>
          </string-name>
          , David Moore,
          <string-name>
            <given-names>and Alec</given-names>
            <surname>Grierson</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Computational Agents as a Test-Bed to Study the Philosophical Dialogue Model" DE": A Development of Mackenzie's DC</article-title>
          .
          <source>Informal Logic</source>
          <volume>23</volume>
          ,
          <issue>3</issue>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Tangming</surname>
            <given-names>Yuan</given-names>
          </string-name>
          , David Moore,
          <string-name>
            <given-names>and Alec</given-names>
            <surname>Grierson</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>A human{ computer debating system prototype and its dialogue strategies</article-title>
          .
          <source>International Journal of Intelligent Systems</source>
          <volume>22</volume>
          ,
          <issue>1</issue>
          (
          <year>2007</year>
          ),
          <volume>133</volume>
          {
          <fpage>156</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Tangming</surname>
            <given-names>Yuan</given-names>
          </string-name>
          , David Moore,
          <string-name>
            <given-names>and Alec</given-names>
            <surname>Grierson</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>A humancomputer dialogue system for educational debate: A computational dialectics approach</article-title>
          .
          <source>International Journal of Arti cial Intelligence in Education 18</source>
          ,
          <issue>1</issue>
          (
          <year>2008</year>
          ),
          <volume>3</volume>
          {
          <fpage>26</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Tangming</surname>
            <given-names>Yuan</given-names>
          </string-name>
          , Jenny Schulze, Joseph Devereux, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Reed</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Towards an arguing agents competition: Building on argumento</article-title>
          .
          <source>In Proceedings of IJCAI2008 Workshop on Computational Models of Natural Argument.</source>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Tangming</surname>
            <given-names>Yuan</given-names>
          </string-name>
          , Vigar Svansson, David Moore,
          <string-name>
            <given-names>and Alec</given-names>
            <surname>Grierson</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>A computer game for abstract argumentation</article-title>
          .
          <source>In Proceedings of the 7th Workshop on Computational Models of Natural Argument (CMNA07).</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>