Reinforcement Learning for Argumentation: Describing a PhD Research

Sultan Alahmari, Tommy Yuan, Daniel Kudenko
Department of Computer Science, University of York
Deramore Lane, Heslington, York, YO10 5GH, UK
smsa500@york.ac.uk, tommy.yuan@york.ac.uk, daniel.kudenko@york.ac.uk

1 OVERVIEW

Artificial intelligence (AI) is increasingly studied in many fields, such as philosophy, law and decision making. One approach to AI is the use of agent and multi-agent systems. Agents are a key element for building complex, large-scale distributed systems [9]. In a multi-agent system, each agent interacts with the environment and communicates with other agents in order to achieve the designated goal. Communication here means sharing and exchanging information, and cooperating and coordinating with one another in pursuit of a common goal.

Argumentation is a type of communication between agents: a process that attempts to form an agreement about what to believe. There has been increasing research on argumentation and dialogue systems in the past decade [23]. An agent participating in a dialogue needs sophisticated dialogue strategies in order to make high-quality contributions. A review of the state-of-the-art literature on computerised dialogue systems (e.g. [21]; [22]) shows that their dialogue strategies (i.e. strategic heuristics) are hardwired into the computational agent. One of the main issues with this is that the agent may be incapable of dealing with new dialogue situations that have not been coded for, and anticipating every situation is an impossible task given the dynamic nature of argumentation. It would be ideal for an agent to search for an optimal strategy by itself, e.g. via trial and error, so that the agent with the best strategy wins the argument [8].

Machine learning has an important role to play in meeting these challenges. Rather than hard-coding dialogue strategies, it is more flexible to let agents discover them through exploration (trial and error). We believe that learning can make agents more adaptable to new environments and new dialogue situations. One of the most popular machine learning approaches for learning agents is reinforcement learning (RL).

Reinforcement learning focuses on learning a mapping from states to actions by interacting with the environment and observing the resulting state changes [15]. Sutton and Barto [15] define reinforcement learning as an agent learning what to do, i.e. how to connect each situation with an action, so as to maximise the cumulative reward. The learner is not told which action should be taken; rather, it must discover a policy that yields the maximum cumulative reward by trying actions out. In reinforcement learning, the agent interacts with the environment by taking an action and receiving a reward for the action taken, as seen in Figure 1.

Figure 1: Reinforcement learning agent-environment interaction

To make an agent learn to argue, there is a need to identify the states, actions, environment and rewards. In this research, abstract argumentation systems (AAS) [5] are initially used to represent the argumentation, for the following reasons:

(1) They can represent informal human reasoning in a form that a computer can compute with. In this way, argumentation bridges the gap between human and machine reasoning [11].
(2) They make it easy to compute acceptable arguments in order to evaluate the various argument semantics, e.g. the grounded extension.
(3) They give the agent a clear opportunity to explore the relationships between arguments.
(4) They are a powerful problem-solving method, since they can easily be implemented in logic programming [5].

The classical representation in the literature (e.g. [19]; [4]) treats the nodes of the argumentation graph as states and the attack relations between arguments as actions.

The main objective of our research is to investigate whether a reinforcement learning agent can be used to create an argumentation AI with performance and efficiency comparable to state-of-the-art systems. Performance relates to how well the agent learns over time. A measure of performance for good argumentation is, for instance, whether arguments are won or lost, or how many of the learning agent's arguments are accepted against other heuristic-strategy agents. Efficiency relates to whether the agent can learn within limited or insufficient time; the aim is to find out whether the agent can learn rapidly. We should also ensure that the agent obtains sufficient knowledge of the environment to be able to use an efficient method to find an optimal decision for each state [17].
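The abstract argumentation setting above can be made concrete with a short sketch. The following Python computes the grounded extension mentioned in reason (2), using the standard least-fixpoint construction over Dung's characteristic function; the three-argument framework at the end is a made-up illustration, not an example from this research.

```python
# Sketch: grounded extension of a Dung abstract argumentation framework,
# computed as the least fixpoint of the characteristic function.

def grounded_extension(arguments, attacks):
    """arguments: a set of names; attacks: a set of (attacker, target) pairs."""
    attackers_of = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    extension = set()
    while True:
        # An argument is acceptable w.r.t. the current extension if every
        # one of its attackers is attacked by some argument already in it.
        defended = {
            a for a in arguments
            if all(any((d, b) in attacks for d in extension)
                   for b in attackers_of[a])
        }
        if defended == extension:
            return extension
        extension = defended

# Illustrative framework: a attacks b, b attacks c.
# The grounded extension is {a, c}: a is unattacked and a defends c.
print(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")}))
```

Iterating until the defended set stops growing is what makes the result the *grounded* (most sceptical) extension, as opposed to the credulous semantics mentioned in [16].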
18th Workshop on Computational Models of Natural Argument, Floris Bex, Floriana Grasso, Nancy Green (eds), 16th July 2017, London, UK

In light of this hypothesis, the following steps will be taken:

(1) Initially, a basic abstract argument game model is used, owing to its simplicity for implementing arguments. This in turn makes it possible to investigate how reinforcement learning can be applied to a simple dialogue scenario.
(2) An argumentation setting is evaluated against a human or another AI agent by observing learning performance over time.
(3) Suitable means for reinforcement learning in a complicated dialogue scenario are investigated, and the results studied, in order to generalise the RL method. A complicated dialogue scenario involves more move types, e.g. questions, challenges, assertions and withdrawals, and moves from the abstract argument level to the propositional level.
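The basic abstract argument game of step (1) can be sketched as a simple dialogue protocol: the players alternate moves, each move must attack the opponent's last argument, and a player who cannot reply loses. The rules and names below are an illustrative simplification, not the exact protocol of the testbed.

```python
# Sketch of a two-player abstract argument game: moves alternate,
# each move must attack the previous argument, and whoever cannot
# reply loses. Strategies are callables that pick one attacker.

def play(attacks, start, strategies):
    """strategies = (proponent_fn, opponent_fn); each receives the list
    of attackers of the last move and returns one, or None to concede."""
    last, turn = start, 1  # the opponent replies to the opening move first
    while True:
        candidates = [a for (a, b) in attacks if b == last]
        move = strategies[turn % 2](candidates) if candidates else None
        if move is None:
            # The player to move cannot (or will not) reply and loses.
            return "proponent" if turn % 2 == 1 else "opponent"
        last, turn = move, turn + 1

first = lambda cs: cs[0] if cs else None
# b attacks a, c attacks b: the proponent of "a" is defended by c and wins.
print(play({("b", "a"), ("c", "b")}, "a", (first, first)))
```

Framing the game this way makes the reinforcement learning hookup direct: the `strategies` slot for one player can be replaced by a learned policy while the other player remains a fixed baseline.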
This work will also investigate other scenarios, such as backtracking ([14]; [16]), argument content, and the weight of individual arguments, amongst others [6]. Additionally, challenging issues such as state representation [1] and the reward function will be explored.

To test the hypothesis, we have built argumentation software that facilitates experiments in which a reinforcement learning agent argues against different agents. A software testbed, Argumento+, named after its predecessor Argumento reported in [24], has been built in the Java programming language. Argumento+ contains the RL agent as well as three other agents, namely a random agent, a maximum probability utility agent and a minimum probability utility agent, for the sake of evaluation. The agents play abstract argument games. The RL agent plays games against them, aiming to maximise the cumulative reward by winning more games. If the RL agent wins a game, it receives a reward based on the number of acceptable arguments, i.e. the grounded extension. We chose the grounded extension because it contains arguments that are beyond doubt in comparison with other arguments [19], and which are consequently more acceptable.
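The three baseline opponents can be sketched as move-selection policies. The paper does not specify how Argumento+ scores moves, so the utility table below is hypothetical; the sketch only shows the shape of random versus greedy maximum/minimum selection.

```python
import random

# Hedged sketch of baseline opponents of the kind described above:
# a random agent, and greedy agents that pick the candidate move with
# the highest or lowest estimated utility. The utility values are
# illustrative placeholders, not Argumento+'s actual scoring.

def random_agent(candidates, utility):
    return random.choice(candidates)

def max_utility_agent(candidates, utility):
    return max(candidates, key=lambda m: utility.get(m, 0.0))

def min_utility_agent(candidates, utility):
    return min(candidates, key=lambda m: utility.get(m, 0.0))

utility = {"a1": 0.7, "a2": 0.2, "a3": 0.5}  # hypothetical move scores
moves = ["a1", "a2", "a3"]
print(max_utility_agent(moves, utility))  # -> a1
print(min_utility_agent(moves, utility))  # -> a2
```

Fixed policies like these give the learning agent stable opponents to measure itself against, which is what makes the win-rate curves over episodes meaningful.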
We have performed an initial experiment to investigate whether the RL agent learns to argue against the baseline agents [1]. The RL agent adopts a commonly used RL method, the Q-learning algorithm. Q-learning allows an agent to learn through experience and map each state to an action by choosing the maximum value from the Q-table, which is updated after each episode. The initial experiment and evaluation generally encourage the adoption of a reinforcement learning agent for argumentation with a long-term delayed reward, given in terms of grounded extensions [1].
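The tabular Q-learning update behind the experiment can be sketched as follows: after each move, the entry Q(s, a) is nudged toward the observed reward plus the discounted best value of the next state. The learning rate, discount factor and reward value here are illustrative, not the parameters used in the experiment.

```python
# Sketch of the tabular Q-learning update: Q(s, a) moves toward
# r + gamma * max_a' Q(s', a'). Alpha and gamma are illustrative.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)  # keyed by (state, action), defaults to 0.0

def update(state, action, reward, next_state, next_actions):
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One terminal move: suppose winning yields a reward equal to the size
# of the grounded extension, say 3 arguments. No successor actions.
update("s0", "attack_b", 3.0, "terminal", [])
print(round(Q[("s0", "attack_b")], 3))  # -> 0.3
```

Because the win reward arrives only at the end of a game, it is exactly the long-term delayed reward mentioned above; repeated episodes propagate it back through the Q-table to earlier moves.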
In future, this work will attempt to improve the RL agent's performance by building on the initial experimental results. The state representation of the arguments still needs to be more sophisticated in order to make each state unique [1], which will make it easier for the agent to distinguish between states. Although the initial suggestion was to make each state a combination of the current state and the previous state, this still does not uniquely identify every state. It is therefore worth investigating whether the issue can be resolved by representing each state as: (levelOfTree, agentID, currentState, previousState).

Backtracking ([14]; [16]) will also be considered, to improve the simple argument game by developing some of the game rules in [18]. Moreover, to make the game more competitive and effective, it is important that the agent considers the opponent's strategy [7]. Hence, the learning agent needs to consider how to learn to argue with the opponent by expanding its knowledge base with new arguments. In addition, in complex argumentation scenarios we need to consider moving from the high-level abstraction to the argument contents, using propositional logic. Weighted arguments will also be considered in this research, since some arguments are more important than others. We will choose a suitable argument model for the complicated scenario. There are many models, for instance Prakken's dialogue game Persuasion with Dispute [13], Bench-Capon's TDG dialogue game ([2]; [3]), DC by Mackenzie [10], its utilisation by Moore in [12], and the DE system (Yuan et al. [20]); all of these models will be critically reviewed.

REFERENCES
[1] Sultan Alahmari, Tommy Yuan, and Daniel Kudenko. 2017. Reinforcement learning for abstract argumentation: Q-learning approach. In Adaptive and Learning Agents Workshop (at AAMAS 2017).
[2] Trevor J. M. Bench-Capon. 1998. Specification and implementation of Toulmin dialogue game. In Proceedings of JURIX, Vol. 98. 5-20.
[3] Trevor J. M. Bench-Capon, T. Geldard, and Paul H. Leng. 2000. A method for the computational modelling of dialectical argument with dialogue games. Artificial Intelligence and Law 8, 2 (2000), 233-254.
[4] Heriberto Cuayáhuitl, Simon Keizer, and Oliver Lemon. 2015. Strategic dialogue management via deep reinforcement learning. arXiv preprint arXiv:1511.08099 (2015).
[5] Phan Minh Dung. 1995. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence 77, 2 (1995), 321-357.
[6] Paul E. Dunne, Anthony Hunter, Peter McBurney, Simon Parsons, and Michael Wooldridge. 2011. Weighted argument systems: Basic definitions, algorithms, and complexity results. Artificial Intelligence 175, 2 (2011), 457-486.
[7] Katie Long Genter, Santiago Ontañón, and Ashwin Ram. 2011. Learning opponent strategies through first order induction. In FLAIRS Conference. 1-2.
[8] Piotr S. Kośmicki. 2010. A platform for the evaluation of automated argumentation strategies. In International Conference on Rough Sets and Current Trends in Computing. Springer, 494-503.
[9] Ryszard Kowalczyk. 2014. Intelligent Agent Technology Research. https://www.swinburne.edu.au/ict/success/research-projects-and-grants/intelligent-agent/. (2014). [Online; accessed 06-April-2017].
[10] Jim D. Mackenzie. 1979. Question-begging in non-cumulative systems. Journal of Philosophical Logic 8, 1 (1979), 117-133.
[11] Sanjay Modgil, Francesca Toni, Floris Bex, Ivan Bratko, Carlos I. Chesñevar, Wolfgang Dvořák, Marcelo A. Falappa, Xiuyi Fan, Sarah Alice Gaggl, Alejandro J. García, and others. 2013. The added value of argumentation. In Agreement Technologies. Springer, 357-403.
[12] David John Moore. 1993. Dialogue Game Theory for Intelligent Tutoring Systems. Ph.D. Dissertation. Leeds Metropolitan University.
[13] Henry Prakken. 2001. Relating protocols for dynamic dispute with logics for defeasible argumentation. Synthese 127, 1 (2001), 187-219.
[14] Henry Prakken. 2010. Argumentation Logics: Games for abstract argumentation. http://www.staff.science.uu.nl/~prakk101/al/chongqing10.html. (2010). [Online; accessed 01-April-2017].
[15] Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge.
[16] Gerard A. W. Vreeswijk and Henry Prakken. 2000. Credulous and sceptical argument games for preferred semantics. In European Workshop on Logics in Artificial Intelligence. Springer, 239-253.
[17] Eric Wiewiora. 2004. Efficient Exploration for Reinforcement Learning. Ph.D. Dissertation.
[18] Michael Wooldridge. 2002. An Introduction to MultiAgent Systems. John Wiley & Sons.
[19] Michael Wooldridge. 2009. An Introduction to MultiAgent Systems. John Wiley & Sons.
[20] Tangming Yuan, David Moore, and Alec Grierson. 2003. Computational agents as a test-bed to study the philosophical dialogue model "DE": A development of Mackenzie's DC. Informal Logic 23, 3 (2003).
[21] Tangming Yuan, David Moore, and Alec Grierson. 2007. A human-computer debating system prototype and its dialogue strategies. International Journal of Intelligent Systems 22, 1 (2007), 133-156.
[22] Tangming Yuan, David Moore, and Alec Grierson. 2008. A human-computer dialogue system for educational debate: A computational dialectics approach. International Journal of Artificial Intelligence in Education 18, 1 (2008), 3-26.
[23] Tangming Yuan, Jenny Schulze, Joseph Devereux, and Chris Reed. 2008. Towards an arguing agents competition: Building on Argumento. In Proceedings of the IJCAI 2008 Workshop on Computational Models of Natural Argument.
[24] Tangming Yuan, Viðar Svansson, David Moore, and Alec Grierson. 2007. A computer game for abstract argumentation. In Proceedings of the 7th Workshop on Computational Models of Natural Argument (CMNA07).