=Paper=
{{Paper
|id=Vol-2048/paper13
|storemode=property
|title=Reinforcement Learning for Argumentation: Describing a PhD Research
|pdfUrl=https://ceur-ws.org/Vol-2048/paper13.pdf
|volume=Vol-2048
|authors=Sultan Alahmari,Tommy Yuan,Daniel Kudenko
|dblpUrl=https://dblp.org/rec/conf/icail/AlahmariYK17
}}
==Reinforcement Learning for Argumentation: Describing a PhD Research==
Sultan Alahmari, Tommy Yuan and Daniel Kudenko
Department of Computer Science, University of York
Deramore Lane, Heslington, York, YO10 5GH, UK
smsa500@york.ac.uk, tommy.yuan@york.ac.uk, daniel.kudenko@york.ac.uk
1 OVERVIEW

Artificial intelligence (AI) is increasingly studied in many fields such as philosophy, law and decision making. One approach to AI is the use of agent and multi-agent systems. Agents are a key element for building complex large-scale distributed systems [9]. In a multi-agent system, each agent interacts with the environment and communicates with other agents in order to achieve the designated goal. Communication means that the agents share and exchange information, cooperate and coordinate with each other in order to achieve a common goal.

Argumentation is a type of communication between agents and a process of attempting to form an agreement about what to believe. There has been increasing research in argumentation and dialogue systems in the past decade [23]. The agent as a dialogue participant needs sophisticated dialogue strategies in order to make high-quality dialogue contributions. A review of the state-of-the-art literature on computerised dialogue systems (e.g. [21]; [22]) shows that their dialogue strategies (i.e. strategic heuristics) are hardwired into the computational agent. One of the main issues with this is that an agent might be incapable of dealing with new dialogue situations that have not been coded for, and indeed anticipating every situation is an impossible task given the dynamic nature of argumentation. It would be ideal for an agent to search for an optimal strategy by itself, e.g. via trial and error, so that the agent with the best strategy wins the argument [8].

Machine learning has an important role to play in meeting these challenges. Rather than following hardwired strategies, agents can learn dialogue strategies more flexibly through exploration (trial and error). It is believed that learning can make agents more adaptable to new environments and new dialogue situations. One of the popular machine learning approaches for learning agents is reinforcement learning (RL).

Reinforcement learning focuses on how to map an action to each state by interacting with the environment and observing the state change [15]. Sutton and Barto [15] define reinforcement learning as an agent learning what to do and how to connect each situation with an action so as to maximise the cumulative reward. The learner, or agent, is not told which action should be taken; rather, it needs to discover a policy that yields the maximum cumulative reward by trying actions out. In reinforcement learning, the agent interacts with the environment by taking an action and receiving a reward for the action taken, as seen in Figure 1.

[Figure 1: Reinforcement learning agent-environment interaction]

To make an agent learn to argue, there is a need to identify the states, actions, environment and rewards. In this research, abstract argumentation systems (AAS) [5] are initially used to represent the argumentation, for the following reasons:

(1) They can represent informal human reasoning in a way that a computer can calculate with. In this way, argumentation bridges the gap between human and machine reasoning [11].
(2) They make it easier to compute acceptable arguments in order to evaluate the various argument semantics, e.g. the grounded extension.
(3) They give the agent a great opportunity to explore the relationships between arguments.
(4) They are a powerful method for solving problems, since they can be easily implemented in logic programming [5].

The classical state representation of agents in the literature (e.g. [19]; [4]) has states represented as nodes in the argumentation graph and actions as the attack relations between arguments.

The main objective of our research is to investigate whether a reinforcement learning agent can be used to create an argumentation AI with performance and efficiency comparable to state-of-the-art systems. Performance relates to how well the agent learns over time; a measure of performance for argumentation is, for instance, whether an argument is won or lost, or how many arguments a learning agent gets accepted against other heuristic-strategy agents. Efficiency relates to whether the agent can learn within a limited or insufficient time, so the aim is to find out whether the agent can learn rapidly. It should also be ensured that the agent obtains full knowledge from the environment so as to be able to use an efficient method to find an optimal decision for each state [17].
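The grounded extension mentioned above as an example of argument semantics (and used later in this abstract as the basis of the learning agent's reward) can be computed by iterating Dung's characteristic function [5] from the empty set. The sketch below is illustrative only: the string-based argument names and pair-based attack encoding are assumptions for this example, not the data structures of the actual testbed.

```python
# Minimal abstract argumentation framework (in the sense of Dung 1995).
# Arguments are strings; `attacks` is a set of (attacker, target) pairs.
# The grounded extension is the least fixed point of the characteristic
# function F(S) = {a | every attacker of a is attacked by some member of S}.

def grounded_extension(arguments, attacks):
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}

    def defended(a, s):
        # a is acceptable w.r.t. s if s attacks every attacker of a
        return all(any((d, b) in attacks for d in s) for b in attackers[a])

    s = set()
    while True:
        nxt = {a for a in arguments if defended(a, s)}
        if nxt == s:
            return s
        s = nxt

# Example: a attacks b, b attacks c.  a is unattacked, so a is "in";
# a defeats b, which reinstates c.
args = {"a", "b", "c"}
atts = {("a", "b"), ("b", "c")}
print(sorted(grounded_extension(args, atts)))
```

Because the grounded extension only accepts arguments that are defended without any credulous choice, it matches the paper's motivation of rewarding arguments "that have no doubt": in a mutual-attack cycle, for instance, the grounded extension is empty.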
18th Workshop on Computational Models of Natural Argument
Floris Bex, Floriana Grasso, Nancy Green (eds)
16th July 2017, London, UK
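Tabular Q-learning, the RL method adopted for the initial experiment, maps each state to an action by repeatedly updating a Q-table from observed rewards. The sketch below shows the standard update rule on a toy chain environment; the environment, the reward of 1 for "winning", and the hyper-parameter values are illustrative assumptions, not the argument game of the actual testbed.

```python
import random

# Tabular Q-learning on a toy chain: the agent starts in state 0 and wins
# (reward 1, episode ends) by moving right past the last state.
# Update rule: Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))

N_STATES, ACTIONS = 4, (0, 1)          # action 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    """Deterministic transition; next state None means the episode ended."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    return (None, 1.0) if nxt == N_STATES else (nxt, 0.0)

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def greedy(s):
        # argmax over actions, breaking ties at random
        best = max(q[(s, b)] for b in ACTIONS)
        return rng.choice([b for b in ACTIONS if q[(s, b)] == best])

    for _ in range(episodes):
        s = 0
        while s is not None:
            # epsilon-greedy: explore with probability EPSILON
            a = rng.choice(ACTIONS) if rng.random() < EPSILON else greedy(s)
            s2, r = step(s, a)
            target = r if s2 is None else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            s = s2
    return q

q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES)]
print("learned policy:", policy)
```

In an argument-game setting the states and actions would instead be derived from the argumentation graph and its attack relation, with the delayed reward granted at the end of a won game; the update rule itself is unchanged.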
In the light of this hypothesis, the following steps will be taken:

(1) Initially, a basic abstract argument game model is used, due to its simplicity in implementing arguments. This in turn makes it possible to investigate how reinforcement learning can be applied to a simple dialogue scenario.
(2) Evaluation of the argumentation setting against a human or another AI agent, by observing learning performance over time.
(3) Investigating suitable means of reinforcement learning for a complicated dialogue scenario, and studying the results in order to generalise the RL method. A complicated dialogue scenario involves more move types, e.g. questions, challenges, assertions and withdrawals, and moves from the abstract argument level to the propositional level.

This work will also investigate other scenarios, such as backtracking ([14]; [16]), argument content and the weight of individual arguments, amongst others [6]. Additionally, challenging issues such as state representation [1] and the reward function will also be explored.

To test the hypothesis, we have built argumentation software to facilitate experiments in which a reinforcement learning agent argues against different agents. A software testbed, Argumento+, named after its predecessor Argumento as reported in [24], has been built using the Java programming language. Argumento+ contains the RL agent as well as three other agents, namely a random agent, a maximum probability utility agent and a minimum probability utility agent, for the sake of the evaluation. The agents play abstract argument games. The RL agent plays games against them to maximise the cumulative reward by winning more games. If the RL agent wins a game, it receives a reward based on the number of acceptable arguments, i.e. the grounded extension. We considered the grounded extension because it contains the arguments that are beyond doubt in comparison with other arguments [19], and these are consequently the most acceptable.

We have performed an initial experiment to investigate whether the RL agent learns to argue against baseline agents [1]. The RL agent adopts a commonly used RL method, the Q-learning algorithm. The aim of Q-learning is to allow an agent to learn through experience and to map each state to an action by choosing the maximum value from the Q-table, which is updated after each episode. The initial experiment and evaluation generally encourage the adoption of a reinforcement learning agent in argumentation with a long-term delayed reward based on grounded extensions [1].

In the future, this work will attempt to suggest ways of improving the RL agent's performance by carrying out further work on the initial experiment results. The state representation of the arguments still needs to be more sophisticated in order to make each state unique [1], which would make it easier for the agent to distinguish between states. Even though the initial suggestion was to make the state a combination of the current state and the previous state, it is still difficult to uniquely identify each state. To resolve this issue, it will be worth investigating whether each state can be represented as: (levelOfTree, agentID, currentState, previousState).

Backtracking ([14]; [16]) will also be considered to improve the simple argument game, by developing the game rules in [18]. Moreover, to make the game more competitive and effective it is important to make the agent consider the opponent's strategy [7]. Hence, the learning agent needs to consider how to learn to argue with the opponent by expanding its knowledge base with new arguments. In addition, in a complex argumentation scenario we need to consider moving from the high-level abstraction to the argument contents, using propositional logic. Weighted arguments will also be considered in this research, since some arguments are more important than others. We will consider choosing a suitable argument model for the complicated scenario. There are many models, for instance Prakken's dialogue game Persuasion with dispute [13], Bench-Capon's TDG dialogue game ([2]; [3]), DC by Mackenzie [10], the utilisation by Moore in [12] and the DE system (Yuan et al. [20]); all of these models will be critically reviewed.

REFERENCES

[1] Sultan Alahmari, Tommy Yuan, and Daniel Kudenko. 2017. Reinforcement learning for abstract argumentation: Q-learning approach. In Adaptive and Learning Agents Workshop (at AAMAS 2017).
[2] Trevor J. M. Bench-Capon. 1998. Specification and implementation of Toulmin dialogue game. In Proceedings of JURIX, Vol. 98. 5–20.
[3] Trevor J. M. Bench-Capon, T. Geldard, and Paul H. Leng. 2000. A method for the computational modelling of dialectical argument with dialogue games. Artificial Intelligence and Law 8, 2 (2000), 233–254.
[4] Heriberto Cuayáhuitl, Simon Keizer, and Oliver Lemon. 2015. Strategic dialogue management via deep reinforcement learning. arXiv preprint arXiv:1511.08099 (2015).
[5] Phan Minh Dung. 1995. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence 77, 2 (1995), 321–357.
[6] Paul E. Dunne, Anthony Hunter, Peter McBurney, Simon Parsons, and Michael Wooldridge. 2011. Weighted argument systems: Basic definitions, algorithms, and complexity results. Artificial Intelligence 175, 2 (2011), 457–486.
[7] Katie Long Genter, Santiago Ontañón, and Ashwin Ram. 2011. Learning opponent strategies through first order induction. In FLAIRS Conference. 1–2.
[8] Piotr S. Kośmicki. 2010. A platform for the evaluation of automated argumentation strategies. In International Conference on Rough Sets and Current Trends in Computing. Springer, 494–503.
[9] Ryszard Kowalczyk. 2014. Intelligent Agent Technology Research. https://www.swinburne.edu.au/ict/success/research-projects-and-grants/intelligent-agent/. (2014). [Online; accessed 06-April-2017].
[10] Jim D. Mackenzie. 1979. Question-begging in non-cumulative systems. Journal of Philosophical Logic 8, 1 (1979), 117–133.
[11] Sanjay Modgil, Francesca Toni, Floris Bex, Ivan Bratko, Carlos I. Chesñevar, Wolfgang Dvořák, Marcelo A. Falappa, Xiuyi Fan, Sarah Alice Gaggl, Alejandro J. García, and others. 2013. The added value of argumentation. In Agreement Technologies. Springer, 357–403.
[12] David John Moore. 1993. Dialogue game theory for intelligent tutoring systems. Ph.D. Dissertation. Leeds Metropolitan University.
[13] Henry Prakken. 2001. Relating protocols for dynamic dispute with logics for defeasible argumentation. Synthese 127, 1 (2001), 187–219.
[14] Henry Prakken. 2010. Argumentation Logics: Games for abstract argumentation. http://www.staff.science.uu.nl/~prakk101/al/chongqing10.html. (2010). [Online; accessed 01-April-2017].
[15] Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. Vol. 1. MIT Press, Cambridge.
[16] Gerard A. W. Vreeswijk and Henry Prakken. 2000. Credulous and sceptical argument games for preferred semantics. In European Workshop on Logics in Artificial Intelligence. Springer, 239–253.
[17] Eric Wiewiora. 2004. Efficient Exploration for Reinforcement Learning. Ph.D. Dissertation. Citeseer.
[18] Michael Wooldridge. 2002. An Introduction to Multiagent Systems. John Wiley & Sons.
[19] Michael Wooldridge. 2009. An Introduction to Multiagent Systems. John Wiley & Sons.
[20] Tangming Yuan, David Moore, and Alec Grierson. 2003. Computational agents as a test-bed to study the philosophical dialogue model "DE": A development of Mackenzie's DC. Informal Logic 23, 3 (2003).
[21] Tangming Yuan, David Moore, and Alec Grierson. 2007. A human–computer debating system prototype and its dialogue strategies. International Journal of Intelligent Systems 22, 1 (2007), 133–156.
[22] Tangming Yuan, David Moore, and Alec Grierson. 2008. A human–computer dialogue system for educational debate: A computational dialectics approach. International Journal of Artificial Intelligence in Education 18, 1 (2008), 3–26.
[23] Tangming Yuan, Jenny Schulze, Joseph Devereux, and Chris Reed. 2008. Towards an arguing agents competition: Building on Argumento. In Proceedings of the IJCAI 2008 Workshop on Computational Models of Natural Argument.
[24] Tangming Yuan, Viðar Svansson, David Moore, and Alec Grierson. 2007. A computer game for abstract argumentation. In Proceedings of the 7th Workshop on Computational Models of Natural Argument (CMNA07).