Reinforcement Learning for Argumentation: Describing a PhD Research

Sultan Alahmari, Tommy Yuan, Daniel Kudenko
Department of Computer Science, University of York
Deramore Lane, Heslington, York, YO10 5GH, UK
smsa500@york.ac.uk, tommy.yuan@york.ac.uk, daniel.kudenko@york.ac.uk

1 OVERVIEW

Artificial intelligence (AI) is increasingly studied in many fields, such as philosophy, law and decision making. One approach to AI is the use of agent and multi-agent systems. Agents are a key element for building complex, large-scale distributed systems [9]. In a multi-agent system, each agent interacts with the environment and communicates with other agents in order to achieve the designated goal. Communication here means sharing and exchanging information, and cooperating and coordinating with one another in pursuit of a common goal.

Argumentation is a type of communication between agents: a process that attempts to form an agreement about what to believe. There has been increasing research on argumentation and dialogue systems in the past decade [23]. An agent participating in a dialogue needs sophisticated dialogue strategies in order to make high-quality contributions. A review of the state-of-the-art literature on computerised dialogue systems (e.g. [21]; [22]) shows that their dialogue strategies (i.e. strategic heuristics) are hardwired into the computational agent. One of the main issues with this is that the agent may be incapable of dealing with new dialogue situations that have not been coded for, and anticipating every situation is an impossible task given the dynamic nature of argumentation. It would be ideal for an agent to search for an optimal strategy by itself, e.g. via trial and error, so that the agent with the best strategy wins the argument [8].

Machine learning has an important role to play in meeting these challenges. Rather than hard-coding dialogue strategies, it is more flexible to let agents discover them through exploration (trial and error). We believe that learning can make agents more adaptable to new environments and new dialogue situations. One of the most popular machine learning approaches for learning agents is reinforcement learning (RL).

Reinforcement learning focuses on learning a mapping from states to actions by interacting with the environment and observing the resulting state changes [15]. Sutton and Barto [15] define reinforcement learning as an agent learning what to do, i.e. how to connect each situation with an action, so as to maximise the cumulative reward. The learner is not told which action should be taken; rather, it must discover a policy that yields the maximum cumulative reward by trying actions out. In reinforcement learning, the agent interacts with the environment by taking an action and receiving a reward for the action taken, as seen in Figure 1.

Figure 1: Reinforcement learning agent-environment interaction

To make an agent learn to argue, there is a need to identify the states, actions, environment and rewards. In this research, abstract argumentation systems (AAS) [5] are initially used to represent the argumentation, for the following reasons:

(1) They can represent informal human reasoning in a form that a computer can compute with. In this way, argumentation bridges the gap between human and machine reasoning [11].
(2) They make it easy to compute acceptable arguments in order to evaluate the various argument semantics, e.g. the grounded extension.
(3) They give the agent a clear opportunity to explore the relationships between arguments.
(4) They are a powerful problem-solving method, since they can easily be implemented in logic programming [5].

The classical representation in the literature (e.g. [19]; [4]) treats the nodes of the argumentation graph as states and the attack relations between arguments as actions.

The main objective of our research is to investigate whether a reinforcement learning agent can be used to create an argumentation AI with performance and efficiency comparable to state-of-the-art systems. Performance relates to how well the agent learns over time. A measure of performance for good argumentation is, for instance, whether arguments are won or lost, or how many of the learning agent's arguments are accepted against other heuristic-strategy agents. Efficiency relates to whether the agent can learn within limited or insufficient time; the aim is to find out whether the agent can learn rapidly. We should also ensure that the agent obtains sufficient knowledge of the environment to be able to use an efficient method to find an optimal decision for each state [17].
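The abstract argumentation setting above can be made concrete with a short sketch. The following Python computes the grounded extension mentioned in reason (2), using the standard least-fixpoint construction over Dung's characteristic function; the three-argument framework at the end is a made-up illustration, not an example from this research.

```python
# Sketch: grounded extension of a Dung abstract argumentation framework,
# computed as the least fixpoint of the characteristic function.

def grounded_extension(arguments, attacks):
    """arguments: a set of names; attacks: a set of (attacker, target) pairs."""
    attackers_of = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    extension = set()
    while True:
        # An argument is acceptable w.r.t. the current extension if every
        # one of its attackers is attacked by some argument already in it.
        defended = {
            a for a in arguments
            if all(any((d, b) in attacks for d in extension)
                   for b in attackers_of[a])
        }
        if defended == extension:
            return extension
        extension = defended

# Illustrative framework: a attacks b, b attacks c.
# The grounded extension is {a, c}: a is unattacked and a defends c.
print(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")}))
```

Iterating until the defended set stops growing is what makes the result the *grounded* (most sceptical) extension, as opposed to the credulous semantics mentioned in [16].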
18th Workshop on Computational Models of Natural Argument, Floris Bex, Floriana Grasso, Nancy Green (eds), 16th July 2017, London, UK

In light of this hypothesis, the following steps will be taken:

(1) Initially, a basic abstract argument game model is used, owing to its simplicity for implementing arguments. This in turn makes it possible to investigate how reinforcement learning can be applied to a simple dialogue scenario.
(2) An argumentation setting is evaluated against a human or another AI agent by observing learning performance over time.
(3) Suitable means for reinforcement learning in a complicated dialogue scenario are investigated, and the results studied, in order to generalise the RL method. A complicated dialogue scenario involves more move types, e.g. questions, challenges, assertions and withdrawals, and moves from the abstract argument level to the propositional level.
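The basic abstract argument game of step (1) can be sketched as a simple dialogue protocol: the players alternate moves, each move must attack the opponent's last argument, and a player who cannot reply loses. The rules and names below are an illustrative simplification, not the exact protocol of the testbed.

```python
# Sketch of a two-player abstract argument game: moves alternate,
# each move must attack the previous argument, and whoever cannot
# reply loses. Strategies are callables that pick one attacker.

def play(attacks, start, strategies):
    """strategies = (proponent_fn, opponent_fn); each receives the list
    of attackers of the last move and returns one, or None to concede."""
    last, turn = start, 1  # the opponent replies to the opening move first
    while True:
        candidates = [a for (a, b) in attacks if b == last]
        move = strategies[turn % 2](candidates) if candidates else None
        if move is None:
            # The player to move cannot (or will not) reply and loses.
            return "proponent" if turn % 2 == 1 else "opponent"
        last, turn = move, turn + 1

first = lambda cs: cs[0] if cs else None
# b attacks a, c attacks b: the proponent of "a" is defended by c and wins.
print(play({("b", "a"), ("c", "b")}, "a", (first, first)))
```

Framing the game this way makes the reinforcement learning hookup direct: the `strategies` slot for one player can be replaced by a learned policy while the other player remains a fixed baseline.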
This work will also investigate other scenarios, such as backtracking ([14]; [16]), argument content, and the weight of individual arguments, amongst others [6]. Additionally, challenging issues such as state representation [1] and the reward function will be explored.

To test the hypothesis, we have built argumentation software that facilitates experiments in which a reinforcement learning agent argues against different agents. A software testbed, Argumento+, named after its predecessor Argumento reported in [24], has been built in the Java programming language. Argumento+ contains the RL agent as well as three other agents, namely a random agent, a maximum probability utility agent and a minimum probability utility agent, for the sake of evaluation. The agents play abstract argument games. The RL agent plays games against them, aiming to maximise the cumulative reward by winning more games. If the RL agent wins a game, it receives a reward based on the number of acceptable arguments, i.e. the grounded extension. We chose the grounded extension because it contains arguments that are beyond doubt in comparison with other arguments [19], and which are consequently more acceptable.
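The three baseline opponents can be sketched as move-selection policies. The paper does not specify how Argumento+ scores moves, so the utility table below is hypothetical; the sketch only shows the shape of random versus greedy maximum/minimum selection.

```python
import random

# Hedged sketch of baseline opponents of the kind described above:
# a random agent, and greedy agents that pick the candidate move with
# the highest or lowest estimated utility. The utility values are
# illustrative placeholders, not Argumento+'s actual scoring.

def random_agent(candidates, utility):
    return random.choice(candidates)

def max_utility_agent(candidates, utility):
    return max(candidates, key=lambda m: utility.get(m, 0.0))

def min_utility_agent(candidates, utility):
    return min(candidates, key=lambda m: utility.get(m, 0.0))

utility = {"a1": 0.7, "a2": 0.2, "a3": 0.5}  # hypothetical move scores
moves = ["a1", "a2", "a3"]
print(max_utility_agent(moves, utility))  # -> a1
print(min_utility_agent(moves, utility))  # -> a2
```

Fixed policies like these give the learning agent stable opponents to measure itself against, which is what makes the win-rate curves over episodes meaningful.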
We have performed an initial experiment to investigate whether the RL agent learns to argue against the baseline agents [1]. The RL agent adopts a commonly used RL method, the Q-learning algorithm. Q-learning allows an agent to learn through experience and map each state to an action by choosing the maximum value from the Q-table, which is updated after each episode. The initial experiment and evaluation generally encourage the adoption of a reinforcement learning agent for argumentation with a long-term delayed reward, given in terms of grounded extensions [1].
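The tabular Q-learning update behind the experiment can be sketched as follows: after each move, the entry Q(s, a) is nudged toward the observed reward plus the discounted best value of the next state. The learning rate, discount factor and reward value here are illustrative, not the parameters used in the experiment.

```python
# Sketch of the tabular Q-learning update: Q(s, a) moves toward
# r + gamma * max_a' Q(s', a'). Alpha and gamma are illustrative.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)  # keyed by (state, action), defaults to 0.0

def update(state, action, reward, next_state, next_actions):
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One terminal move: suppose winning yields a reward equal to the size
# of the grounded extension, say 3 arguments. No successor actions.
update("s0", "attack_b", 3.0, "terminal", [])
print(round(Q[("s0", "attack_b")], 3))  # -> 0.3
```

Because the win reward arrives only at the end of a game, it is exactly the long-term delayed reward mentioned above; repeated episodes propagate it back through the Q-table to earlier moves.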
In future, this work will attempt to improve the RL agent's performance by building on the initial experimental results. The state representation of the arguments still needs to be more sophisticated in order to make each state unique [1], which will make it easier for the agent to distinguish between states. Although the initial suggestion was to make each state a combination of the current state and the previous state, this still does not uniquely identify every state. It is therefore worth investigating whether the issue can be resolved by representing each state as: (levelOfTree, agentID, currentState, previousState).

Backtracking ([14]; [16]) will also be considered, to improve the simple argument game by developing some of the game rules in [18]. Moreover, to make the game more competitive and effective, it is important that the agent considers the opponent's strategy [7]. Hence, the learning agent needs to consider how to learn to argue with the opponent by expanding its knowledge base with new arguments. In addition, in complex argumentation scenarios we need to consider moving from the high-level abstraction to the argument contents, using propositional logic. Weighted arguments will also be considered in this research, since some arguments are more important than others. We will choose a suitable argument model for the complicated scenario. There are many models, for instance Prakken's dialogue game Persuasion with Dispute [13], Bench-Capon's TDG dialogue game ([2]; [3]), DC by Mackenzie [10], its utilisation by Moore in [12], and the DE system (Yuan et al. [20]); all of these models will be critically reviewed.

REFERENCES
[1] Sultan Alahmari, Tommy Yuan, and Daniel Kudenko. 2017. Reinforcement learning for abstract argumentation: Q-learning approach. In Adaptive and Learning Agents Workshop (at AAMAS 2017).
[2] Trevor J. M. Bench-Capon. 1998. Specification and implementation of Toulmin dialogue game. In Proceedings of JURIX, Vol. 98. 5-20.
[3] Trevor J. M. Bench-Capon, T. Geldard, and Paul H. Leng. 2000. A method for the computational modelling of dialectical argument with dialogue games. Artificial Intelligence and Law 8, 2 (2000), 233-254.
[4] Heriberto Cuayáhuitl, Simon Keizer, and Oliver Lemon. 2015. Strategic dialogue management via deep reinforcement learning. arXiv preprint arXiv:1511.08099 (2015).
[5] Phan Minh Dung. 1995. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence 77, 2 (1995), 321-357.
[6] Paul E. Dunne, Anthony Hunter, Peter McBurney, Simon Parsons, and Michael Wooldridge. 2011. Weighted argument systems: Basic definitions, algorithms, and complexity results. Artificial Intelligence 175, 2 (2011), 457-486.
[7] Katie Long Genter, Santiago Ontañón, and Ashwin Ram. 2011. Learning opponent strategies through first order induction. In FLAIRS Conference. 1-2.
[8] Piotr S. Kośmicki. 2010. A platform for the evaluation of automated argumentation strategies. In International Conference on Rough Sets and Current Trends in Computing. Springer, 494-503.
[9] Ryszard Kowalczyk. 2014. Intelligent Agent Technology Research. https://www.swinburne.edu.au/ict/success/research-projects-and-grants/intelligent-agent/. (2014). [Online; accessed 06-April-2017].
[10] Jim D. Mackenzie. 1979. Question-begging in non-cumulative systems. Journal of Philosophical Logic 8, 1 (1979), 117-133.
[11] Sanjay Modgil, Francesca Toni, Floris Bex, Ivan Bratko, Carlos I. Chesñevar, Wolfgang Dvořák, Marcelo A. Falappa, Xiuyi Fan, Sarah Alice Gaggl, Alejandro J. García, and others. 2013. The added value of argumentation. In Agreement Technologies. Springer, 357-403.
[12] David John Moore. 1993. Dialogue Game Theory for Intelligent Tutoring Systems. Ph.D. Dissertation. Leeds Metropolitan University.
[13] Henry Prakken. 2001. Relating protocols for dynamic dispute with logics for defeasible argumentation. Synthese 127, 1 (2001), 187-219.
[14] Henry Prakken. 2010. Argumentation Logics: Games for abstract argumentation. http://www.staff.science.uu.nl/~prakk101/al/chongqing10.html. (2010). [Online; accessed 01-April-2017].
[15] Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge.
[16] Gerard A. W. Vreeswijk and Henry Prakken. 2000. Credulous and sceptical argument games for preferred semantics. In European Workshop on Logics in Artificial Intelligence. Springer, 239-253.
[17] Eric Wiewiora. 2004. Efficient Exploration for Reinforcement Learning. Ph.D. Dissertation.
[18] Michael Wooldridge. 2002. An Introduction to MultiAgent Systems. John Wiley & Sons.
[19] Michael Wooldridge. 2009. An Introduction to MultiAgent Systems. John Wiley & Sons.
[20] Tangming Yuan, David Moore, and Alec Grierson. 2003. Computational agents as a test-bed to study the philosophical dialogue model "DE": A development of Mackenzie's DC. Informal Logic 23, 3 (2003).
[21] Tangming Yuan, David Moore, and Alec Grierson. 2007. A human-computer debating system prototype and its dialogue strategies. International Journal of Intelligent Systems 22, 1 (2007), 133-156.
[22] Tangming Yuan, David Moore, and Alec Grierson. 2008. A human-computer dialogue system for educational debate: A computational dialectics approach. International Journal of Artificial Intelligence in Education 18, 1 (2008), 3-26.
[23] Tangming Yuan, Jenny Schulze, Joseph Devereux, and Chris Reed. 2008. Towards an arguing agents competition: Building on Argumento. In Proceedings of the IJCAI 2008 Workshop on Computational Models of Natural Argument.
[24] Tangming Yuan, Viðar Svansson, David Moore, and Alec Grierson. 2007. A computer game for abstract argumentation. In Proceedings of the 7th Workshop on Computational Models of Natural Argument (CMNA07).