<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Neuro-Symbolic Agent with ASP for Robust Exception Learning in Text-Based Games</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kinjal Basu</string-name>
          <aff>IBM Research</aff>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Text-based games (TBGs) present a significant challenge in natural language processing (NLP) by requiring reinforcement learning (RL) agents to combine language comprehension with reasoning. A primary difficulty for these agents is achieving generalization across multiple games, particularly in handling both familiar and novel objects. While deep RL approaches excel in scenarios with known objects, they struggle with unseen ones. Commonsense-augmented RL agents address this but often lack interpretability and transferability. To address these limitations, we propose a neuro-symbolic framework that integrates symbolic reasoning with neural RL models. Our approach utilizes inductive logic programming (ILP) to learn symbolic rules dynamically. We show that this hybrid agent outperforms pure neural agents in handling familiar objects. Additionally, we introduce a novel generalization approach based on information gain and WordNet that helps our agent excel on test sets with unseen objects as well.</p>
      </abstract>
      <kwd-group>
        <kwd>Answer Set Programming</kwd>
        <kwd>Reinforcement Learning</kwd>
        <kwd>Text-Based Games</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Our agent dynamically generalizes its learned rules using WordNet, enabling it to handle unseen objects in the OOD test set and resulting in superior performance compared to state-of-the-art methods on TWC.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>This paper presents a hybrid approach to reinforcement learning (RL) in text-based games (TBGs) by combining neural and symbolic agents. Neural agents excel in exploration, while symbolic agents specialize in learning interpretable policies based on rewards. By leveraging both, the system aims to achieve more effective results. The symbolic agent learns logic-based policies and applies them via an Answer Set Programming (ASP) solver, with the neural agent stepping in when the symbolic agent cannot provide suitable actions. A key goal of the system is to ensure robust performance on both seen and unseen (out-of-distribution) entities. To address this, the paper introduces a novel approach for policy generalization, dynamically generating generalized rules using WordNet hypernym relations.</p>
      <p>[Figure: Overview of the proposed agent. From a stored trajectory &lt;a<sub>1..k..t</sub>, s<sub>1..k..t</sub>, r<sub>1..k..t</sub>&gt; (e.g., a kitchen observation with a red apple and a fridge, where the action "insert red apple into dishwasher" earns reward 0 and "insert red apple into fridge" earns reward +1), the symbolic policy learner induces the policy insert(X, fridge) :- apple(X). Its generalization, insert(X, fridge) :- fruit(X), lets the agent act correctly in a new state, choosing "insert orange into fridge" for reward +1.]</p>
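      <p>To make the generalization step concrete, the following is a minimal Python sketch of lifting a ground rule such as insert(X, fridge) :- apple(X) to a hypernym class via NLTK's WordNet interface. The helper names, the rule-as-string format, and the choice of the first synset and first hypernym are our illustrative assumptions, not the paper's implementation.</p>
      <preformat>
# Minimal sketch: WordNet hypernym lifting for ASP-style policy rules.
# Assumptions (not from the paper): helper names, rule-as-string format,
# and always taking the first noun synset / first hypernym.
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def hypernym_at_level(entity, level):
    """Walk `level` hypernym edges up from the entity's first noun synset."""
    synsets = wn.synsets(entity, pos=wn.NOUN)
    if not synsets:
        return entity  # unknown word: keep the ground term
    syn = synsets[0]
    for _ in range(level):
        hypers = syn.hypernyms()
        if not hypers:
            break  # reached a root concept such as 'entity'
        syn = hypers[0]
    return syn.lemmas()[0].name()

def generalize_rule(action, location, entity, level):
    """Lift a ground policy rule to its hypernym class, as an ASP rule string."""
    return "{}(X, {}) :- {}(X).".format(action, location,
                                        hypernym_at_level(entity, level))

# generalize_rule('insert', 'fridge', 'apple', 1) may yield, e.g.,
# "insert(X, fridge) :- edible_fruit(X)."
      </preformat>
      <p>The hypernym level controls how aggressive the lift is; the IG Hyp. Lvl. 2 and 3 settings in Section 3 select this level using information gain.</p>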
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Results</title>
      <p>Data and Experimental Setup: To test our neuro-symbolic agent, we chose the TW-Cooking domain [3], which requires both exploration and exploitation. As the name suggests, this game suite is about collecting various cooking ingredients and preparing a meal following an in-game recipe. To showcase the generalization capability, we tested our neuro-symbolic and neuro-symbolic + generalization agents on TWC games with OOD data. With the help of the TWC framework [4], we generated a set of games with three different difficulty levels: easy, medium, and hard.</p>
      <p>Results: We tested the cooking games with our neuro-symbolic agent and compared the results with the baseline model (LSTM-A2C). Table 1 illustrates the results, where 'L' denotes the difficulty level. For TWC, Table 2 compares all four of our settings against the baseline model (text-only agent). We compared our agents on two different test sets: (i) IN distribution, which contains the same entities as the training dataset, and (ii) OUT distribution, which contains new entities that were not included in the training set.</p>
      <p>[Table 1: Comparison results for the TW-Cooking domain, reporting # Steps at difficulty levels L-1 through L-4 for the Neural + Symbolic Rules (Neuro-Symbolic) agent against the baseline.]</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Results for TWC within-distribution (IN) and out-of-distribution (OUT) games. We report the number of steps taken by the agent (lower is better) and the normalized scores (higher is better).</p>
        </caption>
        <table>
          <thead>
            <tr><th/><th>Setting</th><th># Steps</th><th>N. Score</th></tr>
          </thead>
          <tbody>
            <tr><td rowspan="5">IN</td><td>Text Only</td><td>15.12 ± 1.95</td><td/></tr>
            <tr><td>Neural + Symbolic Rules</td><td>17.39 ± 3.01</td><td/></tr>
            <tr><td>Neural + Generalized Rules (Exhaustive)</td><td>12.86 ± 3.04</td><td/></tr>
            <tr><td>Neural + Generalized Rules (IG Hyp. Lvl. 2)</td><td>10.59 ± 1.3</td><td/></tr>
            <tr><td>Neural + Generalized Rules (IG Hyp. Lvl. 3)</td><td>9.55 ± 2.34</td><td/></tr>
            <tr><td rowspan="5">OUT</td><td>Text Only</td><td>16.66 ± 1.74</td><td>0.92 ± 0.03</td></tr>
            <tr><td>Neural + Symbolic Rules</td><td>21.19 ± 0.87</td><td>0.84 ± 0.06</td></tr>
            <tr><td>Neural + Generalized Rules (Exhaustive)</td><td>14.65 ± 2.18</td><td>0.91 ± 0.05</td></tr>
            <tr><td>Neural + Generalized Rules (IG Hyp. Lvl. 2)</td><td>15.08 ± 1.2</td><td>0.91 ± 0.02</td></tr>
            <tr><td>Neural + Generalized Rules (IG Hyp. Lvl. 3)</td><td>12.72 ± 1.22</td><td>0.92 ± 0.02</td></tr>
          </tbody>
        </table>
      </table-wrap>
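      <p>As a point of reference for the IG Hyp. Lvl. settings above, the sketch below shows one standard way an information-gain score can rank candidate hypernym generalizations: score a candidate class by how well membership in it separates positively rewarded from unrewarded interactions. The data layout and helper names are our assumptions for illustration; the paper's exact scoring may differ.</p>
      <preformat>
# Illustrative information-gain scoring for a candidate hypernym class.
# `history` is a list of (entity, got_reward) pairs observed during play;
# `covers(entity)` says whether the candidate class contains the entity.
# Both are assumptions for illustration, not the paper's data structures.
import math

def entropy(labels):
    """Shannon entropy of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(history, covers):
    """IG = H(all) - |in|/n * H(in) - |out|/n * H(out)."""
    if not history:
        return 0.0
    labels = [r for _, r in history]
    inside = [r for e, r in history if covers(e)]
    outside = [r for e, r in history if not covers(e)]
    n = len(history)
    return (entropy(labels)
            - len(inside) / n * entropy(inside)
            - len(outside) / n * entropy(outside))

# Pick the candidate class that best separates rewarded interactions:
# best = max(candidates, key=lambda c: information_gain(history, c))
      </preformat>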
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This paper presents a neuro-symbolic approach that integrates a non-monotonic reasoning
(NMR) symbolic agent with a neural agent in a text-based reinforcement learning (RL)
environment. We introduce a novel approach for rule generalization based on information gain. Our
method not only yields encouraging results in the TW-Cooking domain and TWC games but
also produces interpretable and transferable policies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] K. Narasimhan, T. Kulkarni, R. Barzilay, Language understanding for text-based games using deep reinforcement learning, arXiv preprint arXiv:1506.08941 (2015).</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] K. Basu, et al., EXPLORER: Exploration-guided reasoning for textual reinforcement learning, arXiv preprint.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Adhikari, X. Yuan, M.-A. Côté, M. Zelinka, M.-A. Rondeau, R. Laroche, P. Poupart, J. Tang, A. Trischler, W. Hamilton, Learning dynamic belief graphs to generalize on text-based games, Advances in Neural Information Processing Systems 33 (2020) 3045–3057.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] K. Murugesan, M. Atzeni, P. Kapanipathi, et al., Text-based RL agents with commonsense knowledge: New challenges, environments and baselines, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2021.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>