<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Neuro-Symbolic Agent with ASP for Robust Exception Learning in Text-Based Games</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kinjal Basu</string-name>
          <aff>IBM Research</aff>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Text-based games (TBGs) present a significant challenge in natural language processing (NLP) by requiring reinforcement learning (RL) agents to combine language comprehension with reasoning. A primary difficulty for these agents is achieving generalization across multiple games, particularly in handling both familiar and novel objects. While deep RL approaches excel in scenarios with known objects, they struggle with unseen ones. Commonsense-augmented RL agents address this but often lack interpretability and transferability. To address these limitations, we propose a neuro-symbolic framework that integrates symbolic reasoning with neural RL models. Our approach utilizes inductive logic programming (ILP) to learn symbolic rules dynamically. We show that this hybrid agent outperforms pure neural agents in handling familiar objects. Additionally, we introduce a novel generalization approach based on information gain and WordNet that helps our agent excel on test sets with unseen objects as well.</p>
      </abstract>
      <kwd-group>
        <kwd>Answer Set Programming</kwd>
        <kwd>Reinforcement Learning</kwd>
        <kwd>Text-Based Games</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Our agent dynamically generalizes its learned rules using WordNet, enabling it to handle unseen objects in the OOD test set and resulting in superior performance compared to state-of-the-art methods on TWC.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>This paper presents a hybrid approach to reinforcement learning (RL) in text-based games (TBGs) by combining neural and symbolic agents. Neural agents excel in exploration, while symbolic agents specialize in learning interpretable policies based on rewards. By leveraging both, the system aims to achieve more effective results. The symbolic agent learns logic-based policies and applies them via an Answer Set Programming (ASP) solver, with the neural agent stepping in when the symbolic agent cannot provide suitable actions. A key goal of the system is to ensure robust performance on both seen and unseen (out-of-distribution) entities. To address this, the paper introduces a novel approach for policy generalization, dynamically generating generalized rules using WordNet hypernym relations.</p>
      <p>[Figure: Overview of the proposed agent. From a stored trajectory &lt;a<sub>1..k..t</sub>, s<sub>1..k..t</sub>, r<sub>1..k..t</sub>&gt; (e.g., a kitchen observation with a red apple and a fridge, where the action "insert red apple into dishwasher" earns reward 0 and "insert red apple into fridge" earns reward +1), the symbolic policy learner induces the policy insert(X, fridge) :- apple(X). Its generalization, insert(X, fridge) :- fruit(X), lets the agent act correctly in a new state, choosing "insert orange into fridge" for reward +1.]</p>
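      <p>To make the generalization step concrete, the following is a minimal Python sketch of lifting a ground rule such as insert(X, fridge) :- apple(X) to a hypernym class via NLTK's WordNet interface. The helper names, the rule-as-string format, and the choice of the first synset and first hypernym are our illustrative assumptions, not the paper's implementation.</p>
      <preformat>
# Minimal sketch: WordNet hypernym lifting for ASP-style policy rules.
# Assumptions (not from the paper): helper names, rule-as-string format,
# and always taking the first noun synset / first hypernym.
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def hypernym_at_level(entity, level):
    """Walk `level` hypernym edges up from the entity's first noun synset."""
    synsets = wn.synsets(entity, pos=wn.NOUN)
    if not synsets:
        return entity  # unknown word: keep the ground term
    syn = synsets[0]
    for _ in range(level):
        hypers = syn.hypernyms()
        if not hypers:
            break  # reached a root concept such as 'entity'
        syn = hypers[0]
    return syn.lemmas()[0].name()

def generalize_rule(action, location, entity, level):
    """Lift a ground policy rule to its hypernym class, as an ASP rule string."""
    return "{}(X, {}) :- {}(X).".format(action, location,
                                        hypernym_at_level(entity, level))

# generalize_rule('insert', 'fridge', 'apple', 1) may yield, e.g.,
# "insert(X, fridge) :- edible_fruit(X)."
      </preformat>
      <p>The hypernym level controls how aggressive the lift is; the IG Hyp. Lvl. 2 and 3 settings in Section 3 select this level using information gain.</p>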
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Results</title>
      <p>Data and Experimental Setup: To test our neuro-symbolic agent, we chose the TW-Cooking domain [3], which requires both exploration and exploitation. As the name suggests, this game suite is about collecting various cooking ingredients and preparing a meal following an in-game recipe. To showcase the generalization capability, we tested our neuro-symbolic and neuro-symbolic + generalization agents on TWC games with OOD data. With the help of the TWC framework [4], we generated a set of games with three different difficulty levels: easy, medium, and hard.</p>
      <p>Results: We tested the cooking games with our neuro-symbolic agent and compared the results with the baseline model (LSTM-A2C). Table 1 illustrates the results, where 'L' denotes the difficulty level. For TWC, Table 2 compares all four of our settings against the baseline model (text-only agent). We compared our agents on two different test sets: (i) IN distribution, which contains the same entities as the training dataset, and (ii) OUT distribution, which contains new entities that were not included in the training set.</p>
      <p>[Table 1: Comparison results for the TW-Cooking domain, reporting # Steps at difficulty levels L-1 through L-4 for the Neural + Symbolic Rules (Neuro-Symbolic) agent against the baseline.]</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Results for TWC within-distribution (IN) and out-of-distribution (OUT) games. We report the number of steps taken by the agent (lower is better) and the normalized scores (higher is better).</p>
        </caption>
        <table>
          <thead>
            <tr><th/><th>Setting</th><th># Steps</th><th>N. Score</th></tr>
          </thead>
          <tbody>
            <tr><td rowspan="5">IN</td><td>Text Only</td><td>15.12 ± 1.95</td><td/></tr>
            <tr><td>Neural + Symbolic Rules</td><td>17.39 ± 3.01</td><td/></tr>
            <tr><td>Neural + Generalized Rules (Exhaustive)</td><td>12.86 ± 3.04</td><td/></tr>
            <tr><td>Neural + Generalized Rules (IG Hyp. Lvl. 2)</td><td>10.59 ± 1.3</td><td/></tr>
            <tr><td>Neural + Generalized Rules (IG Hyp. Lvl. 3)</td><td>9.55 ± 2.34</td><td/></tr>
            <tr><td rowspan="5">OUT</td><td>Text Only</td><td>16.66 ± 1.74</td><td>0.92 ± 0.03</td></tr>
            <tr><td>Neural + Symbolic Rules</td><td>21.19 ± 0.87</td><td>0.84 ± 0.06</td></tr>
            <tr><td>Neural + Generalized Rules (Exhaustive)</td><td>14.65 ± 2.18</td><td>0.91 ± 0.05</td></tr>
            <tr><td>Neural + Generalized Rules (IG Hyp. Lvl. 2)</td><td>15.08 ± 1.2</td><td>0.91 ± 0.02</td></tr>
            <tr><td>Neural + Generalized Rules (IG Hyp. Lvl. 3)</td><td>12.72 ± 1.22</td><td>0.92 ± 0.02</td></tr>
          </tbody>
        </table>
      </table-wrap>
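      <p>As a point of reference for the IG Hyp. Lvl. settings above, the sketch below shows one standard way an information-gain score can rank candidate hypernym generalizations: score a candidate class by how well membership in it separates positively rewarded from unrewarded interactions. The data layout and helper names are our assumptions for illustration; the paper's exact scoring may differ.</p>
      <preformat>
# Illustrative information-gain scoring for a candidate hypernym class.
# `history` is a list of (entity, got_reward) pairs observed during play;
# `covers(entity)` says whether the candidate class contains the entity.
# Both are assumptions for illustration, not the paper's data structures.
import math

def entropy(labels):
    """Shannon entropy of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(history, covers):
    """IG = H(all) - |in|/n * H(in) - |out|/n * H(out)."""
    if not history:
        return 0.0
    labels = [r for _, r in history]
    inside = [r for e, r in history if covers(e)]
    outside = [r for e, r in history if not covers(e)]
    n = len(history)
    return (entropy(labels)
            - len(inside) / n * entropy(inside)
            - len(outside) / n * entropy(outside))

# Pick the candidate class that best separates rewarded interactions:
# best = max(candidates, key=lambda c: information_gain(history, c))
      </preformat>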
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This paper presents a neuro-symbolic approach that integrates a non-monotonic reasoning
(NMR) symbolic agent with a neural agent in a text-based reinforcement learning (RL)
environment. We introduce a novel approach for rule generalization based on information gain. Our
method not only yields encouraging results in the TW-Cooking domain and TWC games but
also produces interpretable and transferable policies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] K. Narasimhan, T. Kulkarni, R. Barzilay, Language understanding for text-based games using deep reinforcement learning, arXiv preprint arXiv:1506.08941 (2015).</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] K. Basu, et al., EXPLORER: Exploration-guided reasoning for textual reinforcement learning, arXiv preprint.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Adhikari, X. Yuan, M.-A. Côté, M. Zelinka, M.-A. Rondeau, R. Laroche, P. Poupart, J. Tang, A. Trischler, W. Hamilton, Learning dynamic belief graphs to generalize on text-based games, Advances in Neural Information Processing Systems 33 (2020) 3045–3057.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] K. Murugesan, M. Atzeni, P. Kapanipathi, et al., Text-based RL agents with commonsense knowledge: New challenges, environments and baselines, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2021.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>