<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Neuro-symbolic Tuning for Multi-hop Reasoning over Spatial Language</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tanawan Premsri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Parisa Kordjamshidi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, Michigan State University</institution>
          ,
          <addr-line>MI</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Spatial reasoning is a fundamental aspect of human cognition and underlies many everyday activities. It is also an essential skill for machines that interact with their environment in a human-like way. However, recent research shows that even state-of-the-art language models struggle with spatial reasoning, especially in unobserved situations with complex input compositions. This can be attributed to their failure to reach the level of abstraction required for generalization. To alleviate this issue, we propose training language models with neuro-symbolic techniques that exploit the logical rules of spatial reasoning as an additional source of supervision. Training models to adhere to spatial reasoning rules guides them toward more effective abstractions for generalization and transfer learning. We evaluate our proposed technique on various benchmarks for spatial reasoning over text. Our results with multiple language model backbones show the effectiveness of our neuro-symbolic training in domain transfer and complex multi-hop spatial reasoning.</p>
      </abstract>
      <kwd-group>
        <kwd>Spatial Reasoning</kwd>
        <kwd>Neuro-symbolic training</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Spatial reasoning is essential for human cognition and
also plays a crucial role in many AI applications, including
language grounding [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], computer vision [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], robotics [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4,
5, 6</xref>
        ] and even more specific fields such as the medical domain [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7,
8, 9</xref>
        ].
      </p>
      <p>
        Large language models (LLMs) have been widely applied to
many problems in these areas and, in some cases, show
human-level performance [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. However, recent
studies highlight their shortcomings in spatial reasoning,
particularly in multi-hop reasoning over text [
        <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 13, 14</xref>
        ] and in
many downstream applications [
        <xref ref-type="bibr" rid="ref15 ref3">3, 15</xref>
        ], which calls for more
attention to this topic.
      </p>
      <p>
        In this paper, we use a generic neuro-symbolic framework
to address the difficulty language models (LMs) have in
obtaining the abstractions required to generalize spatial
reasoning to unobserved, complex situations. We propose to
fine-tune LMs with a neuro-symbolic technique that exploits
spatial logical rules to guide the level of abstraction captured
during training. In particular, we train the models to minimize
both the cross-entropy loss and the violation of logical
constraints. We demonstrate the effectiveness of our proposed
framework on both encoder-based and generative language models.
For evaluation, we use three Spatial Question Answering (SQA)
benchmarks, SpartQA-HUMAN [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], ReSQ [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], and StepGame [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        The results show that our proposed method benefits both
LM types, especially when multiple hops of reasoning are
required. The performance improvements across multiple
domains confirm our hypothesis about the effectiveness of
neuro-symbolic training for generalization.
      </p>
    </sec>
    <sec id="sec-1b">
      <title>2. Training with Spatial Logic</title>
      <p>
        The spatial logical rules used in our framework are based
on the spatial logical knowledge base developed in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
Examples of such rules are given in Figure 1. For instance,
the converse rule Above(X, Y) ⇒ Below(Y, X) states that if
object X is above object Y, then object Y is below object X.
The remaining rules use a similar notation.
      </p>
      <p>[Figure 1: Example resolution tree. Initial facts: q1 and q2: "Square is in box". Intermediate facts: q3: "Triangle below box", inferred with R1 (Converse: Above(X, Y) ⇒ Below(Y, X)), and q4: "Box contains square", inferred with R2 (Converse: CoveredBy(X, Y) ⇒ Contain(Y, X)). Target: T: "Triangle below square", inferred with R3 (Topological: Below(X, Y) ∧ Contain(Y, Z) ⇒ Below(X, Z)).]</p>
      <p>Constraints in YN: YN(q1) ⇒ YN(q3); YN(q2) ⇒ YN(q4); YN(q3) ∧ YN(q4) ⇒ YN(T).</p>
      <p>Constraints in FR: FR(q1, Above) ⇒ FR(q3, Below); FR(q2, CoveredBy) ⇒ FR(q4, Contain); FR(q3, Below) ∧ FR(q4, Contain) ⇒ FR(T, Below).</p>
      <p>To apply training with Spatial Logic, we follow three
steps. First, we create example-specific rules based on the
given Spatial Logic, as illustrated by the example in Figure 1.</p>
      <p>We use a resolution tree, which records the logical
implication steps, to infer the answer to the final query
from the input context. Note that our synthetic training
data (e.g., SpaRTUN) provides the logical representation of
the context. We build the tree starting from the initial facts
in the given context and use forward chaining to find
the applicable rules. In this way, we obtain the intermediate
inferred facts. We denote fact i as qᵢ and the sequence of all
derived intermediate facts, including the target question, as
Q-Chain.</p>
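      <p>The forward-chaining construction of the Q-Chain can be illustrated with a short Python sketch. It is not the authors' implementation: facts are assumed to be (relation, subject, object) triples, only the three rules of Figure 1 are encoded, and the names Fact, CONVERSE, apply_rules, and forward_chain are hypothetical. The initial fact q1 is assumed to be Above(box, triangle), the reading under which rule R1 yields q3: "Triangle below box".</p>
      <preformat>
```python
from typing import NamedTuple


class Fact(NamedTuple):
    """A spatial fact Rel(x, y), e.g. Above(box, triangle)."""
    rel: str
    x: str
    y: str


# R1 and R2 (converse rules): Rel(X, Y) => ConverseRel(Y, X).
# Hypothetical encoding that covers only the rules shown in Figure 1.
CONVERSE = {"Above": "Below", "CoveredBy": "Contain"}


def apply_rules(facts):
    """Return (rule_name, derived_fact, premises) for every rule match on `facts`."""
    ordered = sorted(facts)  # sorted so the derivation order is deterministic
    steps = []
    for f in ordered:
        if f.rel in CONVERSE:
            steps.append(("Converse", Fact(CONVERSE[f.rel], f.y, f.x), (f,)))
    # R3 (topological rule): Below(X, Y) and Contain(Y, Z) => Below(X, Z)
    for f in ordered:
        for g in ordered:
            if f.rel == "Below" and g.rel == "Contain" and f.y == g.x:
                steps.append(("Topological", Fact("Below", f.x, g.y), (f, g)))
    return steps


def forward_chain(initial, target):
    """Apply the rules round by round until `target` is derived or nothing new appears.

    Returns a list of (rule_name, fact, premises) tuples in derivation order;
    it includes the target if the target was reached.
    """
    known = set(initial)
    chain = []
    while target not in known:
        new_steps = [s for s in apply_rules(known) if s[1] not in known]
        if not new_steps:
            break  # fixed point reached without deriving the target
        for rule, fact, premises in new_steps:
            if fact not in known:  # a fact can be derived more than once per round
                known.add(fact)
                chain.append((rule, fact, premises))
    return chain


q1 = Fact("Above", "box", "triangle")    # assumed reading of the initial fact q1
q2 = Fact("CoveredBy", "square", "box")  # q2: "Square is in box"
T = Fact("Below", "triangle", "square")  # target question T
for rule, fact, _ in forward_chain([q1, q2], T):
    print(rule, fact)
```
      </preformat>
      <p>For the Figure 1 example, the first round derives q3 and q4 with the converse rules and the second round derives T with the topological rule, mirroring the resolution tree in the figure.</p>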
      <p>Second, we generate consistency constraints between
the facts qᵢ in the Q-Chain. To express the consistency
constraints between questions, we denote the answer to a
YN (yes/no) question qᵢ as YN(qᵢ), which is True if the
answer to qᵢ is True. We denote the answer to an
FR (find-relation) question qᵢ as FR(qᵢ, r), which is True
if the relation r is in the set of answers to qᵢ.
We obtain a set of consistency rules per training example,
as shown in Table 1. For example, in Figure 1, if q1: "box
above the triangle" is True, then q3: "triangle below box"
should also be True. The corresponding constraint is
YN(q1) ⇒ YN(q3) for YN questions and
FR(q1, Above) ⇒ FR(q3, Below) for FR questions.</p>
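      <p>The second step can be sketched in a few lines of Python, assuming the Q-Chain is stored as rule applications that list the question ids of their premises and their conclusion. The Step type, the relation map, and the helper names are hypothetical; the constraints are rendered as strings in the notation of Figure 1, and violates_yn shows what it means for a set of hard YN predictions to break one of them.</p>
      <preformat>
```python
from typing import NamedTuple


class Step(NamedTuple):
    """One rule application in a Q-Chain: premises => conclusion (question ids)."""
    premises: tuple
    conclusion: str


def yn_constraint(step):
    """Render YN(p1) ∧ ... ∧ YN(pk) ⇒ YN(c)."""
    lhs = " ∧ ".join(f"YN({q})" for q in step.premises)
    return f"{lhs} ⇒ YN({step.conclusion})"


def fr_constraint(step, relation):
    """Render FR(p1, r1) ∧ ... ⇒ FR(c, rc); `relation` maps a question id to its relation."""
    lhs = " ∧ ".join(f"FR({q}, {relation[q]})" for q in step.premises)
    return f"{lhs} ⇒ FR({step.conclusion}, {relation[step.conclusion]})"


def violates_yn(step, answers):
    """True if every premise is answered True but the conclusion is answered False."""
    return all(answers[q] for q in step.premises) and not answers[step.conclusion]


# Q-Chain of the Figure 1 example (R1, R2, then R3).
chain = [Step(("q1",), "q3"), Step(("q2",), "q4"), Step(("q3", "q4"), "T")]
relation = {"q1": "Above", "q2": "CoveredBy", "q3": "Below", "q4": "Contain", "T": "Below"}
for step in chain:
    print(yn_constraint(step), "|", fr_constraint(step, relation))
```
      </preformat>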
      <p>
        Lastly, after obtaining the consistency constraints, we
minimize the model's violation of these constraints by adding
a corresponding term to the loss function. To do this, however,
we need a differentiable form of logic as a surrogate for the
violation of the original logical constraints. We follow previous
research for this goal [
        <xref ref-type="bibr" rid="ref19 ref20 ref21">19, 20, 21</xref>
        ] and use the DomiKnowS framework for
the actual implementation [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. To implement this
problem in the DomiKnowS declarative language, we
declare a graph of concepts and relationships and add the
logical rules/constraints between them. DomiKnowS offers
a Python library and a specific syntax to express the graph
and logic. An example of concepts, a symmetric relation,
and a constraint using the symmetric relation is as follows.
We refer the reader to the DomiKnowS documentation<sup>1</sup> for
the syntax and semantics of the code. Our main
hypothesis is that providing supervision from high-level
logical knowledge enables the model to capture higher levels
of abstraction, improving generalization to other domains.
The advantage of our proposed approach is that it does not
require full access to logical knowledge. Any partially
available knowledge can be exploited during training without
further requirements at inference time. This is crucial since
inference-time symbolic reasoning can be time-consuming
for real-time applications.
      </p>
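      <p>The differentiable surrogate itself is delegated to prior work [19, 20, 21] and to DomiKnowS, so its exact form is not given here. As one common choice, shown only as an assumed illustration and not as the DomiKnowS internals, the conjunction of premises can be relaxed with the product t-norm and the implication with the Lukasiewicz form, whose violation is max(0, P(premises) - P(conclusion)). The function names and the weight lam are hypothetical; plain floats keep the sketch self-contained, whereas in training the same expressions would be computed on model output tensors so that gradients flow through the penalty.</p>
      <preformat>
```python
import math


def implication_violation(premise_probs, conclusion_prob):
    """Soft violation of (p1 ∧ ... ∧ pk) ⇒ c.

    Premises are combined with the product t-norm; the implication uses the
    Lukasiewicz relaxation, whose violation max(0, a - c) is zero whenever
    the conclusion is at least as probable as the conjunction of premises.
    """
    a = math.prod(premise_probs)
    return max(0.0, a - conclusion_prob)


def binary_cross_entropy(p_true, label, eps=1e-12):
    """Cross-entropy of a predicted P(True) against a boolean gold label."""
    return -math.log(p_true + eps) if label else -math.log(1.0 - p_true + eps)


def total_loss(probs, labels, constraints, lam=0.5):
    """Cross-entropy on labelled questions plus lam times the summed violation.

    probs: question id -> predicted probability that the YN answer is True
    labels: question id -> gold boolean answer (may cover only some questions)
    constraints: list of (premise ids, conclusion id) pairs, e.g. from a Q-Chain
    """
    ce = sum(binary_cross_entropy(probs[q], y) for q, y in labels.items())
    violation = sum(
        implication_violation([probs[q] for q in prem], probs[c])
        for prem, c in constraints
    )
    return ce + lam * violation


probs = {"q1": 0.9, "q2": 0.8, "q3": 0.3, "q4": 0.7, "T": 0.4}
constraints = [(("q1",), "q3"), (("q2",), "q4"), (("q3", "q4"), "T")]
print(total_loss(probs, {"T": True}, constraints))
```
      </preformat>
      <p>In this toy call only the target carries a label, yet the constraint term still penalizes the inconsistent intermediate predictions, which is the sense in which partially available logical knowledge adds supervision.</p>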
    </sec>
    <sec id="sec-2">
      <title>3. Experimental Results</title>
      <p>We conduct two sets of experiments, on a realistic dataset (ReSQ)
and on synthetic datasets (SpartQA, SpaRTUN, and StepGame).
With these experiments, we empirically evaluate the
impact of our proposed logic-based fine-tuning on small-scale
language models and compare them with very large language
models that rely only on prompt engineering. We evaluate
the performance of our proposed method on two types of
language models, encoder-based and generative. We
select BERT as the baseline encoder-based model and Flan-T5 as
the baseline generative model.</p>
      <p>We also report the results of basic fine-tuning on the
SpaRTUN dataset, denoted BERT-T and Flan-T5-T.</p>
      <p><sup>1</sup> https://hlr.github.io/domiknows</p>
      <p>Realistic Domain. ReSQ serves as the realistic SQA
domain. As observed in Table 2, using the Q-Chain is effective
for both models (BERT and Flan-T5), with a notable
improvement on Flan-T5. In particular, Flan-T5-T+Q-Chain (line 6)
shows a 2% improvement over Flan-T5-T (line 5).</p>
      <p>For a deeper understanding of these results, we analyzed
performance on the different splits of ReSQ. There are three
splits based on the manually annotated depth of reasoning
required to answer questions in ReSQ. The first two splits
include questions that require one or two hops of reasoning,
denoted k = 1 and k = 2. The last split, unclassified, covers
questions whose depth of reasoning is difficult to determine;
these questions rely more on commonsense knowledge. Our
observations in Table 2 reveal that our method consistently
improves performance on k = 2 but adversely affects BERT's
performance on the k = 1 and unclassified categories. From
this result, we conclude that logic-based tuning yields a notable
improvement when more hops of reasoning are required. However,
our proposed tuning method is less effective on the unclassified split,
which requires commonsense knowledge.</p>
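      <p>The split-level analysis amounts to grouping test questions by their annotated reasoning depth and computing accuracy within each group. The sketch below assumes each prediction is available as a (depth, predicted answer, gold answer) tuple with depth 1, 2, or "unclassified"; this record format and the helper names are hypothetical, not the ReSQ release format.</p>
      <preformat>
```python
from collections import defaultdict


def accuracy_by_depth(records):
    """Accuracy per reasoning-depth split from (depth, prediction, gold) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for depth, pred, gold in records:
        total[depth] += 1
        correct[depth] += int(pred == gold)
    return {depth: correct[depth] / total[depth] for depth in total}


def split_gains(baseline, tuned):
    """Accuracy difference (tuned minus baseline) for each split present in both."""
    return {d: tuned[d] - baseline[d] for d in baseline if d in tuned}
```
      </preformat>
      <p>Applying split_gains to a baseline and its Q-Chain variant gives per-split differences of the kind compared in Table 2.</p>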
      <p>On the other hand, LLMs outperform all fine-tuned models
on ReSQ. The LLMs consistently achieve around 2% to 13% higher
performance than Flan-T5-T+Q-Chain (lines 7 to 13). The gap
is much larger on the unclassified subset of the dataset, where
it is visible even with the zero-shot method. This suggests that
the LLMs' advantage stems mainly from their commonsense knowledge
rather than from complex reasoning capability, in contrast to our
proposed method, which targets complex multi-hop reasoning.</p>
      <p>Nevertheless, we observe that logic-based fine-tuning
yields a larger improvement on Flan-T5 than on BERT on the
unclassified subset. This indicates that the Q-Chain approach
can guide complex reasoning when applied to a model with more
commonsense knowledge.</p>
      <p>Synthetic Domain with More Complex Logical
Reasoning. SpartQA-Human and StepGame are the synthetic domains
used in our experiments. We consistently observe improvements
with our proposed Q-Chain in this domain, which typically
requires many more hops of reasoning. As observed in Table 2,
Q-Chain consistently improves both Flan-T5 and BERT compared
to fine-tuning without it. Moreover, the gap between the smaller
pretrained LMs and LLMs is much smaller on this dataset than in
the realistic domain (ReSQ). This is expected since, as discussed
above, LLMs are better at commonsense than at complex reasoning.</p>
      <p>
        This result is further supported by evaluating the
proposed method on StepGame. As can be observed in Table 3,
the fine-tuned models consistently outperform the LLMs at
every number of reasoning steps. GPT-3's difficulty with
reasoning over StepGame is also investigated in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and the results reported in that
paper are included in Table 3. Our proposed method consistently
improves performance by 1% to 4% for larger numbers of reasoning
hops (k = 6 to k = 10), similar to our observation on ReSQ. These
results confirm our primary hypothesis that our proposed
method equips the models with a higher level of logical
abstraction for performing more reasoning steps.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kordjamshidi</surname>
          </string-name>
          ,
          <article-title>Towards navigation by reasoning over spatial configurations</article-title>
          ,
          <year>2021</year>
          . arXiv:
          <volume>2105</volume>
          .
          <fpage>06839</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , P. Kordjamshidi,
          <article-title>LOViS: Learning orientation and visual signals for vision and language navigation</article-title>
          ,
          <year>2022</year>
          . arXiv:
          <volume>2209</volume>
          .
          <fpage>12723</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          , G. Emerson,
          <string-name>
            <given-names>N.</given-names>
            <surname>Collier</surname>
          </string-name>
          ,
          <article-title>Visual spatial reasoning</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>635</fpage>
          -
          <lpage>651</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Sisbot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. F.</given-names>
            <surname>Marin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Alami</surname>
          </string-name>
          ,
          <article-title>Spatial reasoning for human robot interaction</article-title>
          ,
          <source>in: 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems</source>
          , IEEE,
          <year>2007</year>
          , pp.
          <fpage>2281</fpage>
          -
          <lpage>2287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Yadollahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Monteiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Paiva</surname>
          </string-name>
          ,
          <article-title>Learning spatial reasoning in virtual vs. physical games with robots</article-title>
          ,
          <source>in: Proceedings of the 11th International Conference on Human-Agent Interaction, HAI '23</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2023</year>
          , pp.
          <fpage>162</fpage>
          -
          <lpage>170</lpage>
          . URL: https://doi.org/10.1145/3623809.3623830. doi:10.1145/3623809.3623830.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , P. Kordjamshidi,
          <article-title>LOViS: Learning orientation and visual signals for vision and language navigation</article-title>
          ,
          <source>in: Proceedings of the 29th International Conference on Computational Linguistics</source>
          ,
          <publisher-name>International Committee on Computational Linguistics</publisher-name>
          , Gyeongju, Republic of Korea,
          <year>2022</year>
          , pp.
          <fpage>5745</fpage>
          -
          <lpage>5754</lpage>
          . URL: https://aclanthology.org/2022.coling-1.505.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Atif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hudelot</surname>
          </string-name>
          , G. Fouquier,
          <string-name>
            <given-names>I.</given-names>
            <surname>Bloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Angelini</surname>
          </string-name>
          ,
          <article-title>From generic knowledge to specific reasoning for medical image interpretation using graph based representations</article-title>
          ,
          <source>in: IJCAI</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>224</fpage>
          -
          <lpage>229</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Shooshan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <article-title>Understanding spatial language in radiology: Representation framework, annotation, and spatial relation extraction from chest x-ray reports using deep learning</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>108</volume>
          (
          <year>2020</year>
          )
          <elocation-id>103473</elocation-id>
          . URL: https://www.sciencedirect.com/science/article/pii/S1532046420301027. doi:10.1016/j.jbi.2020.103473.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhong</surname>
          </string-name>
          , W. Ma,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Heng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <article-title>3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable medical image segmentation</article-title>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2306</volume>
          .
          <fpage>13465</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>OpenAI</surname>
          </string-name>
          ,
          <article-title>GPT-4 technical report</article-title>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2303</volume>
          .
          <fpage>08774</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>T. B.</given-names> <surname>Brown</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Mann</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Ryder</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Subbiah</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kaplan</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Dhariwal</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Neelakantan</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Shyam</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Sastry</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Askell</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Agarwal</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Herbert-Voss</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Krueger</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Henighan</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Child</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Ramesh</surname></string-name>,
          <string-name><given-names>D. M.</given-names> <surname>Ziegler</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Winter</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Hesse</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>E.</given-names> <surname>Sigler</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Litwin</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Gray</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Chess</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Clark</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Berner</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>McCandlish</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Radford</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Sutskever</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Amodei</surname></string-name>,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <year>2020</year>
          . arXiv:2005.14165.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cahyawijaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lovenia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Do</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <article-title>A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2302.04023. arXiv:
          <volume>2302</volume>
          .
          <fpage>04023</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ishay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Coupling large language models with logic programming for robust and general reasoning from text</article-title>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2307</volume>
          .
          <fpage>07696</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mirzaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kordjamshidi</surname>
          </string-name>
          ,
          <article-title>Disentangling extraction and reasoning in multi-hop spatial reasoning</article-title>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2310</volume>
          .
          <fpage>16731</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kirmani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ichter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Driess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Florence</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sadigh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Guibas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <article-title>SpatialVLM: Endowing vision-language models with spatial reasoning capabilities</article-title>
          ,
          <year>2024</year>
          . arXiv:
          <volume>2401</volume>
          .
          <fpage>12168</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mirzaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Faghihi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kordjamshidi</surname>
          </string-name>
          ,
          <article-title>SpartQA: A textual question answering benchmark for spatial reasoning</article-title>
          ,
          <year>2021</year>
          . arXiv:
          <volume>2104</volume>
          .
          <fpage>05832</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mirzaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kordjamshidi</surname>
          </string-name>
          ,
          <article-title>Transfer learning with synthetic corpora for spatial role labeling and reasoning</article-title>
          ,
          <year>2022</year>
          . arXiv:
          <volume>2210</volume>
          .
          <fpage>16952</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lipani</surname>
          </string-name>
          ,
          <article-title>StepGame: A new benchmark for robust multi-hop spatial reasoning in texts</article-title>
          ,
          <year>2022</year>
          . arXiv:
          <volume>2204</volume>
          .
          <fpage>08292</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mehta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Srikumar</surname>
          </string-name>
          ,
          <article-title>A logic-driven framework for consistency of neural models</article-title>
          ,
          <year>2019</year>
          . arXiv:
          <volume>1909</volume>
          .
          <fpage>00126</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Asai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hajishirzi</surname>
          </string-name>
          ,
          <article-title>Logic-guided data augmentation and regularization for consistent question answering</article-title>
          , in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.),
          <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics
          , Online,
          <year>2020</year>
          , pp.
          <fpage>5642</fpage>
          -
          <lpage>5650</lpage>
          . URL: https://aclanthology.org/2020.acl-main.499. doi:10.18653/v1/2020.acl-main.499.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , D. Roth,
          <article-title>Joint constrained learning for event-event relation extraction</article-title>
          , in: B. Webber, T. Cohn, Y. He, Y. Liu (Eds.),
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          Association for Computational Linguistics
          , Online,
          <year>2020</year>
          , pp.
          <fpage>696</fpage>
          -
          <lpage>706</lpage>
          . URL: https://aclanthology.org/2020.emnlp-main.51. doi:10.18653/v1/2020.emnlp-main.51.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Faghihi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Uszok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nafar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Raisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kordjamshidi</surname>
          </string-name>
          ,
          <article-title>DomiKnowS: A library for integration of symbolic domain knowledge in deep learning</article-title>
          ,
          <year>2021</year>
          . arXiv:
          <volume>2108</volume>
          .
          <fpage>12370</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>