<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Spatial and Temporal Reasoning with LLMs for Natural Language Comprehension and Grounding</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Parisa Kordjamshidi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Michigan State University</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Recent research in Natural Language Processing (NLP) has revealed that deep learning models, particularly large language models (LLMs) trained on huge amounts of data, suffer from a lack of interpretability and generalizability. This issue extends to spatial and temporal reasoning over natural language and visual data. Although LLMs can impress us by fluently generating articles given a prompt, they often fail at basic reasoning tasks, such as understanding that "left" is the opposite of "right." Real-world problem-solving requires computational models that involve multiple interdependent learners, extensive composition, and reasoning based on additional knowledge beyond the available data. Our research endeavors at the Heterogeneous Learning and Reasoning Lab (HLR)1 focus on tackling some of these challenges. In the first part of my talk, I will discuss our recent research on three key areas. First, we have evaluated the spatial reasoning capabilities of large language models over text and introduced new benchmarks specifically designed for this purpose [1, 2]. Second, we have developed architectures capable of capturing spatial and temporal information about entities and their activities, enabling procedural reasoning [3, 4]. Last, for vision and language grounding and navigation, we have developed new modules integrated with large vision and language model backbones. We pre-train these modules with novel synthesized indirect supervision resources to capture the fine-grained semantics required for accurate and explainable instruction following and navigation in a visual environment [5, 6, 7]. In the second part of my talk, I will introduce DomiKnowS, a declarative learning-based programming framework. DomiKnowS is designed to facilitate the integration of learning and reasoning, leveraging both symbolic and sub-symbolic representations to solve complex AI-complete problems. This framework seamlessly integrates domain knowledge, represented symbolically as logical constraints, into deep models using various underlying algorithms, covering both training-time and inference-time techniques. Additionally, I will present GLUECons [8, 9], a new benchmark comprising tasks and models specifically designed for evaluating algorithms that aim to integrate logical constraints into deep models.</p>
      </abstract>
    </article-meta>
  </front>
  <body />
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mirzaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Faghihi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Ning</surname>
          </string-name>
          , P. Kordjamshidi, SpartQA:
          <article-title>A textual question answering benchmark for spatial reasoning</article-title>
          ,
          <source>in: 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics</source>
          ,
          <year>2021</year>
          . arXiv:2104.05832.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Mirzaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kordjamshidi</surname>
          </string-name>
          ,
          <article-title>Transfer learning with synthetic corpora for spatial role labeling and reasoning</article-title>
          ,
          <source>in: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Abu Dhabi, United Arab Emirates,
          <year>2022</year>
          , pp.
          <fpage>6148</fpage>
          -
          <lpage>6165</lpage>
          . URL: https://aclanthology.org/2022.emnlp-main.413.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Rajaby Faghihi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kordjamshidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Teng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Allen</surname>
          </string-name>
          ,
          <article-title>The role of semantic parsing in understanding procedural text</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: EACL 2023</source>
          , Association for Computational Linguistics
          , Dubrovnik, Croatia,
          <year>2023</year>
          , pp.
          <fpage>1837</fpage>
          -
          <lpage>1849</lpage>
          . URL: https://aclanthology.org/2023.findings-eacl.137.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Rajaby Faghihi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kordjamshidi</surname>
          </string-name>
          ,
          <article-title>Time-stamped language model: Teaching language models to understand the flow of events</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Association for Computational Linguistics
          , Online,
          <year>2021</year>
          , pp.
          <fpage>4560</fpage>
          -
          <lpage>4570</lpage>
          . URL: https://aclanthology.org/2021.naacl-main.362. doi:10.18653/v1/2021.naacl-main.362.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , P. Kordjamshidi,
          <article-title>VLN-Trans: Translator for the vision and language navigation agent</article-title>
          ,
          <source>in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , Association for Computational Linguistics
          , Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>13219</fpage>
          -
          <lpage>13233</lpage>
          . URL: https://aclanthology.org/2023.acl-long.737.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , P. Kordjamshidi,
          <article-title>LOViS: Learning orientation and visual signals for vision and language navigation</article-title>
          ,
          <source>in: Proceedings of the 29th International Conference on Computational Linguistics (COLING)</source>
          ,
          <source>International Committee on Computational Linguistics</source>
          , Gyeongju, Republic of Korea,
          <year>2022</year>
          , pp.
          <fpage>5745</fpage>
          -
          <lpage>5754</lpage>
          . URL: https://aclanthology.org/2022.coling-1.505.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , P. Kordjamshidi,
          <article-title>Vision and language navigation agent with explanation ability</article-title>
          ,
          <source>Under Review</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H. R.</given-names>
            <surname>Faghihi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nafar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mirzaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Uszok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Premsri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          , P. Kordjamshidi,
          <article-title>GLUECons: A generic benchmark for learning under constraints</article-title>
          ,
          <source>in: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Rajaby Faghihi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Uszok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nafar</surname>
          </string-name>
          , P. Kordjamshidi,
          <article-title>DomiKnowS: A library for integration of symbolic domain knowledge in deep learning</article-title>
          ,
          <source>in: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          , Association for Computational Linguistics
          ,
          <year>2021</year>
          , pp.
          <fpage>231</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>