<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Some Connections between Qualitative Spatial Reasoning and Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anthony G Cohn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science, University of Leeds</institution>
          ,
          <addr-line>LS2 9JT</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The Alan Turing Institute</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>As has been remarked on before, Space is Special[1, 2]. Tobler's First Law of Geography [3] captures the notion that all things are related, but close things are more related. Tversky [2] eloquently argues for the special place for spatial representations, and in particular that (living) things must move and act in space to survive, that all thought begins as spatial thought and that spatial thinking comes from and is shaped by perceiving the world and acting in it, be it through learning or through evolution. Artificial Intelligence has thus naturally sought to endow artificial agents with spatial representations and ways of reasoning about space. Amongst these, I will focus on qualitative spatial representations and reasoning mechanisms (henceforth QSR, where the 'R' may stand for representation or reasoning or both, depending on the context). There have been many calculi developed for representing and reasoning about space in qualitative ways, covering aspects such as (mereo)topology, orientation/direction, size, distance and shape [4, 5]. Whilst QSR has primarily been concerned with deductive reasoning, there have been and there are increasingly many connections between QSR and machine learning. In this talk I will discuss a number of such connections, ranging from the use of qualitative spatial representations in an inductive logic programming system to learn event classes occurring in video data, to the question of whether large language models (LLMs) are able to make inferences reliably about qualitative spatial relations, and whether they can be supported by symbolic reasoners. Learning rules for video interpretation: Dubba et al. [6] show how Inductive Logic Programming can be used to learn a set of rules which can be used to recognise event class instances where videos have been abstracted to a set of qualitative spatio-temporal relations. 
The method is demonstrated in two domains including one which involves recognising the events which are necessary to service an aircraft whilst it is turning around at an airport. Whilst the resulting rules are relatively simple and it might be wondered whether a hand-written set of rules could not be easily written and just as efective, it turns out that in a comparison with such a set of manually written rules, the learned model is more efective, because the latter does not take account of noise in the video data, where as the learned model was already trained on noisy data and was thus more robust in the face of noisy data at classification time. The paper also shows how the inductive process can be interleaved with abduction, using an embedded spatial theory to improve the learned model in the face of noisy training data. Learning groundings for spatial representations: A key question for QSR is how the relations in the calculus correspond to their use in language and their correspondence to the real world. Whilst relations are usually given plausible names in a relational calculus, there is no guarantee that these correspond to naturally occurring instances. Indeed, McDermott [7] notes the dangers of “wishful naming”. Alomari et al. [8] present a system, named OLAV, which addresses the problem of bootstrapping knowledge in language and vision for autonomous robots. OLAV is able, for the first time, to (1) learn to form discrete concepts from sensory data; (2) ground language (n-grams) to these concepts (which include not only spatial relations, but also object attributes and actions); (3) induce a grammar for the language being used to describe the perceptual world; and moreover to do all this incrementally, without storing all previous data. The resulting grammar can then be used to parse novel commands for downstream action in a robotic system. 
Analysing polysemy in spatial prepositions: One challenge in assigning meanings to spatial prepositions is that they can frequently be polysemous, i.e. they can have multiple related senses (the polysemes). As the senses of polysemous terms are so closely intertwined, the theoretical and computational treatment of polysemy presents a dificult challenge for semantic models. To given an example: compare “book on a table”, “balloon on the ceiling” and “picture on the wall”. Richard-Bollans et al. [9] discuss this problem and shows how a model can be built in which these senses can be distinguished using data from human subjects. Can Large Language Models perform qualitative spatial reasoning reliably? Many claims (e.g. [10, 11, 12]) have been made since the emergence of Large Language Models (LLMs) as to their ability to reason. Spatial</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>
        reasoning is of particular interest not only because it underlies a human’s ability to operate in the physical
world, but also because LLMs are not embodied; so the question arises: have they nonetheless acquired an ability
to reason about situations which might occur in the real physical world? I will present the results of a number of
experiments in which this ability is tested: for cardinal directions
[
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ], for relational composition and conceptual neighbourhood construction [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and other notions in spatial
reasoning [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. One challenge for evaluating LLMs in the domain of spatial reasoning (and commonsense more
generally [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]) is the paucity of good benchmarks – I will discuss this issue and briefly present a new benchmark
which is based on a synthetic generator, able to provide arbitrarily many examples of automatically labelled
indoor virtual scenes [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
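      <p>
        The relational composition task can be made concrete with a small sketch. Given two RCC-8 relations r1(a, b) and r2(b, c), a composition table lists the base relations possible between a and c; the evaluation in effect asks an LLM to reproduce such entries. The following fragment is a hypothetical illustration (names and code are not from the cited benchmarks; only a few hand-checked entries of the 8x8 table are included):
      </p>

```python
# A minimal sketch of an RCC-8 composition query: given r1(a, b) and
# r2(b, c), which of the eight base relations are possible for (a, c)?
# Only a small, hand-checked fragment of the full composition table is
# included; the relation names (DC, EC, PO, TPP, NTPP, ...) are standard.

ALL = {"DC", "EC", "PO", "TPP", "NTPP", "TPPi", "NTPPi", "EQ"}

# Fragment of the RCC-8 composition table: (r1, r2) -> possible r3.
COMPOSITION = {
    ("NTPP", "NTPP"): {"NTPP"},         # strict part-of is transitive
    ("TPP",  "NTPP"): {"NTPP"},
    ("NTPP", "TPP"):  {"NTPP"},
    ("TPP",  "TPP"):  {"TPP", "NTPP"},
    ("DC",   "DC"):   ALL,              # composition is uninformative here
    ("EC",   "EC"):   ALL - {"NTPP", "NTPPi"},
}

def compose(r1: str, r2: str) -> set[str]:
    """Possible relations between a and c, given r1(a, b) and r2(b, c)."""
    return COMPOSITION[(r1, r2)]

# If a is a non-tangential proper part of b, and b of c, then a must be
# a non-tangential proper part of c:
print(compose("NTPP", "NTPP"))  # -> {'NTPP'}
```

      <p>
        A symbolic reasoner answers such queries exactly by table lookup; the experiments probe whether an LLM can do the same reliably from a textual description alone.
      </p>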
      <p>
        Using LLMs as a natural language interface to symbolic spatial reasoners: Given the deficiencies in the
robustness of LLMs in performing qualitative spatial reasoning, it is worth asking whether an LLM
and a more traditional symbolic reasoner in combination could be more effective than either on its own. An
LLM has strengths in analysing language, but not so much in more complex reasoning, whilst a symbolic reasoner on its own
has no ability to comprehend natural language. The combination of the two can be particularly effective, for
example as demonstrated on the StepGame benchmark [
        <xref ref-type="bibr" rid="ref14 ref19">19, 14</xref>
        ].
      </p>
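      <p>
        The division of labour can be sketched as follows. This is a hypothetical illustration, not the actual pipeline of the cited work: the LLM would translate each natural-language sentence into a symbolic fact (head, relation, tail), and the symbolic back-end then performs the multi-hop reasoning exactly, here by chaining unit offsets on a grid, StepGame-style:
      </p>

```python
# Hypothetical back-end for LLM-extracted spatial facts (not the code of
# the cited papers).  Each fact (head, relation, tail) means
# "head is <relation> of tail"; the net displacement between two objects
# is computed exactly by chaining unit offsets on a grid.

OFFSET = {"above": (0, 1), "below": (0, -1), "left": (-1, 0), "right": (1, 0)}

def locate(facts, source, target):
    """Net (dx, dy) from target to source, following a chain of facts."""
    pos = {target: (0, 0)}  # anchor the target at the origin
    changed = True
    while changed:          # propagate positions until a fixed point
        changed = False
        for head, rel, tail in facts:
            if tail in pos and head not in pos:
                dx, dy = OFFSET[rel]
                x, y = pos[tail]
                pos[head] = (x + dx, y + dy)
                changed = True
    return pos[source]

# Facts an LLM might extract from "A is above B. B is right of C.":
facts = [("A", "above", "B"), ("B", "right", "C")]
print(locate(facts, "A", "C"))  # -> (1, 1): A is up-and-right of C
```

      <p>
        The point of the design is that the multi-hop composition, where LLMs alone degrade as the number of hops grows, is delegated to a component that cannot make arithmetic mistakes, while the LLM handles only the language-to-facts translation.
      </p>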
      <p>
        Acknowledgements This work was supported by: the Fundamental Research priority area of The Alan Turing
Institute; Microsoft Research - Accelerating Foundation Models Research program; the Economic and Social
Research Council (ESRC) under grant ES/W003473/1. I also wish to give heartfelt thanks to all my co-authors in
the papers [
        <xref ref-type="bibr" rid="ref14 ref16 ref18 ref6 ref8 ref9">6, 8, 9, 14, 16, 18</xref>
        ] I will discuss in the talk, and with whom it has been such a pleasure to interact.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Goodchild</surname>
          </string-name>
          ,
          <article-title>Challenges in geographical information science</article-title>
          ,
          <source>Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences</source>
          <volume>467</volume>
          (
          <year>2011</year>
          )
          <fpage>2431</fpage>
          -
          <lpage>2443</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Tversky</surname>
          </string-name>
          , Mind in Motion: How Action Shapes Thought, Basic Books,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>Tobler's first law and spatial analysis</article-title>
          ,
          <source>Annals of the Association of American Geographers</source>
          <volume>94</volume>
          (
          <year>2004</year>
          )
          <fpage>284</fpage>
          -
          <lpage>289</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Renz</surname>
          </string-name>
          ,
          <article-title>Qualitative spatial representation and reasoning</article-title>
          , in:
          <string-name>
            <given-names>F.</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lifschitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Porter</surname>
          </string-name>
          (Eds.),
          <source>Handbook of Knowledge Representation</source>
          , Elsevier
          ,
          <year>2008</year>
          , pp.
          <fpage>551</fpage>
          -
          <lpage>596</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>A survey of qualitative spatial representations</article-title>
          ,
          <source>The Knowledge Engineering Review</source>
          <volume>30</volume>
          (
          <year>2015</year>
          )
          <fpage>106</fpage>
          -
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Dubba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Hogg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bhatt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dylla</surname>
          </string-name>
          ,
          <article-title>Learning relational event models from video</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>53</volume>
          (
          <year>2015</year>
          )
          <fpage>41</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>McDermott</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence meets natural stupidity</article-title>
          ,
          <source>ACM SIGART Bulletin</source>
          (
          <year>1976</year>
          )
          <fpage>4</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Alomari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Hogg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <article-title>Online perceptual learning and natural language acquisition for autonomous robots</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>303</volume>
          (
          <year>2022</year>
          )
          <fpage>103637</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Richard-Bollans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. G.</given-names>
            <surname>Álvarez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <article-title>Identifying and modelling polysemous senses of spatial prepositions in referring expressions</article-title>
          ,
          <source>Cognitive Systems Research</source>
          <volume>77</volume>
          (
          <year>2023</year>
          )
          <fpage>45</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Creswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shanahan</surname>
          </string-name>
          ,
          <article-title>Faithful reasoning using large language models</article-title>
          ,
          <year>2022</year>
          . arXiv:2208.14271.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. C.-C.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>Towards reasoning in large language models: A survey</article-title>
          ,
          <year>2023</year>
          . arXiv:2212.10403.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kojima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Reid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Iwasawa</surname>
          </string-name>
          ,
          <article-title>Large language models are zero-shot reasoners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>22199</fpage>
          -
          <lpage>22213</lpage>
          . arXiv:2205.11916.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Blackwell</surname>
          </string-name>
          ,
          <article-title>Evaluating the ability of large language models to reason about cardinal directions</article-title>
          ,
          <source>in: Proc. COSIT-24</source>
          (to appear), arXiv:2406.16528,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Hogg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <article-title>Advancing spatial reasoning in large language models: An in-depth evaluation and enhancement using the StepGame benchmark</article-title>
          ,
          <source>in: Proc. AAAI</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <article-title>An evaluation of ChatGPT-4's qualitative spatial reasoning capabilities in RCC-8</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2309.15577. arXiv:2309.15577, appears in Working Notes of QR-23
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hernandez-Orallo</surname>
          </string-name>
          ,
          <article-title>Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs</article-title>
          , arXiv preprint arXiv:2304.11164 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <article-title>Benchmarks for automated commonsense reasoning: A survey</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>56</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>F.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Hogg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <article-title>Reframing spatial reasoning evaluation in language models: A real-world simulation benchmark for qualitative reasoning</article-title>
          ,
          <source>in: Proc. IJCAI</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lipani</surname>
          </string-name>
          ,
          <article-title>StepGame: A new benchmark for robust multi-hop spatial reasoning in texts</article-title>
          ,
          <source>in: Proc. AAAI</source>
          , volume
          <volume>36</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>11321</fpage>
          -
          <lpage>11329</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>