Neurosymbolic Visual Commonsense
                         On Integrated Reasoning and Learning about Space and Motion in
                         Embodied Multimodal Interaction

                         Mehul Bhatt
                         School of Science and Technology, Örebro University – Sweden
                         CoDesign Lab EU (Artificial and Human Intelligence).,
                         Cognitive Vision and Perception » https:// codesign-lab.org/ cognitive-vision


                                           Abstract
                                           We present recent and emerging advances in computational cognitive vision addressing artificial visual and spatial intelligence at
                                           the interface of (spatial) language, (spatial) logic and (spatial) cognition research. With a primary focus on explainable sensemaking
                                           of dynamic visuospatial imagery, we highlight the (systematic and modular) integration of methods from knowledge representation
                                           and reasoning, computer vision, spatial informatics, and computational cognitive modelling. A key emphasis here is on generalised
                                           (declarative) neurosymbolic reasoning & learning about space, motion, actions, and events relevant to embodied multimodal interaction
                                           under ecologically valid naturalistic settings in everyday life. Practically, this translates to general-purpose mechanisms for computational
                                           visual commonsense encompassing capabilities such as (neurosymbolic) semantic question-answering, relational spatio-temporal
                                           learning, visual abduction etc.
                                           The presented work is motivated by and demonstrated in the applied backdrop of areas as diverse as autonomous driving, cognitive
                                           robotics, design of digital visuoauditory media, and behavioural visual perception research in cognitive psychology and neuroscience.
                                           More broadly, our emerging work is driven by an interdisciplinary research mindset addressing human-centred responsible AI through
                                           a methodological confluence of AI, Vision, Psychology, and (human-factors centred) Interaction Design.

                                           Keywords
                                           Cognitive vision, Knowlede representation and reasoning (KR), Machine Learning, Integration of reasoning & learning, Commonsense
                                           reasoning, Declarative spatial reasoning, Relational Learning, Computational cognitive modelling, Human-Centred AI, Responsible AI


                         1. Motivation                                                                                                  spectrum of high-level human-centred sensemaking capa-
                                                                                                                                        bilities. These capabilities encompass operational functions
                         Multimodality in embodied interaction is an inherent aspect                                                    such as:
                         of human activity, be it in social, professional, or every-                                                         • Visuospatial conception formation, common-
                         day mundane contexts. Next-generation human-centred                                                                   sense/qualitative generalisation,  analogical
                         AI technologies, operating in such contextualised every-                                                              inference;
                         day settings, will require an inherent foundational capacity
                         to “make sense” of —e.g., perceive, understand, explain,                                                            • Hypothetical reasoning, argumentation, explana-
                         anticipate— everyday, naturalistic interactional multimodal-                                                          tion, counterfactual reasoning;
                         ity. This would be essential towards successfully achieving                                                         • Event based episodic maintenance & retrieval for
                         technology mediated (“human-in-the-loop” ) collaborative                                                              perceptual narrativisation.
                         assistance, as well as ensuring compliance with emerging
                         human-centred ethical and legal requirements, performance                                                      The afore enumeration is by no means exhaustive: in
                         benchmarks, and inclusive usability expectations. It is there-                                                 essence, in scope of artificial visual intelligence are diverse
                         fore crucial that the foundational building blocks of such                                                     high-level cognitive visuospatial sensemaking capabili-
                         next-generation systems be semantically aligned with the                                                       ties —be it mundane, analytical, or creative— that humans
                         descriptive, analytical, and explanatory characteristics and                                                   acquire developmentally or through specialised training,
                         complexity of human task conceptualisation, performance                                                        and are routinely adept at performing seamlessly in their
                         benchmarks, and usability expectations. Against this back-                                                     everyday life and work (e.g., driving a vehicle, tracking
                         drop, we define artificial visual intelligence [1] as:                                                         moving objects, navigating a crowded urban environment,
                                                                                                                                        engaging in sports, interpreting subtle cues in everyday
                                  » The computational capability to seman-                                                              people-communication from visual / gestural and auditory
                                  tically process and interpret diverse forms                                                           signals).
                                  of visual stimuli (typically, but not necessar-
                                  ily) emanating from sensing embodied mul-                                                             Our central focus is on the development of general,
                                  timodal interactions of / amongst humans                                                              domain-independent methods that may be seamlessly
                                  and other artefacts in diverse naturalistic                                                           integrated as part of hybrid computational cognitive system,
                                  situations of everyday life and work.                                                                 or even within computational cognitive models / cognitive
                                                                                                                                        architectures [2]. We also contextualise and demonstrate in
                         Within the scope of artificial visual intelligence are a wide-                                                 the backdrop of applications in autonomous driving, cog-
                                                                                                                                        nitive robotics, visuoauditory media design, and cognitive
                         International Joint Conference on Artificial Intelligence (IJCAI)., STRL 24:                                   psychology (e.g. [3, 4, 5, 6], [7, 8] ). Through applied case-
                         Third International Workshop on Spatio-Temporal Reasoning and Learning                                         studies, we provide a systematic model and general method-
                         (STRL), IJCAI 2024 – 5 August 2024, Jeju, South Korea                                                          ology showcasing the integration of diverse, multi-faceted
                         $ mehul.bhatt@oru.se (M. Bhatt)                                                                                AI methods pertaining Knowledge Representation and Rea-
                          https://mehulbhatt.org (M. Bhatt)
                                   © 2024 CoDesign Lab EU. Use permitted under Creative Commons License Attribution 4.0 International   soning, Computer Vision, Machine Learning, and Visual
                                   (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
Perception towards realising practical, human-centred, com-       spatial systems [10] where integrated reasoning about ac-
putational visual intelligence.                                   tion and change [11, 12] is involved:

                                                                         • interpolation and projection of missing informa-
2. Neurosymbolic Visual                                                    tion, e.g., what could be hypothesised about missing
                                                                           information (e.g., moments of occlusion [13]); how
   Commonsense: Integrated                                                 can this hypothesis support planning an immediate
   Reasoning and Learning about                                            next step?
   Space, Motion, and Inter(A)ction                                      • object identity maintenance at a semantic level,
                                                                           e.g., in the presence of occlusions, missing and noisy
In the present status quo, our research in (computational)                 quantitative data, error in detection and tracking
neurosymbolic visual commonsense categorically addresses
                                                                         • ability to make default assumptions, e.g., pertain-
three key questions:
                                                                           ing to persistence objects and/or object attributes
    I. What kind of (relational) abstraction mechanisms
                                                                         • maintaining consistent beliefs respecting (domain-
        are needed to computationally “make-sense” of em-
                                                                           neutral) commonsense criteria, e.g., related to com-
        bodied multimodal interaction ?
                                                                           positionality & indirect effects, space-time continu-
    II. How can (and why should) abstraction mechanisms                    ity, positional changes resulting from motion
        (such as in I) be founded on behaviourally estab-
        lished cognitive human- factors emanating from nat-              • inferring / computing counterfactuals [14], in a
        uralistic empirical observation in real-world applied              manner akin to human cognitive ability to perform
        contexts?                                                          mental simulation for purposes of introspection
   III. How to articulate behaviourally established abstrac-               about the past or anticipation of the future, or per-
        tion mechanisms, preferences (etc) as formal declar-               forming “what-if” reasoning tasks etc
        ative models suited for computational modelling
                                                                  We particularly emphasise the abilities to abstract, learn,
        aimed at operational“sensemaking” (encompassing
                                                                  and reason with cognitively rooted structured characterisa-
        capabilities such as abduction, relational learning,
                                                                  tions of commonsense knowledge about space and motion,
        counterfactual inference) ?
                                                                  encompassing visuospatial question-answering, abduction,
                                                                  and relational learning:
Present work is particularly aimed at developing general
methods for the semantic interpretation of (multimodal) dy-
                                                                  I.  Visuospatial Question-Answering. Focus is on a com-
namic visuospatial imagery with an emphasis on the ability
to neurosymbolically perform abstraction, reasoning, and          putational framework for semantic-question answering
learning with cognitively rooted structured characterisa-         with video and eye-tracking data founded in constraint logic
tions of commonsense knowledge pertaining to space and            programming; we also demonstrate an application in cogni-
motion. Here, we specifically emphasise:                          tive film & media studies, where human perception of films
                                                                  vis-a-via cinematographic devices is of interest.
     • General foundational commonsense abstractions of                                                           » [4, 6, 7, 8]
       space, time, and motion needed for representation
       mediated (grounded) reasoning and learning with            II.  Visuospatial Abduction. Focus is on a hybrid archi-
       dynamic visuospatial stimuli (e.g., emanating from         tecture for systematically computing robust visual explana-
       multimodal human behavioural signals in modali-            tion(s) encompassing hypothesis formation, belief revision,
       ties such as RGB(D), video, audio, eye-tracking and        and default reasoning with video data (for active vision
       possibly even bio signals [9]);                            for autonomous driving, as well as for offline processing).
     • Deep (visuospatial) semantics, entailing systemat-         The architecture supports visual abduction with space-time
       ically formalised declarative (neurosymbolic) rea-         histories as native entities, and founded in (functional) an-
       soning and learning with aspects pertaining to             swer set programming based spatial reasoning.
       space, space-time, motion, actions & events, spatio-                                               » [3, 13, 15][16, 17]
       linguistic conceptual knowledge. Here, it is of the
       essence that an expressive ontology consisting of,         III.  Relational Visuospatial Learning. Focus is on a gen-
       for instance, space, time, space-time motion primi-        eral framework and pipeline for: relational spatio-temporal
       tives as first-class ‘neurosymbolic’ objects is accessi-   (inductive) learning with an elaborate ontology supporting
       ble within the (declarative) programming paradigm          a range of space-time features; and generating semantic,
       under consideration; and                                   (declaratively) explainable interpretation models in a neu-
                                                                  rosymbolic pipeline demonstrated for the case of analysing
     • Explainable models of computational visuospatial           visuospatial symmetry in visual art.
       commonsense based on a systematic integration of                                                         » [18][5][19]
       symbolic/relational methods on the one hand, and
       neural techniques aimed at low level quantitative
                                                                  Formal semantics and computational models of deep seman-
       (e.g., visual) data processing on the other;
                                                                  tics manifest themselves as neurosymbolic spatio-temporal
At a higher level of abstraction, deep (visualspatial) se-        extensions of established declarative AI frameworks such
mantics (or deep semantics for short) entails inherent sup-       as Constraint Logic Programming (CLP) [20], Inductive
port for tackling a range of challenges concerning epistemo-      Logic Programming (ILP) [21], and Answer Set Program-
logical and phenomenological aspects relevant to dynamic          ming (ASP) [22]. The more foundational aspects pertaining
declarative spatial reasoning (built on top of CLP, ILP, ASP)       neural learning techniques, or otherwise.
independent of its relationship to cognitive vision research
                                                                    In this invited position statement, we have attempted to
may be consulted in [23], [16, 24], [18].
                                                                    summarise our mindset and ongoing work in the CoDesign
                                                                    Lab towards:

3. Discussion                                                              » Establishing a human-centric foundation
                                                                           and roadmap for the development of neu-
The vision that drives our scientific methodology is:                      rosymbolically grounded inference about
                                                                           embodied multimodal interaction as iden-
        » To shape the nature and character of                             tifiable in a range of real-world application
        (machine-based) artificial visual intelligence                     contexts.
        with respect to human-centred cognitive
        considerations, demonstrating an exemplar                   This summary is not meant to be a comprehensive literature
        for developing, applying, and disseminating                 review; this may be obtained through the cited works. For
        such methods in socio-technologically rele-                 key technical details and to obtain a summary of open di-
        vant application areas where:                               rections, we direct interested readers to select publications
                                                                    as follows: a compact starting point may be obtained via
        (a) embodied (multimodal) human interac-                    the comprehensive summary in [1], or through the shorter/-
        tion is inherent;                                           focussed components in [15, 5, 4, 13, 3]. Longer summaries
                                                                    in the form of (recent) doctoral dissertations are available
        (b) human-in-the-loop collaborative work is
                                                                    in [29] and [30, 31].
        of the essence; and
        (c) normative ethico-legal compliance based
        on regulatory requirement and human-                        Acknowledgments
        factors driven inclusive or universal design
        criteria is to be ensured.
                                                                    We acknowledge funding by the Swedish Research Council
Towards realising this vision, we adopt an interdisciplinary        (VR - Vetenskapsrådet) - https://www.vr.se, and the Swedish
approach –at the confluence of Cognition, AI, Interaction,          Foundation for Strategic Research (SSF – Stiftelsens för
and Design– which we deem necessary to better appreci-              Strategisk Forskning) - https://strategiska.se. Previously,
ate the complexity and spectrum of varied human-centred             this research has been supported by the German Research
challenges for the design and (usable) implementation of            Foundation (DFG – Deutsche Forschungsgemeinschaft) -
(explainable) artificial visual intelligence solutions in diverse   https://www.dfg.de.
human-system interaction contexts.
One of the key technical driving forces in our work is that of
“representation mediated multimodal sensemaking”.
                                                                    References
In essence, we consider (neurosymbolic) representation me-           [1] M. Bhatt, J. Suchan, Artificial visual intelligence:
diated grounding as being significant in semiotic construc-              Perceptual commonsense for human-centred cogni-
tion, e.g., enabling high-level meaning-making. This view                tive technologies, in: Human-Centered Artificial
stems from the long-established value of “grounding” in                  Intelligence: Advanced Lectures, Springer-Verlag,
Artificial Intelligence and related disciplines [25]. Our re-            Berlin, Heidelberg, 2023, p. 216–242. URL: https://
search advances the theoretical, methodological, and ap-                 doi.org/10.1007/978-3-031-24349-3_12. doi:10.1007/
plied understanding of “grounded representation” mediated                978-3-031-24349-3_12.
multimodal sensemaking of embodied human interaction
at the interface of spatial language, spatial logic, and spatial     [2] S. Jones, J. Laird, Anticipatory thinking in cogni-
cognition. In our view, the significance of this form of (neu-           tive architectures with event cognition mechanisms,
rosymbolic) grounding must now be reiterated, re-asserted                in: A. Amos-Binks, D. Dannenhauer, R. E. Cardona-
even, in view of recent advances in neural machine learning              Rivera, G. A. Brewer (Eds.), Short Paper Proc. of Work-
and the well-recognised “explainability” and “interpretabil-             shop on Cognitive Systems for Anticipatory Thinking
ity” requirements from the viewpoint of human-centred AI                 (COGSAT 2019), AAAI Fall Symp., volume 2558, 2019.
[26, 27, 28]. We believe that research in knowledge represen-            URL: http://ceur-ws.org/Vol-2558/short1.pdf.
tation and reasoning (KR) has, since its inception, concerned
                                                                     [3] J. Suchan, M. Bhatt, S. Varadarajan, Common-
itself with the “hard” problem of semantics, emphasising
                                                                         sense visual sensemaking for autonomous driv-
explainability, formal verification and diagnosis, elabora-
                                                                         ing - on generalised neurosymbolic online abduc-
tion tolerance amongst other things. Research in KR, and
                                                                         tion integrating vision and semantics, Artif. In-
more broadly in symbolic AI and semantics, and their role
                                                                         tell. 299 (2021) 103522. URL: https://doi.org/10.1016/
and contribution towards large-scale hybrid “human-in-the-
                                                                         j.artint.2021.103522. doi:10.1016/j.artint.2021.
loop” intelligence is of even greater significance now than
                                                                         103522.
ever before given the tremendous synergistic opportunities
afforded by the widely demonstrated power of deep learning           [4] J. Suchan, M. Bhatt, Semantic question-answering
driven techniques in computer vision (and beyond). The                   with video and eye-tracking data: AI foundations for
onus now, we posit, is on KR research to drive itself towards            human visual perception driven cognitive film stud-
developing methods that can seamlessly integrate (and be                 ies, in: S. Kambhampati (Ed.), Proceedings of the
“usable”) with other kinds of AI methods, be data-centric                Twenty-Fifth International Joint Conference on Ar-
     tificial Intelligence, IJCAI 2016, New York, NY, USA,      [15] J. Suchan, M. Bhatt, P. A. Walega, C. P. L. Schultz, Vi-
     9-15 July 2016, IJCAI/AAAI Press, 2016, pp. 2633–2639.          sual explanation by high-level abduction: On answer-
     URL: http://www.ijcai.org/Abstract/16/374.                      set programming driven reasoning about moving ob-
                                                                     jects, in: 32nd AAAI Conference on Artificial Intel-
 [5] J. Suchan, M. Bhatt, S. Vardarajan, S. A. Amirshahi,
                                                                     ligence (AAAI-18), USA, AAAI Press, 2018, pp. 1965–
     S. Yu, Semantic Analysis of (Reflectional) Visual Sym-
                                                                     1972.
     metry: A Human-Centred Computational Model for
     Declarative Explainability, Advances in Cognitive          [16] P. A. Walega, M. Bhatt, C. P. L. Schultz, ASPMT(QS):
     Systems 6 (2018) 65–84. URL: http://www.cogsys.org/             non-monotonic spatial reasoning with answer set pro-
     journal.                                                        gramming modulo theories, in: F. Calimeri, G. Ianni,
                                                                     M. Truszczynski (Eds.), Logic Programming and Non-
 [6] J. Suchan, M. Bhatt, The geometry of a scene: On deep
                                                                     monotonic Reasoning - 13th International Conference,
     semantics for visual perception driven cognitive film,
                                                                     LPNMR 2015, Lexington, KY, USA, September 27-30,
     studies, in: 2016 IEEE Winter Conference on Applica-
                                                                     2015. Proceedings, volume 9345 of Lecture Notes in
     tions of Computer Vision, WACV 2016, Lake Placid,
                                                                     Computer Science, Springer, 2015, pp. 488–501. URL:
     NY, USA, March 7-10, 2016, IEEE Computer Soci-
                                                                     https://doi.org/10.1007/978-3-319-23264-5_41. doi:10.
     ety, 2016, pp. 1–9. URL: https://doi.org/10.1109/WACV.
                                                                     1007/978-3-319-23264-5\_41.
     2016.7477712. doi:10.1109/WACV.2016.7477712.
                                                                [17] P. A. Walega, C. P. L. Schultz, M. Bhatt, Non-
 [7] J. Suchan, M. Bhatt, Deep Semantic Abstractions of
                                                                     monotonic spatial reasoning with answer
     Everyday Human Activities: On Commonsense Rep-
                                                                     set programming modulo theories,           Theory
     resentations of Human Interactions, in: ROBOT 2017:
                                                                     Pract. Log. Program. 17 (2017) 205–225. URL:
     Third Iberian Robotics Conference, Advances in Intel-
                                                                     https://doi.org/10.1017/S1471068416000193.
     ligent Systems and Computing 693, 2017.
                                                                     doi:10.1017/S1471068416000193.
 [8] M. Spranger, J. Suchan, M. Bhatt, Robust Natural Lan-
                                                                [18] J. Suchan, M. Bhatt, C. P. L. Schultz, Deeply semantic
     guage Processing - Combining Reasoning, Cognitive
                                                                     inductive spatio-temporal learning, in: J. Cussens,
     Semantics and Construction Grammar for Spatial Lan-
                                                                     A. Russo (Eds.), Proceedings of the 26th Interna-
     guage, in: IJCAI 2016: 25th International Joint Confer-
                                                                     tional Conference on Inductive Logic Programming
     ence on Artificial Intelligence, AAAI Press, 2016.
                                                                     (Short papers), London, UK, 2016, volume 1865, CEUR-
 [9] M. Bhatt, K. Kersting, Semantic interpretation of multi-        WS.org, 2016, pp. 73–80.
     modal human-behaviour data - making sense of events,
                                                                [19] K. S. R. Dubba, A. G. Cohn, D. C. Hogg, M. Bhatt,
     activities, processes, Künstliche Intell. 31 (2017) 317–
                                                                     F. Dylla, Learning Relational Event Models from
     320. URL: https://doi.org/10.1007/s13218-017-0511-y.
                                                                     Video, J. Artif. Intell. Res. (JAIR) 53 (2015) 41–
     doi:10.1007/S13218-017-0511-Y.
                                                                     90. URL: http://dx.doi.org/10.1613/jair.4395. doi:10.
[10] M. Bhatt, S. W. Loke, Modelling dynamic spatial                 1613/jair.4395.
     systems in the situation calculus, Spatial Cogni-
                                                                [20] J. Jaffar, M. J. Maher, Constraint logic programming:
     tion & Computation 8 (2008) 86–130. URL: https:
                                                                     A survey, The journal of logic programming 19 (1994)
     //doi.org/10.1080/13875860801926884. doi:10.1080/
                                                                     503–581.
     13875860801926884.
                                                                [21] S. Muggleton, L. D. Raedt, Inductive logic program-
[11] M. Bhatt, H. W. Guesgen, S. Wölfl, S. M. Hazarika,
                                                                     ming: Theory and methods, Journal of Logic Program-
     Qualitative spatial and temporal reasoning: Emerging
                                                                     ming 19 (1994) 629–679.
     applications, trends, and directions, Spatial Cognition
     & Computation 11 (2011) 1–14. URL: https://doi.org/10.     [22] G. Brewka, T. Eiter, M. Truszczyński, Answer set
     1080/13875868.2010.548568. doi:10.1080/13875868.                programming at a glance, Commun. ACM 54 (2011)
     2010.548568.                                                    92–103. doi:10.1145/2043174.2043195.
[12] M. Bhatt, Reasoning about space, actions and change:       [23] M. Bhatt, J. H. Lee, C. P. L. Schultz, CLP(QS): A declar-
     A paradigm for applications of spatial reasoning, in:           ative spatial reasoning framework, in: M. J. Egenhofer,
     Qualitative Spatial Representation and Reasoning:               N. A. Giudice, R. Moratz, M. F. Worboys (Eds.), Spa-
     Trends and Future Directions, IGI Global, USA, 2012.            tial Information Theory - 10th International Confer-
                                                                     ence, COSIT 2011, Belfast, ME, USA, September 12-16,
[13] J. Suchan, M. Bhatt, S. Varadarajan, Out of sight but
                                                                     2011. Proceedings, volume 6899 of Lecture Notes in
     not out of mind: An answer set programming based
                                                                     Computer Science, Springer, 2011, pp. 210–230. URL:
     online abduction framework for visual sensemaking in
                                                                     https://doi.org/10.1007/978-3-642-23196-4_12. doi:10.
     autonomous driving, in: S. Kraus (Ed.), Proc. of 25th
                                                                     1007/978-3-642-23196-4_12.
     Intnl. Joint Conference on Artificial Intelligence, IJ-
     CAI 2019, 2019, pp. 1879–1885. doi:10.24963/ijcai.         [24] C. P. L. Schultz, M. Bhatt, J. Suchan, P. A. Walega,
     2019/260.                                                       Answer Set Programming Modulo Space-Time, in:
                                                                     C. Benzmüller, F. Ricca, X. Parent, D. Roman (Eds.),
[14] R. Byrne, Counterfactual thought, Annual Re-
                                                                     Rules and Reasoning - Second International Joint Con-
     view of Psychology 67 (2016) 135–157. URL: https:
                                                                     ference, RuleML+RR 2018, Luxembourg, September 18-
     //doi.org/10.1146/annurev-psych-122414-033249.
                                                                     21, 2018, Proceedings, volume 11092 of Lecture Notes
     doi:10.1146/annurev-psych-122414-033249.
                                                                     in Computer Science, Springer, 2018, pp. 318–326. URL:
     pMID: 26393873.
                                                                     https://doi.org/10.1007/978-3-319-99906-7_24. doi:10.
      1007/978-3-319-99906-7\_24.                                         (Eds.), 2013 Workshop on Computational Models of
                                                                          Narrative, CMN 2013, August 4-6, 2013, Hamburg,
[25] S. Harnad, The symbol grounding problem, Physica
                                                                          Germany, volume 32 of OASIcs, Schloss Dagstuhl -
     D 42 (1990) 335–346.
                                                                          Leibniz-Zentrum für Informatik, 2013, pp. 24–29. URL:
[26] AI HLEG, High-level expert group on artificial intel-                https://doi.org/10.4230/OASIcs.CMN.2013.24. doi:10.
     ligence: Ethical guidelines for trustworthy ai, 2019.                4230/OASICS.CMN.2013.24.
     URL: https://www.aepd.es/sites/default/files/2019-12/
                                                                     [36] C. Lewis, Representation, Inclusion, and Innova-
     ai-ethics-guidelines.pdf.
                                                                          tion: Multidisciplinary Explorations, Synthesis Lec-
[27] EU Commission, Communication: Building trust in                      tures on Human-Centered Informatics, Morgan &
     human centric artificial intelligence, 2019.                         Claypool Publishers, 2017. URL: https://doi.org/10.
                                                                          2200/S00812ED1V01Y201710HCI038. doi:10.2200/
[28] EU Commission, Proposal for a regulation of the euro-                S00812ED1V01Y201710HCI038.
     pean parliament and of the council laying down har-
     monised rules on artificial intelligence (artificial intelli-
     gence act) and amending certain union legislative acts,
     2021. URL: https://eur-lex.europa.eu/legal-content/
     EN/TXT/?uri=CELEX:52021PC0206.
[29] J. Suchan, Declarative Reasoning about Space and Mo-
     tion in Visual Imagery - Theoretical Foundations and
     Applications, Ph.D. thesis, Universität Bremen, 2022.
     URL: https://elib.dlr.de/188919/.
[30] V. Kondyli, Behavioural Principles for the Design of
     Human-Centred Cognitive Technologies : The Case
     of Visuo-Locomotive Experience, Ph.D. thesis, Örebro
     University, School of Science and Technology, 2023.
[31] V. Nair, The Observer Lens: Characterizing Visuospa-
     tial Features in Multimodal Interactions, Ph.D. thesis, ,
     School of Informatics, Informatics Research Environ-
     ment, 2024.
[32] M. Bhatt, J. Suchan, Cognitive vision and percep-
     tion, in: G. D. Giacomo, A. Catalá, B. Dilkina, M. Mi-
     lano, S. Barro, A. Bugarín, J. Lang (Eds.), ECAI 2020 -
     24th European Conference on Artificial Intelligence,
     29 August-8 September 2020, Santiago de Compostela,
     Spain, August 29 - September 8, 2020 - Including 10th
     Conference on Prestigious Applications of Artificial
     Intelligence (PAIS 2020), volume 325 of Frontiers in Ar-
     tificial Intelligence and Applications, IOS Press, 2020, pp.
     2881–2882. URL: https://doi.org/10.3233/FAIA200434.
     doi:10.3233/FAIA200434.
[33] V. Kondyli, M. Bhatt, D. Levin, J. Suchan, How do
     drivers mitigate the effects of naturalistic visual com-
     plexity? on attentional strategies and their impli-
     cations under a change blindness protocol, Cogni-
     tive Research: Principles and Implications 8 (2023).
     doi:10.1186/s41235-023-00501-1.
[34] K. S. R. Dubba, M. Bhatt, F. Dylla, D. C. Hogg, A. G.
     Cohn, Interleaved inductive-abductive reasoning for
     learning complex event models, in: S. H. Muggle-
     ton, A. Tamaddoni-Nezhad, F. A. Lisi (Eds.), Inductive
     Logic Programming - 21st International Conference,
     ILP 2011, Windsor Great Park, UK, July 31 - August 3,
     2011, Revised Selected Papers, volume 7207 of Lecture
     Notes in Computer Science, Springer, 2011, pp. 113–
     129. URL: https://doi.org/10.1007/978-3-642-31951-8_
     14. doi:10.1007/978-3-642-31951-8\_14.
[35] M. Bhatt, J. Suchan, C. Schultz, Cognitive interpre-
     tation of everyday activities - toward perceptual nar-
     rative based visuo-spatial scene interpretation, in:
     M. A. Finlayson, B. Fisseni, B. Löwe, J. C. Meister