<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Semantic Representation of Pedestrian Crossing Behavior</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>He Tan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florian Westphal</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computing, School of Engineering, Jönköping University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we focus on the crucial task of understanding and modeling pedestrian behavior, which is essential for numerous applications. We introduce a semantic representation of pedestrian crossing behavior. The representation captures the sub-events within a behavior and the spatial-temporal evolution of interactions between pedestrians and other objects involved in crossing events. We demonstrate its practical application by utilizing it to analyze pedestrian crossing behavior from road user movement data (i.e. trajectories). By constructing a knowledge graph from detailed road user dynamics data using this representation, we enable queries that address safety concerns related to pedestrian crossing behavior, aiding traffic engineers in their work on urban traffic infrastructure design.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology</kwd>
        <kwd>Knowledge Graph Construction from trajectory data</kwd>
        <kwd>Visual Question Answering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Vulnerable road users, such as children, the elderly, and disabled individuals, are integral to
the dynamics of city traffic. They play essential roles in establishing a sustainable, active, and
inclusive mobility environment. Therefore, understanding and modeling pedestrian behavior is
fundamental for many applications, including traffic flow analysis, traffic safety improvement,
urban planning, and intelligent driving systems. Pedestrian crossing behavior is one of the
main aspects of pedestrian behavior that has been studied in numerous research studies [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It
involves the actions and movements of pedestrians while crossing streets or roadways, often
guided by traffic signals, road markings, and traffic conditions.
      </p>
      <p>
        Traditionally, stochastic, linear regression, and discrete choice models are used to build an
understanding of how pedestrians make crossing decisions considering various factors related
to people, roadways, traffic, traffic controls, and traffic rules [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Parameters of the models
are estimated from survey and/or questionnaire data or manually screened video recordings.
More recently, agent-based modeling has been used to model road users as intelligent agents
attempting to make rational decisions in uncertain and complex situations. However, most work
has focused on modeling vehicle behaviors; very few studies are dedicated to developing models
for other road users, such as pedestrians [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Often, these studies focus on pedestrian-vehicle
conflicts and model pedestrians’ collision avoidance mechanism [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The parameters of the
models are typically calibrated by detecting and tracking road users from video data or using
results from literature.
      </p>
      <p>
        At the same time, researchers across diverse disciplines, such as computer vision, artificial
intelligence (AI), cognitive science, and neuroscience, have conducted numerous studies on
understanding human activities. Depending on their complexity, human activities can be
classified into different levels: gestures, actions, interactions, and group activities [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This paper
specifically focuses on understanding pedestrian crossing behavior on the level of human-object
interactions. Human activity is a spatial-temporal evolution of interactions [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Here, we
present a semantic representation of pedestrian crossing behavior, capturing sub-events of a
behavior and their temporal and spatial structures. Previous studies within computer vision
have suggested that a structured spatial-temporal representation can lead to more accurate
activity understanding and improve the performance of various computer vision tasks, including
image captioning and visual question answering [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Studies (e.g., [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ]) in the AI area indicate
that such a representation can compensate for limited training data by incorporating prior
knowledge and can aid in understanding human activities.
      </p>
      <p>In this paper we present the structured spatial-temporal representation of pedestrian crossing
behavior, and describe its application to gain an understanding of pedestrian crossing behavior
from recorded road user dynamics data. Utilizing the representation, a knowledge graph is
constructed from road user dynamics data. Queries over the knowledge graph can answer
safety-related questions on pedestrian crossing behavior for traffic engineers and help with
their work on urban traffic infrastructure design.</p>
      <p>The remainder of this paper is organized as follows: In Section 2, we introduce the methods
that have been employed to semantically represent human activities, particularly pedestrian
behaviors. In Section 3, we present our approach to semantically representing crossing behaviors.
Section 4 outlines the utilization of the semantic representation within the context of traffic data
analysis for traffic engineers. Finally, Section 5 concludes the work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Pedestrian behavior has been widely analyzed in various research works using a plethora
of methods. Nevertheless, understanding pedestrian behavior remains challenging due to
the inherent complexity of human activities. Despite the diverse analysis methods used, the
significance of semantic representation in understanding pedestrian behavior has often been
overlooked. Only a limited number of studies have explored semantic representations of
pedestrian behavior.</p>
      <p>
        Chai et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] utilized fuzzy logic to model the cognition and behavioral patterns of
pedestrians, in order to understand the effect of age and gender when pedestrians are crossing
a signalized crosswalk and jaywalking. Gharebaghi et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] developed a mobility ontology
for people with motor disabilities (PWMD). Specifically, it considers the interactions between
people and both the social and physical environment. The ontology was used to support the
development of assistive technologies for the mobility of PWMD. Fang et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] developed
an ontology defining various kinds of road users, including pedestrians, and describing their
relationships. The concepts from the ontology are used to define the rules for describing the
interactions between road users and to support rule-based reasoning for predicting road users’
behavior.
      </p>
      <p>In this paper, we present a semantic representation of pedestrian crossing behavior. The
representation describes the dynamic evolution of interactions between pedestrians and objects
within the physical environment over time, capturing interactions in both spatial and temporal
dimensions.</p>
      <p>In 1970, Hägerstrand [13] introduced the concept of a time-space path in understanding
human activities. This theory laid the groundwork for trajectories, which have been shown to
be useful in representing people's movements. Inspired by Hägerstrand's work, Orellana and
Renso [14] developed an interaction ontology that conceptualizes the characteristics of
pedestrian movement behaviour. It focuses on identifying various movement patterns
from time-space paths, and on the different categories of interactions, spatial and temporal contexts,
behavior, and the high-level relations between these concepts. Logic-based reasoning is used to
categorize pedestrian movement behavior based on its movement patterns, interactions, and
contexts.</p>
      <p>
        Meanwhile, in cognitive science and neuroscience, it has been recognized that segmentation
is a fundamental component of perception, playing a critical role in understanding activities.
People tend to perceive ongoing continuous activity as a series of discrete events (also called
segments) [15, 16, 17]. The relationships between segments are encoded in partonomic
hierarchies [18]. Coarse segmentation is often related to objects’ locations and their goals, and
the causal relations between their actions. Fine segmentation is closely linked to changes
in the interactions between objects [19]. Building on these findings in cognitive science and
neuroscience, Ji et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed a spatial-temporal scene graph to represent human activity
and to improve the performance of action recognition and few-shot action recognition using
neural networks. Mlodzian et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] presented an ontology tailored for representing
entities and their spatial and temporal relations in traffic scenes in the nuScenes dataset. A
knowledge graph was constructed from the nuScenes dataset using the ontology and provided
as a benchmark dataset for developing advanced trajectory prediction models.
      </p>
      <p>In this paper, drawing from these insights in cognitive science, neuroscience and computer
vision, we propose a structured spatial-temporal representation for pedestrian crossing behavior
and present its application to gain an understanding of pedestrian crossing behavior from road
user movement data.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Semantic Representation</title>
      <p>In this section we present the semantic representation for pedestrian crossing behavior. A
pedestrian crossing behavior can be seen as a dynamic evolution of interactions between
pedestrians and objects within the physical environment over time. Every crossing behavior
can be broken down into segments, each representing a distinct phase of the behavior. These
segments capture the changes of the interactions between pedestrians and objects in both
physical and temporal dimensions, and together represent the pedestrian crossing behavior. For
example, Fig. 1 shows a crossing event, which is extracted from road user behavior measurement
performed at a zebra-free crossing in Lindholmen, Gothenburg, Sweden. Fig. 1-a displays
the trajectories of the pedestrian and other moving objects involved in the event. The red
trajectory represents a pedestrian, the blue trajectory represents a cyclist, and the cyan trajectory
represents a light vehicle. Fig. 1-b1 to b8 show a sequence of distinct segments that capture the
changes in interactions between pedestrians and objects over time during the event. These
interactions are expressed as a set of triples, as shown in Fig. 2. Each triple follows the format
((id, object_1), spatial_relation, (id, object_2)), where object_1 is a
moving object, such as a pedestrian, cyclist, or vehicle, object_2 can be a moving object
or a static object, such as a crossing, area, or sidewalk, and id is the unique identifier of each
object.</p>
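      <p>As a minimal sketch (in Python, with invented identifiers and object names), the triple format above can be expressed as follows:</p>

```python
# Minimal sketch of the interaction-triple format described above:
# ((id, object_1), spatial_relation, (id, object_2)).
# The identifiers and objects below are invented for illustration.

def make_interaction(id1, type1, relation, id2, type2):
    """Build one interaction triple in the format shown in Fig. 2."""
    return ((id1, type1), relation, (id2, type2))

# One segment of a crossing event: the pedestrian is on the crossing
# while a light vehicle is close to it.
segment = [
    make_interaction(103, "pedestrian", "on", 1, "crossing"),
    make_interaction(103, "pedestrian", "close_to", 208, "light_vehicle"),
]

for subject, relation, obj in segment:
    print(subject, relation, obj)
```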
      <p>Fig. 3 illustrates the current version of the ontology designed to represent the spatial-temporal
evolution of crossing behavior. The ontology is available on GitHub
(https://github.com/tanhe-git/crossing_behavior/blob/main/trafic_scene_ontology.owl). Since a segment is often
related to regions in an image in computer vision, the term frame is used instead. In computer
vision, a video can be divided into a sequence of frames, each representing a single still
image in the video sequence. The blue arrows represent subclass relations between concepts.
Currently, the ontology includes only a limited number of categories for both moving and static
objects; additional categories will be integrated as the ontology undergoes further development.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Usage of the Representation</title>
      <p>In this section, we describe an application of the semantic representation of pedestrian crossing
behavior presented in Section 3. The application aims to provide information
support for traffic engineers during traffic infrastructure planning and development, with a
particular focus on pedestrian safety. In the application, pedestrian crossing behaviors are
described using the semantic representation, and a knowledge graph is constructed for these
behaviors. Subsequently, a number of queries serve as question-answering tools to provide
information for traffic engineers.</p>
      <sec id="sec-4-1">
        <title>Crossing Behavior Dataset</title>
        <p>The crossing behavior dataset is prepared from the traffic measurement mentioned in
Section 3. Fig. 4 shows an example frame extracted from the dataset. The road user positions
and trajectories are displayed in a camera view, overlaid on the anonymized video frame. The
measurement was performed by Viscando AB (www.viscando.com) using the 3D&amp;AI based infrastructure sensor
OTUS3D. The total duration of the measurement is 11 hours and 5 minutes. The data contain
trajectories of all road users recorded 10 times per second. Trajectories contain the unique track
ID of each object, the UTC time stamp, position (i.e. X-coordinate and Y-coordinate), velocity
(i.e. object speed in the direction of motion, in km/h), and object type. Currently, the object types
include pedestrian, cyclist, light vehicle, and heavy vehicle. Vision data are processed in the
embedded computational unit and deleted within 20 ms of being captured. Thus, the dataset
is stored fully anonymously, ensuring compliance with the General Data Protection Regulation
(GDPR) of the European Union (https://gdpr-info.eu/), because personal information is neither stored in the sensors
nor transmitted.</p>
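        <p>The trajectory record structure described above can be sketched as follows; the field names are our own illustration, not the dataset's actual column names:</p>

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Sketch of a single trajectory sample as described in the text.
# Field names are illustrative; the dataset's actual column names may differ.
@dataclass
class TrajectorySample:
    track_id: int        # unique track ID for each object
    timestamp: datetime  # UTC time stamp (recorded 10 times per second)
    x: float             # X-coordinate
    y: float             # Y-coordinate
    speed_kmh: float     # object speed in the direction of motion (km/h)
    object_type: str     # pedestrian, cyclist, light vehicle, or heavy vehicle

sample = TrajectorySample(
    track_id=103,
    timestamp=datetime(2019, 5, 17, 8, 0, 0, tzinfo=timezone.utc),
    x=12.4, y=3.1, speed_kmh=4.8,
    object_type="pedestrian",
)
```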
      </sec>
      <sec id="sec-4-2">
        <title>Knowledge Graph Construction</title>
        <p>In this section we describe the construction of the knowledge graph that describes the pedestrian
crossing behaviors recorded in the aforementioned dataset. Since the application is to support
traffic infrastructure planning and development prioritizing pedestrian safety, the construction
has focused on the crossing events involving pedestrians or cyclists as well as vehicles. The
spatial relationship between objects was calculated based on the physical distance between
them. The current spatial relationships include those between moving objects, i.e. close_to
and far_away, and those between a moving object and a static object, i.e. left_close_to,
right_close_to, left_far_away, right_far_away, on, and out_of_area. If the x-coordinate of one object is
smaller than that of another, the former is positioned to the left of the latter; otherwise, it is
positioned to the right.</p>
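        <p>A minimal sketch of this distance-based relation computation, assuming a hypothetical closeness threshold (the paper does not state the actual value used):</p>

```python
import math

CLOSE_THRESHOLD = 5.0  # metres; a hypothetical value, not from the paper

def moving_moving_relation(p1, p2):
    """Spatial relation between two moving objects: close_to or far_away."""
    return "close_to" if math.dist(p1, p2) <= CLOSE_THRESHOLD else "far_away"

def moving_static_relation(p_moving, p_static):
    """Spatial relation between a moving and a static object, combining the
    left/right rule (smaller x-coordinate means left) with the distance."""
    side = "left" if p_moving[0] < p_static[0] else "right"
    proximity = "close_to" if math.dist(p_moving, p_static) <= CLOSE_THRESHOLD else "far_away"
    return side + "_" + proximity

print(moving_moving_relation((0.0, 0.0), (3.0, 4.0)))   # distance 5.0 -> close_to
print(moving_static_relation((10.0, 0.0), (2.0, 0.0)))  # distance 8.0 -> right_far_away
```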
        <p>Once the information had been extracted from the aforementioned dataset, the ontology described
in Section 3 was populated and the knowledge graph was set up. Fig. 5 shows a fragment of the
knowledge graph that represents the pedestrian crossing behavior presented in Section 3.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Question Answering</title>
        <p>In this section, we present SPARQL queries that retrieve answers from the knowledge graph,
or retrieve information from it to formulate responses, for a few example questions that
traffic engineers might pose.</p>
        <p>First, two prefixes are predefined for the following SPARQL queries, i.e., tsdata (http://www.
example.com/ontology/trafic_scene_kg#) and ts (http://www.example.com/ontology/trafic_
scene_ontology.owl#).</p>
        <p>Example 1: describe a crossing behavior. The query will return an RDF dataset describing a
specific crossing behavior. Fig. 5 shows a visualization of such an RDF dataset, generated
using the Stardog Studio visualization tool. Such an RDF dataset can also be converted into
text, allowing traffic engineers to easily access and understand the information [20].</p>
        <p>Example 2: find and describe the crossing behaviors within a specified time period. The query
will return an RDF dataset containing the crossing behaviors within the specified time period.
For each behavior, using the query given in Example 1, it can be described in text, allowing
traffic engineers to access and understand the information.</p>
        <p>SELECT DISTINCT ?b
WHERE {
  ?b a ts:Behavior .
  ?b ts:hasFrame ?f .
  ?f ts:absoluteTime ?t .
  FILTER (?t &gt;= "2019-05-17T08:00:00"^^xsd:dateTime
       &amp;&amp; ?t &lt;= "2019-05-17T08:20:00"^^xsd:dateTime)
}</p>
        <p>Example 3: find the crossing events where pedestrians/cyclists are close to vehicles and return
the frames when this happens.</p>
        <p>SELECT DISTINCT ?b ?f
WHERE {
  ?b rdf:type ts:Behavior .
  ?b ts:hasFrame ?f .
  ?f ts:containsInteraction ?i .
  ?i ts:hasSpatialRelationship ts:close_to .
  { ?i ts:hasObject1 ?obj1 .
    { ?obj1 rdf:type ts:Pedestrian } UNION { ?obj1 rdf:type ts:Bicyclist } .
    ?i ts:hasObject2 ?obj2 .
    { ?obj2 rdf:type ts:HeavyVehicle } UNION { ?obj2 rdf:type ts:LightVehicle } }
  UNION
  { ?i ts:hasObject2 ?obj2 .
    { ?obj2 rdf:type ts:Pedestrian } UNION { ?obj2 rdf:type ts:Bicyclist } .
    ?i ts:hasObject1 ?obj1 .
    { ?obj1 rdf:type ts:HeavyVehicle } UNION { ?obj1 rdf:type ts:LightVehicle } }
}
ORDER BY ?b</p>
        <p>Example 4: find the crossing events where pedestrians/cyclists are close to vehicles and their
speed is too high. Such behaviors are considered unsafe. The query is an extension of the one
given in Example 3, with the addition of the following triple patterns and filter.</p>
        <p>?i ts:hasObject1Info ?obj1info .
?i ts:hasObject2Info ?obj2info .
?obj1info ts:speed ?s1 .
?obj2info ts:speed ?s2 .
FILTER (?s1 &gt;= highest_safe_speed || ?s2 &gt;= highest_safe_speed)</p>
        <p>Example 5: find the crossing behaviors where pedestrians take a shortcut to the crossing,
specifically by crossing diagonally across the street. Such behavior is considered unsafe.
This query is separated into two steps. The first step retrieves the crossing events and the
frames in which pedestrians are involved. In the second step, the y-coordinates of the pedestrians
during the crossing are retrieved. If the changes in the y-coordinates exceed a certain threshold,
the pedestrians are considered to be taking a shortcut to the crossing. As an example, the
following query shows how to retrieve the y-coordinates of the pedestrian involved in the
crossing event presented in Section 3.</p>
        <p>SELECT DISTINCT ?f ?y
WHERE {
  tsdata:behavior_103 ts:hasFrame ?f .
  ?f ts:containsInteraction ?i .
  ?i ts:hasObject1 ?obj1 .
  ?obj1 rdf:type ts:Pedestrian .
  ?i ts:hasObject1Info ?obj1info .
  ?obj1info ts:coordinate_Y ?y
}</p>
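        <p>The second, post-query step of Example 5 can be sketched as follows; the drift threshold is a hypothetical value, not one given in the paper:</p>

```python
SHORTCUT_THRESHOLD = 2.0  # metres of sideways drift; hypothetical value

def is_shortcut(y_coords, threshold=SHORTCUT_THRESHOLD):
    """Flag a diagonal (shortcut) crossing: True if the pedestrian's
    y-coordinate changes by more than the threshold during the crossing."""
    if not y_coords:
        return False
    return max(y_coords) - min(y_coords) > threshold

print(is_shortcut([3.0, 3.1, 3.2, 3.1]))       # drift 0.2 -> False
print(is_shortcut([3.0, 3.8, 4.7, 5.6, 6.4]))  # drift 3.4 -> True
```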
        <p>The queries over the knowledge graph are not limited to the ones listed in this paper. More
complex queries can be constructed when traffic engineers require more intricate information.
For instance, cyclists swinging out at a crossing are considered unsafe behavior. Such behavior
can be identified by using a number of queries in a simple program.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this paper we have introduced a structured spatial-temporal representation of pedestrian
crossing behavior and demonstrated its application in understanding such behavior from
recorded road user dynamics data. By leveraging this representation, we construct a knowledge
graph from the road user dynamics data. Queries made over this knowledge graph can address
safety-related inquiries regarding pedestrian crossing behavior for traffic engineers, supporting
them in urban traffic infrastructure design work.</p>
      <p>In future work, we aim to enhance the ontology by incorporating more granular categories of
road users and other spatial relations between objects. Additionally, we plan to develop a tool that
enables traffic engineers to pose text-based questions and receive text-based answers, thereby
enhancing their workflow support. This way of interacting with the road user dynamics data
could be implemented with the help of large language models (LLMs) and retrieval-augmented
generation (RAG) [21]. In such a system, the user’s question would be translated into a query
against the knowledge graph and the returned information would be transformed into natural
language text by the LLM.</p>
      <p>
        Apart from querying the constructed knowledge graph to gain insights into the behavior of
different traffic participants, the proposed semantic representation could also serve as a basis for
trajectory prediction approaches. With increased interest in the development of self-driving
cars, predicting the behavior of other traffic participants has come more into focus [22]. For
this task, it is important to understand the spatial relationships between different actors. Hence,
different approaches have been investigated to integrate these relationships into trajectory
prediction, including simple graph structures [23], heterogeneous graphs [24], and knowledge
graphs [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. While clearly belonging to the latter category, our representation focuses particularly
on static objects, such as road infrastructure elements, to capture their impact on the trajectories of
traffic participants. Presumably, this will not only improve trajectory predictions, but also help
traffic engineers understand the impact different road infrastructure elements have on
traffic. Therefore, another direction of future work is to investigate the combination of the
constructed knowledge graphs with graph neural networks for trajectory prediction.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work has been conducted in the project "Data and AI for decision Making suppOrt in
traffic iNfrastructure Development (DAIMOND)", which is funded by Vinnova (Sweden's
innovation agency) and AI Sweden (the Swedish national center for applied AI). The authors
would like to thank the traffic department of Jönköping municipality for providing traffic safety
related use cases and Viscando AB for providing the traffic measurement dataset and expertise in
traffic measurements and analysis.</p>
      <p>prediction of road users, in: 2019 IEEE Intelligent Transportation Systems Conference
(ITSC), IEEE, 2019, pp. 2068–2073.
[13] T. Hägerstrand, What about people in Regional Science?, Papers of the Regional
Science Association 24 (1970) 6–21. URL: http://dx.doi.org/10.1007/bf01936872. doi:10.1007/
bf01936872.
[14] D. Orellana, C. Renso, Developing an interactions ontology for characterising
pedestrian movement behaviour, in: Movement-aware applications for sustainable mobility:
Technologies and approaches, IGI Global, 2010, pp. 62–86.
[15] D. Newtson, Attribution and the unit of perception of ongoing behavior., Journal of
personality and social psychology 28 (1973) 28.
[16] L. Spector, J. Grafman, Planning, neuropsychology, and artificial intelligence:
cross-fertilization, Handbook of neuropsychology 9 (1994) 377–392.
[17] C. Baldassano, J. Chen, A. Zadbood, J. W. Pillow, U. Hasson, K. A. Norman, Discovering
event structure in continuous narrative perception and memory, Neuron 95 (2017) 709–721.
[18] J. M. Zacks, B. Tversky, G. Iyer, Perceiving, remembering, and communicating structure in
events., Journal of experimental psychology: General 130 (2001) 29.
[19] N. K. Speer, J. M. Zacks, J. R. Reynolds, Perceiving narrated events, in: Proceedings of the</p>
      <p>Annual Meeting of the Cognitive Science Society, volume 26, 2004.
[20] C. Gardent, A. Shimorina, S. Narayan, L. Perez-Beltrachini, The WebNLG challenge:
Generating text from RDF data, in: Proceedings of the 10th International Conference on
Natural Language Generation, 2017, pp. 124–133.
[21] P. Zhao, H. Zhang, Q. Yu, Z. Wang, Y. Geng, F. Fu, L. Yang, W. Zhang, B. Cui,
Retrieval-augmented generation for AI-generated content: A survey, arXiv e-prints (2024). URL:
http://arxiv.org/abs/2402.19473v1. arXiv:2402.19473v1.
[22] Y. Huang, J. Du, Z. Yang, Z. Zhou, L. Zhang, H. Chen, A Survey on Trajectory-Prediction
Methods for Autonomous Driving, IEEE Transactions on Intelligent Vehicles 7 (2022) 652–
674. URL: http://dx.doi.org/10.1109/tiv.2022.3167103. doi:10.1109/tiv.2022.3167103.
[23] J. Gao, C. Sun, H. Zhao, Y. Shen, D. Anguelov, C. Li, C. Schmid, VectorNet: Encoding
HD Maps and Agent Dynamics From Vectorized Representation, in: Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[24] D. Grimm, P. Schörner, M. Dreßler, J.-M. Zöllner, Holistic Graph-based Motion Prediction,
in: 2023 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2023.
URL: http://dx.doi.org/10.1109/icra48891.2023.10161468. doi:10.1109/icra48891.2023.
10161468.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Papadimitriou</surname>
          </string-name>
          , G. Yannis,
          <string-name>
            <given-names>J.</given-names>
            <surname>Golias</surname>
          </string-name>
          ,
          <article-title>A critical assessment of pedestrian behaviour models</article-title>
          ,
          <source>Transportation research part F: traffic psychology and behaviour 12</source>
          (
          <year>2009</year>
          )
          <fpage>242</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Papadimitriou</surname>
          </string-name>
          , G. Yannis,
          <string-name>
            <given-names>J.</given-names>
            <surname>Golias</surname>
          </string-name>
          ,
          <article-title>Theoretical framework for modeling pedestrians' crossing behavior along a trip</article-title>
          ,
          <source>Journal of transportation engineering 136</source>
          (
          <year>2010</year>
          )
          <fpage>914</fpage>
          -
          <lpage>924</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nasernejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sayed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Alsaleh</surname>
          </string-name>
          ,
          <article-title>Modeling pedestrian behavior in pedestrian-vehicle near misses: A continuous Gaussian Process Inverse Reinforcement Learning (GP-IRL) approach</article-title>
          ,
          <source>Accident Analysis &amp; Prevention</source>
          <volume>161</volume>
          (
          <year>2021</year>
          )
          <fpage>106355</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rasouli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kotseruba</surname>
          </string-name>
          ,
          <article-title>Intend-wait-cross: Towards modeling realistic pedestrian crossing behavior</article-title>
          ,
          <source>in: 2022 IEEE Intelligent Vehicles Symposium (IV)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Ryoo</surname>
          </string-name>
          ,
          <article-title>Human activity analysis: A review</article-title>
          ,
          <source>ACM Computing Surveys (CSUR) 43</source>
          (
          <year>2011</year>
          )
          <fpage>1</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kravitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kalantidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Shamma</surname>
          </string-name>
          , et al.,
          <article-title>Visual genome: Connecting language and vision using crowdsourced dense image annotations</article-title>
          ,
          <source>International journal of computer vision 123</source>
          (
          <year>2017</year>
          )
          <fpage>32</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Niebles</surname>
          </string-name>
          ,
          <article-title>Action genome: Actions as compositions of spatio-temporal scene graphs</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>10236</fpage>
          -
          <lpage>10247</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramirez-Amaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Beetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <article-title>Transferring skills to humanoid robots by extracting semantic representations from observations of human activities</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>247</volume>
          (
          <year>2017</year>
          )
          <fpage>95</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Mlodzian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Berkemeyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Monka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Halilaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Luettin</surname>
          </string-name>
          ,
          <article-title>nuScenes Knowledge Graph - A Comprehensive Semantic Representation of Traffic Scenes for Trajectory Prediction</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          (ICCV) Workshops,
          <year>2023</year>
          , pp.
          <fpage>42</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. D.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Er</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. T. M.</given-names>
            <surname>Gwee</surname>
          </string-name>
          ,
          <article-title>Fuzzy logic-based observation and evaluation of pedestrians' behavioral patterns by age and gender</article-title>
          ,
          <source>Transportation Research Part F: Traffic Psychology and Behaviour 40</source>
          (
          <year>2016</year>
          )
          <fpage>104</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Gharebaghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Mostafavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Edwards</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fougeyrollas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gamache</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Grenier</surname>
          </string-name>
          ,
          <article-title>Integration of the social environment in a mobility ontology for people with motor disabilities</article-title>
          ,
          <source>Disability and Rehabilitation: Assistive Technology</source>
          <volume>13</volume>
          (
          <year>2018</year>
          )
          <fpage>540</fpage>
          -
          <lpage>551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>F.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yamaguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khiat</surname>
          </string-name>
          ,
          <article-title>Ontology-based reasoning approach for long-term behavior</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>