<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Can I Trust My Trajectory Prediction Model?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Franz Motzkus</string-name>
          <email>franz.walter.motzkus@aumovio.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Schlauch</string-name>
          <email>christian.schlauch@aumovio.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Bernhard</string-name>
          <email>sebastian.2.bernhard@aumovio.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ute Schmid</string-name>
          <email>ute.schmid@uni-bamberg.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AUMOVIO</institution>
          ,
          <addr-line>Max-Urich-Str.3, 13355</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Karlsruhe Institute of Technology</institution>
          ,
          <addr-line>Kaiserstraße 12, 76131 Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universität Bamberg</institution>
          ,
          <addr-line>Kapuzinerstraße 16, 96047 Bamberg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Explainability methods allow for inspecting model-internal mechanisms and thus provide transparency for previously black-box AI models. With increased interpretability, an AI developer can reveal and mitigate model errors with informed model improvements. Thus, the provided model transparency can lead to safer, more reliable models and enable fairness evaluations, the foundation for trust in AI. While explainable AI (xAI) research has made significant contributions, most approaches remain confined to baseline datasets and models, limiting their applicability in real-world domains. In autonomous driving, predicting drivable trajectories is key to anticipating traffic scenarios. Current black-box systems lack transparency, a problem that grows with model complexity. We use Sparse Autoencoders (SAEs) to probe the latent space of a Wayformer trajectory predictor, exposing model internals as structured, interpretable features. Our study highlights the role of concepts and potential manipulation risks, and shows how SAEs improve transparency and support trustworthy AI development.</p>
      </abstract>
      <kwd-group>
        <kwd>Sparse Autoencoders</kwd>
        <kwd>Explainable Trajectory Prediction</kwd>
        <kwd>Applied xAI</kwd>
        <kwd>Explainability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Explainable AI (xAI) methods aim to make black-box AI models more interpretable, establishing
transparency, unveiling spurious behavior, and enabling the targeted mitigation of erroneous behavior.
Increasing the transparency of previously black-box behavior and enabling inspection and manipulation
of internal representations provides the foundation for semantically testable and trustworthy systems.</p>
      <p>Transparency is central to trustworthy AI, as it allows stakeholders to understand reasoning processes,
audit models for biases and failure modes, and ensure compliance and accountability. Interpretable
features also help identify vulnerabilities, strengthening robustness against adversarial use, while
exposing ethically questionable correlations supports systematic fairness testing. Ultimately, transparency
fosters trust by enabling AI systems to be semantically tested, validated, and explained in human terms.</p>
      <p>
        Such trustworthiness is especially important in safety-critical applications like autonomous driving.
Given its inherent complexity, many autonomous driving systems are decomposed into modular subtasks
such as perception, trajectory prediction, and planning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The perception module models the state of
the environment, upon which the prediction module forecasts the future behavior of the surrounding
agents. These predictions then serve as input to the planning module, which aims to generate a safe
and comfortable trajectory for the autonomous vehicle.
      </p>
      <p>
        In these modular autonomous driving systems, trajectory prediction models play a central role for
maneuver planning and collision avoidance, with potentially fatal consequences in cases of failure.
Although it was shown that trajectory predictors can over-rely on specific input features, ignoring critical
contextual cues such as surrounding agents [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], most state-of-the-art models remain fundamentally
black-box systems. This crucial downside impedes the root cause discovery of failure cases, which are
currently handled by retraining the model on new data. Advances in the explainability of perception
models, such as attention map visualizations [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], offer only limited insights into trajectory predictors,
since the abstract and complex feature interactions remain difficult to interpret.
      </p>
      <p>
        In this work, we explore the latent space of the Wayformer trajectory prediction model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] as a
popular representative for other models in the trajectory prediction domain. We investigate how
concepts are encoded in the Wayformer’s latent space and how they are represented in the trajectory
prediction domain. We use Sparse Autoencoders (SAEs), an unsupervised learning technique that
aims to find disentangled, human-interpretable features, as they have recently been shown to yield
insights into model-internal encodings [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. With feature disentanglement, we seek to obtain structured
insights into the semantic model encodings, like steering decisions or encoded scenarios. By selectively
manipulating individual features in the SAE’s latent space, we observe measurable changes in the
predicted output, demonstrating a direct causal influence and exposing a significant security risk in
safety-critical applications like autonomous driving. Finally, we explore how this xAI-based
approach can contribute to the development of transparent and trustworthy autonomous systems.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <sec id="sec-2-1">
        <title>2.1. Trajectory Prediction</title>
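        <p>The tested Wayformer model is a state-of-the-art marginal trajectory predictor, which simplifies the
problem of modelling the behavior of all present agents by only considering a single target agent at a
time, see Figure 1. More formally, it aims to estimate the states <inline-formula><tex-math>s_{i,t|1:T} = (s_{i,t|1}, \dots, s_{i,t|T})</tex-math></inline-formula> of a target
agent <inline-formula><tex-math>i</tex-math></inline-formula> at a timepoint <inline-formula><tex-math>t</tex-math></inline-formula> over a sequence of <inline-formula><tex-math>T</tex-math></inline-formula> time steps into the future, given a scene context as input.
This scene context <inline-formula><tex-math>C = \{s_{i,t|-H:-1}, s_{j,t|-H:-1}, R, G\}</tex-math></inline-formula> consists of the observed states <inline-formula><tex-math>s_{i,t|-H:-1}</tex-math></inline-formula> of the target
<inline-formula><tex-math>i</tex-math></inline-formula> and <inline-formula><tex-math>s_{j,t|-H:-1}</tex-math></inline-formula> of its surrounding agents <inline-formula><tex-math>j \neq i</tex-math></inline-formula> over a history length <inline-formula><tex-math>H</tex-math></inline-formula>, as well as the road’s topology
<inline-formula><tex-math>R</tex-math></inline-formula> (drivable area and lane markings) and dynamic traffic guidance <inline-formula><tex-math>G</tex-math></inline-formula> (traffic lights or speed limits),
centered around the target’s current state. The evolution of the target’s future states is inherently
multi-modal, reflecting the multitude of possible maneuvers and interactions with other agents in
the scene. Wayformer approximates this evolution using a Gaussian mixture model (GMM) decoding
strategy
<disp-formula><tex-math>p(s_{i,t|1:T} \mid C) \approx \sum_{k=1}^{K} \pi^{k}_{i,t}(C)\, \mathcal{N}\left(s_{i,t|1:T} \mid \mu^{k}_{i,t|1:T}, \Sigma^{k}_{i,t|1:T}\right),</tex-math></disp-formula>
where each mixture component or mode <inline-formula><tex-math>k</tex-math></inline-formula> represents a distinct predicted trajectory. Its architecture
consists of a transformer-based early-fusion encoder with a transformer-based decoder using
compute-efficient latent query encodings. The two output heads estimate the mode probability <inline-formula><tex-math>\pi^{k}_{i,t}</tex-math></inline-formula> and the
parameters of the Gaussian mixtures <inline-formula><tex-math>\mu^{k}_{i,t|1:T}</tex-math></inline-formula> and <inline-formula><tex-math>\Sigma^{k}_{i,t|1:T}</tex-math></inline-formula>, respectively. The encoder-decoder
transformer architecture and GMM decoding strategy in Wayformer are commonly used in state-of-the-art
models on public benchmarks [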
          <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
          ].
        </p>
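        <p>The mixture decoding described above can be illustrated with a small numerical sketch. It assumes diagonal covariances, toy array shapes (K modes, T steps, 2D positions), and random values; the function name gmm_log_likelihood is hypothetical and nothing here is taken from the Wayformer implementation.</p>
        <preformat>
```python
import numpy as np

def gmm_log_likelihood(traj, pi, mu, var):
    """Log-density of one trajectory under a K-mode Gaussian mixture.

    traj: (T, 2) future positions; pi: (K,) mode probabilities;
    mu: (K, T, 2) mode means; var: (K, T, 2) diagonal variances (assumed).
    """
    # Per-mode log N(traj | mu_k, diag(var_k)), summed over steps and axes.
    log_norm = -0.5 * (np.log(2.0 * np.pi * var) + (traj - mu) ** 2 / var)
    log_comp = log_norm.sum(axis=(1, 2))          # shape (K,)
    # Numerically stable log-sum-exp over the weighted modes.
    a = np.log(pi) + log_comp
    return a.max() + np.log(np.exp(a - a.max()).sum())

rng = np.random.default_rng(0)
K, T = 3, 5                                       # toy sizes
pi = np.array([0.6, 0.3, 0.1])
mu = rng.normal(size=(K, T, 2))
var = np.ones((K, T, 2))
top1 = mu[np.argmax(pi)]                          # mean of the most likely mode
ll_top1 = gmm_log_likelihood(top1, pi, mu, var)
```
        </preformat>
        <p>In this sketch, the top-1 prediction is simply the mean trajectory of the mode with the highest probability, which is the sense in which "top-1 prediction" is used in the evaluation later on.</p>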
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Explainability for Trajectory Prediction</title>
        <p>
          Previous explainability approaches focus on the input attribution in trajectory predictors. Makansi et
al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] use Shapley values to show that trajectory predictors are only marginally influenced by most of
the scene context features, such as the road topology or other agents’ states, which casts doubt on the
predictors’ abilities to capture safety-critical interactions. This inspired new model designs [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10, 11</xref>
          ]
that aim to make the feature attribution or embedding spaces in prediction models more interpretable.
However, explainability methods are still rarely applied to state-of-the-art prediction models, even
though approaches in the perception of autonomous driving systems showcase their potential to capture
unusual and potentially safety-critical situations [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Sparse Autoencoders</title>
        <p>Sparse Autoencoders (SAEs) have recently gained attention by applying the established principle of imposing
sparsity constraints on higher-dimensional data representations [12]. In latent representations with
polysemantic neurons, SAEs can disentangle the semantic directions in the data, exposing the internal
concept directions as monosemantic neurons [13, 14]. In the higher-dimensional space, each feature
direction can be represented by a monosemantic neuron, while the sparsity constraint reduces each
sample representation to as few neurons as possible, avoiding polysemantic neurons.
Since an explicit sparsity penalty may lead to vanishing weights, sparsity can instead be enforced by a
top-k activation in the latent space [15]. Further improvements, namely initializing the encoder as
the transpose of the decoder and adding an auxiliary loss term that involves dead neurons, lead to
better-performing large-scale SAEs [16].</p>
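        <p>The processing step of the SAE can be formalized as:
<disp-formula><tex-math>\begin{gathered} z = W_{\mathrm{enc}}\, x \\ \hat{z} = \operatorname{top}_k(z) \\ \hat{x} = W_{\mathrm{dec}}\, \hat{z} \end{gathered} \tag{1}</tex-math></disp-formula>
with <inline-formula><tex-math>W_{\mathrm{enc}} \in \mathbb{R}^{n \times d}</tex-math></inline-formula> and <inline-formula><tex-math>W_{\mathrm{dec}} \in \mathbb{R}^{d \times n}</tex-math></inline-formula>, where <inline-formula><tex-math>d</tex-math></inline-formula> is the dimension of the input <inline-formula><tex-math>x</tex-math></inline-formula> and <inline-formula><tex-math>n</tex-math></inline-formula> the number of SAE neurons. The training loss is defined as the weighted sum of the
reconstruction loss and the sparsity loss <inline-formula><tex-math>\mathcal{L} = \| x - \hat{x} \|_2^2 + \lambda \| z \|_1</tex-math></inline-formula>.</p>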
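        <p>The processing step (linear encoding, top-k sparsification, linear decoding) and the training loss can be sketched as follows. This is a minimal, bias-free illustration with toy dimensions and random weights; the function names sae_forward and sae_loss and the weighting value lam are assumptions for illustration, not the authors' implementation.</p>
        <preformat>
```python
import numpy as np

def sae_forward(x, W_enc, W_dec, k):
    """Encode, keep the k largest activations, and decode (bias-free sketch)."""
    z = W_enc @ x                               # dense latent code, shape (n,)
    z_hat = np.zeros_like(z)
    top = np.argsort(z)[-k:]                    # indices of the k largest activations
    z_hat[top] = z[top]
    x_hat = W_dec @ z_hat                       # reconstruction, shape (d,)
    return z, z_hat, x_hat

def sae_loss(x, x_hat, z, lam):
    """Squared reconstruction error plus lam-weighted L1 sparsity penalty."""
    return np.sum((x - x_hat) ** 2) + lam * np.sum(np.abs(z))

rng = np.random.default_rng(0)
d, n, k = 8, 32, 4                              # toy sizes, expansion factor n / d = 4
W_enc = rng.normal(size=(n, d))
W_dec = rng.normal(size=(d, n))
x = rng.normal(size=d)
z, z_hat, x_hat = sae_forward(x, W_enc, W_dec, k)
loss = sae_loss(x, x_hat, z, lam=1e-3)
```
        </preformat>
        <p>With a top-k activation, the number of non-zero entries in the sparse code is fixed by construction, so the L1 term mainly acts as a regularizer on the activation magnitudes.</p>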
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Hyperparameters and Training</title>
      <p>
        We train a Nano version of the Wayformer model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] with half the number of layers and hidden size, as
the full-sized model has been shown to overfit on in-distribution test sets [17]. We train the
WayformerNano jointly on the public NuScenes [18], Argoverse2 [19], Waymo-Open [20], and Shifts [21] datasets
using the standard loss composed of a classification loss on the mode probabilities and a regression loss
on the mixture parameters.
      </p>
      <p>In the Wayformer model, the scene context embedding is the compressed representation passed from
the scene encoder to the trajectory decoder and thus provides a dense encoding of the scenario. We
train a Sparse Autoencoder (SAE) on the flattened scene context embeddings to extract monosemantic,
interpretable feature directions. The sparse SAE features may then encode certain road or agent
characteristics that lead to specific driving behavior. The semantics of a feature are extracted with
Activation Maximization [22], allowing for later modification of a feature like "left turn" within the
SAE. Certain driving behaviors can then be triggered by modifying the corresponding features in the SAE.</p>
      <p>The SAE consists of two linear layers with a latent representation of 49152 neurons, corresponding
to an expansion factor of 4. We tested multiple sizes of the latent dimension; the stated
setting provides a reasonable trade-off between sufficient disentanglement, indicated by the appearance
of the first dead neurons and thus a large enough latent space, and the preference for a small dimensionality to maintain
explainability. We apply a top-k filter on the SAE activations with k = 128. We train the SAE for 50
epochs and quantitatively test its reconstruction quality. We measure the MSE between the original
scene context embedding and the reconstruction of the SAE, as well as the influence on the predicted
trajectory in terms of the minADE to the top-1 prediction, as the SAE naturally adds an error to the
overall processing of the Wayformer. The minADE is the minimal average displacement error among
all predictions towards a target trajectory, which in our case is the predicted top-1 trajectory of the
unmodified model. We report an MSE of 0.006, showing a very low reconstruction error, and a mean
minADE score of 0.263, denoting a mean deviation of 26 cm, which is low in comparison to the average
minADE towards the ground truth (135 cm). After training, 26999 active neurons remain,
while the remaining neurons never fire.</p>
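      <p>The minADE used above can be written down in a few lines. The following sketch assumes predictions stored as an array of K trajectories with T two-dimensional positions each; the function name min_ade and the toy trajectories are illustrative only.</p>
      <preformat>
```python
import numpy as np

def min_ade(pred_modes, target):
    """Minimal average displacement error.

    pred_modes: (K, T, 2) predicted trajectories; target: (T, 2) reference.
    Returns the smallest mean Euclidean distance over the K modes.
    """
    dists = np.linalg.norm(pred_modes - target, axis=-1)   # (K, T) per-step errors
    return dists.mean(axis=1).min()

# Toy example: the reference is the unmodified model's top-1 trajectory,
# the candidates are the predictions obtained with the SAE in the loop.
target = np.stack([np.arange(5.0), np.zeros(5)], axis=1)   # straight line along x
with_sae = np.stack([target + [0.0, 0.3], target + [0.0, 1.0]])
score = min_ade(with_sae, target)
```
      </preformat>
      <p>In the toy example, the closest candidate is offset by a constant 0.3 in the lateral direction, so the score equals that offset.</p>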
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>We evaluate the SAE encodings representing the Wayformer’s latent space in two steps. First, we
inspect high-activating neurons and describe how to systematically assign semantic meaning to the
neurons. Then, we exploit the extracted knowledge to manipulate the trajectory prediction process.</p>
      <sec id="sec-4-1">
        <title>4.1. Semantic Concepts in Trajectory Prediction</title>
        <p>Our trained SAE provides a latent space representation of sparse, disentangled features represented as
SAE neurons. With the activation maximization approach [22], the top-activating samples per neuron
can be extracted to reveal the semantic encoding of a neuron. Whereas in standard image settings a
foundation model can be used to assign a semantic label to each neuron [23], we have to rely on
manual inspection as a knowledge source. Figure 3 shows the highest-activating
inputs for a random selection of high-activating neurons from the SAE encoding. The similarity among
the provided samples is interpreted as the information encoded in the respective neuron. However,
the semantic feature encoding is occasionally ambiguous, and no distinct property can be assigned.
Nevertheless, hypotheses about a neuron’s semantic encoding can be postulated and later refined. For
neuron 21952 in Figure 3, a highly dense traffic scenario could be encoded, with many surrounding
cars and pedestrians on the sidewalk. Neuron 7828 potentially encodes a slight right turn to follow the
selected lane, while neuron 23899 appears to be specific to a single scenario, highlighting a risk of
overfitting the data. This information can already be extracted and used for multiple use cases without
further testing, although manually inspecting every neuron in the SAE is tedious and costly.</p>
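        <p>Selecting the top-activating samples per neuron can be sketched as below. The sketch assumes that the SAE activations of a dataset have already been collected in a matrix of shape (samples, neurons); the function name top_activating_samples and the toy matrix are hypothetical.</p>
        <preformat>
```python
import numpy as np

def top_activating_samples(activations, neuron, n_top=5):
    """Return dataset indices of the n_top samples that activate `neuron` most."""
    column = activations[:, neuron]
    order = np.argsort(column)[::-1]            # descending by activation
    order = order[column[order] > 0]            # ignore samples where the neuron is silent
    return order[:n_top]

# Toy activation matrix: 4 samples, 2 neurons.
acts = np.array([[0.0, 2.0],
                 [3.0, 0.0],
                 [1.0, 5.0],
                 [0.0, 1.0]])
top_neuron0 = top_activating_samples(acts, 0)
top_neuron1 = top_activating_samples(acts, 1, n_top=2)
```
        </preformat>
        <p>The returned indices point to the scenes that would be visualized and inspected manually to form a hypothesis about the neuron.</p>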
        <p>To counter the expansion in the latent space of the SAE, neurons can be automatically grouped to
reduce the complexity of the search space. Figure 4 shows the correlation matrix between non-zero
neurons of the trained SAE. A clear disentanglement of the features can be seen, as most neurons are
only correlated to a few other neurons, highlighting unique feature encodings. Hierarchical feature
clustering can be applied to the correlation matrix, grouping similar neurons together. A human
inspector can then choose the granularity at which to inspect the clusters, from a coarse level down to individual neurons.
Assuming that common features cause a vehicle to make a turn in our use case, these features are expected
to be grouped together in the same cluster hierarchy. Depending on the desired feature granularity,
the SAE neurons in this group can be further evaluated, or the next-finer level of the hierarchy can be
explored.</p>
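        <p>The grouping step can be sketched with a correlation matrix and a simple hierarchical cut. The linkage criterion is not specified above, so the sketch assumes single linkage, for which cutting the hierarchy at a given correlation threshold is equivalent to finding connected components; the function name correlation_clusters and the toy data are hypothetical.</p>
        <preformat>
```python
import numpy as np

def correlation_clusters(activations, threshold):
    """Group neurons whose activation correlation reaches `threshold`.

    activations: (num_samples, num_neurons), only neurons that fire at least once.
    Cutting a single-linkage hierarchy at distance 1 - threshold gives the same
    groups as the connected components computed here.
    """
    corr = np.corrcoef(activations, rowvar=False)       # (neurons, neurons)
    n = corr.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]               # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] >= threshold:
                parent[find(i)] = find(j)               # merge the two groups
    roots = [find(i) for i in range(n)]
    # Relabel roots as consecutive cluster ids in order of first appearance.
    ids = {}
    return np.array([ids.setdefault(r, len(ids)) for r in roots])

a = np.array([1.0, 0.0, 2.0, 0.0, 3.0, 0.0])
b = np.array([0.0, 1.0, 0.0, 2.0, 0.0, 3.0])
acts = np.stack([a, 2.0 * a, b, 0.5 * b], axis=1)       # two pairs of co-firing neurons
labels_fine = correlation_clusters(acts, threshold=0.9)
labels_coarse = correlation_clusters(acts, threshold=-1.0)
```
        </preformat>
        <p>Lowering the threshold merges clusters and corresponds to moving to a coarser level of the hierarchy, while raising it moves towards individual neurons.</p>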
        <p>When inspecting a selected neuron’s highest correlating neurons, a clear alignment in terms of the
predicted output is observable. These correlated neurons implement complementary aspects related to
a specific driving maneuver. The predicted action can further be evaluated in terms of causality with
regard to the extracted aspects, as a left turn may be encoded by multiple neurons, with differences in
which aspects the neurons encode and how much influence their activation has on the "left turn" action.
In this way, false predictions can potentially be traced back to the underlying feature activations.</p>
        <p>As assigning semantics to single neurons currently requires some subjective interpretation, further
methods need to be developed to pin down the feature encoded by each neuron. In the next section, we show how
to verify the assigned semantics quantitatively by testing the downstream manipulation.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Downstream Prediction Manipulation</title>
        <p>By identifying semantically meaningful neurons in the SAE latent space and deriving their influence on
the predicted output, we can selectively activate specific neurons to steer the model’s predictions. This
mechanism can be used to verify the neuron encoding and its influence on the prediction, but it can
also be used to manipulate the downstream predictions. Using the SAE decoder, we construct a simple
downstream feature manipulation pipeline. The SAE encoding of a scene context is manipulated by
perturbing one or multiple neuron activations before reconstructing and forwarding the manipulated
scene context to the Wayformer decoder. The resulting changes in the trajectory predictions reveal the
influence of the feature manipulation.</p>
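        <p>The manipulation pipeline can be sketched as encode, perturb, decode. The sketch below uses toy dimensions and random weights, stops at the reconstructed scene context, and omits the Wayformer decoder; the helper names encode_topk and decode, the neuron index, and the perturbation size delta are assumptions for illustration.</p>
        <preformat>
```python
import numpy as np

def encode_topk(x, W_enc, k):
    """Sparse SAE code: linear encoding followed by a top-k filter."""
    z = W_enc @ x
    z_hat = np.zeros_like(z)
    top = np.argsort(z)[-k:]
    z_hat[top] = z[top]
    return z_hat

def decode(z_hat, W_dec):
    """Linear SAE decoder mapping the sparse code back to the embedding space."""
    return W_dec @ z_hat

rng = np.random.default_rng(1)
d, n, k = 8, 32, 4                              # toy sizes
W_enc = rng.normal(size=(n, d))
W_dec = rng.normal(size=(d, n))
x = rng.normal(size=d)                          # stand-in for a flattened scene context

z_hat = encode_topk(x, W_enc, k)
x_rec = decode(z_hat, W_dec)                    # unmodified reconstruction

neuron, delta = 7, 5.0                          # hypothetical neuron and perturbation
z_mod = z_hat.copy()
z_mod[neuron] += delta                          # targeted feature manipulation
x_mod = decode(z_mod, W_dec)                    # would be forwarded to the trajectory decoder
shift = x_mod - x_rec
```
        </preformat>
        <p>Because the decoder is linear, perturbing a single neuron shifts the reconstructed embedding along that neuron's decoder column, scaled by the size of the perturbation.</p>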
        <p>As illustrated in Figure 6, the targeted manipulation of the SAE representation can induce a desired
model prediction. By increasing the activation of neuron 6958, which is associated with left turn
trajectories, we can alter the Wayformer predictions for random samples to shift towards a left turn for
the ego vehicle.</p>
        <p>To quantify this effect, we conduct an experiment comparing the manipulated predictions to the
top-1 predicted trajectory of the original Wayformer model using the minADE score. Specifically, we
perturb the activation value of a single SAE neuron and compute the minADE between the original and
manipulated predictions. As shown in Figure 7, neuron 6958, which is responsible for left turns, causes a
significantly higher deviation in the manipulated predictions than other neurons, confirming its strong
influence on the steering direction. This shows that certain features encode direction-specific behaviors,
from which steering-specific neurons can be actively manipulated towards a desired prediction. With
input perturbations to activate such neurons, these encodings can potentially be exploited, forming
a safety risk. Understanding the relation between these features and their embedding in the overall
model processing requires further investigation. Thorough quantitative benchmarking is currently not
possible due to missing ground truth explanations and fine-grained concept-based annotations.</p>
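        <p>The per-neuron comparison can be sketched as a loop over neurons. Since the trained models are not part of this text, the sketch replaces the trajectory decoder with a hypothetical random linear map (toy_predict) and treats mode 0 as the top-1 mode; all sizes, weights, and the perturbation size are assumptions, so the resulting numbers carry no meaning beyond illustrating the procedure.</p>
        <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)
d, n, K, T = 6, 12, 3, 4                       # toy sizes, all hypothetical
W_dec = rng.normal(size=(d, n))                # SAE decoder stand-in
W_traj = rng.normal(size=(K * T * 2, d))       # stand-in for the trajectory decoder

def toy_predict(embedding):
    """Hypothetical linear stand-in mapping an embedding to K trajectories."""
    return (W_traj @ embedding).reshape(K, T, 2)

def min_ade(pred_modes, target):
    """Smallest mean Euclidean distance between any mode and the target."""
    dists = np.linalg.norm(pred_modes - target, axis=-1)
    return dists.mean(axis=1).min()

def neuron_influence(z_hat, delta):
    """minADE between manipulated predictions and the original top-1 trajectory,
    computed separately for a perturbation of each SAE neuron."""
    reference = toy_predict(W_dec @ z_hat)[0]  # mode 0 plays the role of the top-1 mode
    scores = np.empty(z_hat.shape[0])
    for j in range(z_hat.shape[0]):
        z_mod = z_hat.copy()
        z_mod[j] += delta
        scores[j] = min_ade(toy_predict(W_dec @ z_mod), reference)
    return scores

z_hat = np.zeros(n)
z_hat[[1, 5, 7]] = [0.8, 1.2, 0.5]             # sparse code with three active neurons
scores = neuron_influence(z_hat, delta=5.0)
ranking = np.argsort(scores)[::-1]             # most influential neurons first
```
        </preformat>
        <p>Neurons at the top of such a ranking are the candidates whose semantics are worth verifying first, since perturbing them moves the prediction furthest from the unmodified top-1 trajectory.</p>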
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Most AI-based trajectory prediction models operate as black boxes, optimizing solely on average
performance scores while lacking transparency. This limits failure analysis and hinders verification,
both of which are essential for safe deployment in autonomous driving. Our approach addresses this gap
by using SAEs to reveal interpretable features in the latent space of trajectory prediction models. For
the Wayformer model used as an example, we reveal sparse, disentangled feature directions within the previously
opaque scene context encoding, enabling structured analysis and controlled manipulation of model
internals for desired trajectory outputs by an expert. We demonstrate how the activation of a single SAE
feature can shift the predicted trajectories, which can even be actively steered towards desired directions.
If such a feature can be inadvertently triggered through spurious or adversarial input modifications,
this poses a potential safety risk, underscoring the need for a better understanding of the model
processing. To our knowledge, this is one of the first approaches to increase the interpretability of the
latent space in trajectory prediction models. Our findings highlight the necessity of applying latent
space interpretability methods to trajectory prediction models to support model verification, targeted
testing, and failure analysis, which are essential components for promoting trust in AI-driven systems.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The research leading to these results is funded by the German Federal Ministry for Economic Affairs
and Energy within the projects “Safe AI Engineering – Sicherheitsargumentation befähigendes AI
Engineering über den gesamten Lebenszyklus einer KI-Funktion” and “NXT GEN AI METHODS –
Generative Methoden für Perzeption, Prädiktion und Planung”. The authors would like to thank the
consortia for the successful cooperation.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly for grammar and spelling checks.
Further, the authors used GPT to support the formulation of single sentences, which have been edited
towards the final version. After using these tools, the authors reviewed and edited the content as needed
and take full responsibility for the publication’s content.</p>
      <ref-list>
        <ref>
          <mixed-citation>[11] Y. Tang, W. Ma, Intent: Trajectory prediction framework with intention-guided contrastive clustering, 2025. arXiv:2503.04952.</mixed-citation>
        </ref>
        <ref>
          <mixed-citation>[12] B. A. Olshausen, D. J. Field, Sparse coding with an overcomplete basis set: A strategy employed by V1?, Vision Research 37 (1997) 3311–3325. URL: https://www.sciencedirect.com/science/article/pii/S0042698997001697. doi:10.1016/S0042-6989(97)00169-7.</mixed-citation>
        </ref>
        <ref>
          <mixed-citation>[13] T. Bricken, A. Templeton, J. Batson, B. Chen, A. Jermyn, T. Conerly, N. Turner, C. Anil, C. Denison, A. Askell, R. Lasenby, Y. Wu, S. Kravec, N. Schiefer, T. Maxwell, N. Joseph, Z. Hatfield-Dodds, A. Tamkin, K. Nguyen, B. McLean, J. E. Burke, T. Hume, S. Carter, T. Henighan, C. Olah, Towards monosemanticity: Decomposing language models with dictionary learning, Transformer Circuits Thread (2023). URL: https://transformer-circuits.pub/2023/monosemantic-features/index.html.</mixed-citation>
        </ref>
        <ref>
          <mixed-citation>[14] R. Huben, H. Cunningham, L. R. Smith, A. Ewart, L. Sharkey, Sparse autoencoders find highly interpretable features in language models, in: The Twelfth International Conference on Learning Representations, 2024. URL: https://openreview.net/forum?id=F76bwRSLeK.</mixed-citation>
        </ref>
        <ref>
          <mixed-citation>[15] A. Makhzani, B. Frey, k-sparse autoencoders, 2014. URL: https://arxiv.org/abs/1312.5663. arXiv:1312.5663.</mixed-citation>
        </ref>
        <ref>
          <mixed-citation>[16] L. Gao, T. D. la Tour, H. Tillman, G. Goh, R. Troll, A. Radford, I. Sutskever, J. Leike, J. Wu, Scaling and evaluating sparse autoencoders, in: The Thirteenth International Conference on Learning Representations, 2025. URL: https://openreview.net/forum?id=tcsZt9ZNKD.</mixed-citation>
        </ref>
        <ref>
          <mixed-citation>[17] M. Bouzidi, C. Schlauch, N. Scheuerer, Y. Yao, N. Klein, D. Göhring, J. Reichardt, Closing the loop: Motion prediction models beyond open-loop benchmarks, CoRR abs/2505.05638 (2025). URL: https://doi.org/10.48550/arXiv.2505.05638. doi:10.48550/ARXIV.2505.05638. arXiv:2505.05638.</mixed-citation>
        </ref>
        <ref>
          <mixed-citation>[18] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, O. Beijbom, nuScenes: A multimodal dataset for autonomous driving, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.</mixed-citation>
        </ref>
        <ref>
          <mixed-citation>[19] B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. Pontes, D. Ramanan, P. Carr, J. Hays, Argoverse 2: Next generation datasets for self-driving perception and forecasting, in: Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks (NeurIPS Datasets and Benchmarks 2021), 2021.</mixed-citation>
        </ref>
        <ref>
          <mixed-citation>[20] S. Ettinger, S. Cheng, B. Caine, C. Liu, H. Zhao, S. Pradhan, Y. Chai, B. Sapp, C. R. Qi, Y. Zhou, Z. Yang, A. Chouard, P. Sun, J. Ngiam, V. Vasudevan, A. McCauley, J. Shlens, D. Anguelov, Large scale interactive motion forecasting for autonomous driving: The Waymo Open Motion Dataset, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 9710–9719.</mixed-citation>
        </ref>
        <ref>
          <mixed-citation>[21] A. Malinin, N. Band, Y. Gal, M. Gales, A. Ganshin, G. Chesnokov, A. Noskov, A. Ploskonosov, L. Prokhorenkova, I. Provilkov, V. Raina, V. Raina, D. Roginskiy, M. Shmatova, P. Tigas, B. Yangel, Shifts: A dataset of real distributional shift across multiple large-scale tasks, in: J. Vanschoren, S. Yeung (Eds.), Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, volume 1, 2021. URL: https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/file/ad61ab143223efbc24c7d2583be69251-Paper-round2.pdf.</mixed-citation>
        </ref>
        <ref>
          <mixed-citation>[22] A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox, J. Clune, Synthesizing the preferred inputs for neurons in neural networks via deep generator networks, in: D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 29, Curran Associates, Inc., 2016. URL: https://proceedings.neurips.cc/paper_files/paper/2016/file/5d79099fcdf499f12b79770834c0164a-Paper.pdf.</mixed-citation>
        </ref>
        <ref>
          <mixed-citation>[23] M. Dreyer, J. Berend, T. Labarta, J. Vielhaben, T. Wiegand, S. Lapuschkin, W. Samek, Mechanistic understanding and validation of large AI models with SemanticLens, CoRR abs/2501.05398 (2025). URL: https://doi.org/10.48550/arXiv.2501.05398. doi:10.48550/ARXIV.2501.05398. arXiv:2501.05398.</mixed-citation>
        </ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hagedorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hallgarten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Stoll</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Condurache</surname>
          </string-name>
          ,
          <article-title>The integration of prediction and planning in deep learning automated driving systems: A review</article-title>
          ,
          <source>IEEE Transactions on Intelligent Vehicles</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          . doi:10.1109/TIV.2024.3459071.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Makansi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. V.</given-names>
            <surname>Kügelgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Locatello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. V.</given-names>
            <surname>Gehler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Janzing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <article-title>You mostly walk alone: Analyzing feature attribution in trajectory prediction</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2022</year>
          . URL: https://openreview.net/forum?id=POxF-LEqnF.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Stocco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Nunes</surname>
          </string-name>
          , M. d'Amorim,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tonella</surname>
          </string-name>
          ,
          <article-title>Thirdeye: Attention maps for safe autonomous driving systems</article-title>
          ,
          <source>in: 37th IEEE/ACM International Conference on Automated Software Engineering, ASE 2022, Rochester, MI, USA, October 10-14, 2022</source>
          , ACM,
          <year>2022</year>
          , pp.
          <fpage>102:1</fpage>
          -
          <lpage>102:12</lpage>
          . URL: https://doi.org/10.1145/3551349.3556968. doi:10.1145/3551349.3556968.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Nayakanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Al-Rfou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. S.</given-names>
            <surname>Refaat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sapp</surname>
          </string-name>
          ,
          <article-title>Wayformer: Motion forecasting via simple &amp; efficient attention networks</article-title>
          ,
          <source>in: 2023 IEEE International Conference on Robotics and Automation (ICRA)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>2980</fpage>
          -
          <lpage>2987</lpage>
          . doi:10.1109/ICRA48891.2023.10160609.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ferrando</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. B.</given-names>
            <surname>Obeso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rajamanoharan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nanda</surname>
          </string-name>
          ,
          <article-title>Do I know this entity? Knowledge awareness and hallucinations in language models</article-title>
          ,
          <source>in: The Thirteenth International Conference on Learning Representations</source>
          ,
          <year>2025</year>
          . URL: https://openreview.net/forum?id=WCRQFlji2q.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schiele</surname>
          </string-name>
          ,
          <article-title>Motion transformer with global intention localization and local movement refinement</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Girgis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Golemo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Codevilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>D'Souza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Kahou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Heide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <article-title>Latent variable sequential set transformers for joint multi-agent motion prediction</article-title>
          ,
          <source>in: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022</source>
          , OpenReview.net,
          <year>2022</year>
          . URL: https://openreview.net/forum?id=Dup_dDqkZC5.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>Qcnet: query context network for salient object detection of automatic surface inspection</article-title>
          ,
          <source>Vis. Comput</source>
          .
          <volume>39</volume>
          (
          <year>2023</year>
          )
          <fpage>4391</fpage>
          -
          <lpage>4403</lpage>
          . URL: https://doi.org/10.1007/s00371-022-02597-w. doi:10.1007/S00371-022-02597-W.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hayet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Karamouzas</surname>
          </string-name>
          ,
          <article-title>SocialVAE: Human trajectory prediction using timewise latents</article-title>
          , in: S. Avidan,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Brostow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cissé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Farinella</surname>
          </string-name>
          , T. Hassner (Eds.),
          <source>Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part IV</source>
          , volume
          <volume>13664</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2022</year>
          , pp.
          <fpage>511</fpage>
          -
          <lpage>528</lpage>
          . URL: https://doi.org/10.1007/978-3-031-19772-7_30. doi:10.1007/978-3-031-19772-7_30.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>You</surname>
          </string-name>
          ,
          <article-title>Resonance: Learning to predict social-aware pedestrian trajectories as co-vibrations</article-title>
          ,
          <source>CoRR abs/2412.02447</source>
          (
          <year>2024</year>
          ). URL: https://doi.org/10.48550/arXiv.2412.02447. doi:10.48550/ARXIV.2412.02447.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>