<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Role of Eye-Tracking Data in Encoder-Based Models: an In-depth Linguistic Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lucia Domenichelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Dini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dominique Brunato</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felice Dell'Orletta</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ItaliaNLP Lab, Istituto di Linguistica Computazionale “A. Zampolli” (CNR-ILC)</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Pisa</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>This paper falls within ongoing research aimed at enhancing the human interpretability of neural language models by incorporating physiological data. Specifically, we leverage eye-tracking data collected during reading to explore how such information can guide model behavior. We train a multilingual encoder model to predict eye-tracking features from the Multilingual Eye-tracking Corpus (MECO) and analyze the resulting shifts in model attention patterns, focusing on how attention redistributes across linguistically informed categories such as part of speech, word position, word length, and distance from the syntactic head after fine-tuning. Moreover, we test how this attention shift impacts the representations of the words involved in the embedding space. The study covers both Italian and English, enabling a cross-linguistic perspective on attention and representation shifts in multilingual encoders grounded in human reading behavior.</p>
      </abstract>
      <kwd-group>
        <kwd>Eye-tracking</kwd>
        <kwd>Neural Attention</kwd>
        <kwd>Multilingual models</kwd>
        <kwd>Embedding space</kwd>
        <kwd>Interpretability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivation</title>
      <p>Neural language models (NLMs) now match or even surpass human benchmarks on many NLP tasks, yet the logic behind their predictions remains largely hidden behind billions of parameters. To make these systems more transparent and data-efficient, researchers are increasingly borrowing ideas from cognitive science, grounding both training and evaluation in how people actually learn and process language (e.g. [1, 2, 3]). Among the most informative cognitive signals of human language processing is eye-tracking (ET). Decades of psycholinguistic work show that fixation times, regressions, and skips mirror both early lexical access and later integrative processes underlying text comprehension [4, 5]. Leveraging these signals has already boosted model accuracy on a variety of downstream tasks, ranging from core linguistic tasks [6] to more applied tasks such as sentiment analysis [7], language proficiency assessment [8], and machine reading comprehension [9], while also giving us a new lens on model interpretability. Studies by Sood et al. [10] and Eberle et al. [11] found that transformer attention does not always line up with human gaze, whereas Bensemann et al. [12] and Wang et al. [13] revealed stronger links in specific layers, hinting at a layered correspondence between reading behavior and neural representations.</p>
      <p>Extending this direction, Dini et al. [14] investigate how injecting reading-related information into NLMs through different fine-tuning strategies on ET data affects their attention patterns, as well as their performance on downstream tasks and their representation space. Their findings show that this intermediate process increases the correlation between model attention and human attention and leads to a compression of the embedding space, without generally degrading performance on downstream tasks.</p>
      <p>Building on this foundational framework, this paper aims to further highlight the effects of incorporating information about human reading behavior into an NLM from a linguistically informed perspective. Specifically, we examine how fine-tuning on eye-tracking signals leads to shifts in model attention, and how these shifts affect the structure of word representations. To explore this, we extract a set of linguistic features, capturing progressively more complex language phenomena, from the input text and analyze how attention is redistributed across word classes defined by these features. In parallel, we assess how these attention shifts influence the embedding space, both at a global level and within the local representational geometry of specific word classes. The code for our experiments is publicly available on GitHub.</p>
      <sec id="sec-1-1">
        <title>2. Related work</title>
        <p>Our study intersects two complementary lines of research within NLM interpretability. The first investigates ET data as a diagnostic signal to evaluate the alignment between model behavior and human cognitive processing, particularly through the lens of attention mechanisms. The second focuses on analysing models' attention mechanisms (Section 2.2) and representational space (Section 2.3).</p>
        <sec id="sec-1-1-1">
          <title>2.1. Eye-tracking and NLMs</title>
          <p>In recent years, eye-tracking has emerged as a prominent physiological signal in NLP research due to its affordability and ease of collection compared to methods like fMRI or MEG. Public resources such as the GECO corpus [15], the MECO corpus [16], and the WE-RDD dataset [17] now let researchers probe gaze behaviour at scale across languages and reading paradigms.</p>
          <p>Work with these corpora has split in two directions. The first injects gaze-derived features into neural architectures, typically lifting accuracy on downstream tasks. The second, which motivates our study, treats ET as a diagnostic for a model's internal workings.</p>
          <p>The first systematic comparison came from Sood et al. [18], who matched attention maps from CNNs, LSTMs and Transformers against human fixations. Their findings reveal that while transformers performed the best, they showed the weakest alignment with gaze. Eberle et al. [11] confirmed that even after task-specific fine-tuning, large Transformers stayed distant from human reading patterns. Conversely, Bensemann et al. [12] reported that raw dwell times correlate strongly with the earliest BERT layers, a relation that persists as model size grows. Morger et al. [19] extended the inquiry cross-lingually and found robust correlations, especially for monolingual encoders, between human word-importance rankings and model saliency. Most recently, Wang et al. [20] showed that deeper layers of NLMs once again echo fixation metrics, hinting at a layered, non-monotonic link between model depth and cognitive fidelity.</p>
        </sec>
        <sec id="sec-1-1-2">
          <title>2.2. Model Attention Dynamics</title>
          <p>The role of attention mechanisms in NLMs has been a subject of extensive research and debate. While attention weights are often interpreted as providing insight into model reasoning, a growing body of research has questioned their reliability as faithful explanations of model decisions. Some studies suggest that attention can highlight important input elements, yet others argue that attention distributions can be manipulated without significantly affecting predictions, casting doubt on their explanatory power [21, 22]. In response to these concerns, alternative attribution methods have been proposed, such as attention rollout [23] and gradient-based techniques [24], which aim to better capture the pathways through which information influences predictions. As part of this debate, a parallel line of work has explored whether attention aligns with known linguistic structures, such as syntactic dependencies or PoS categories, offering a complementary perspective on its interpretability.</p>
          <p>The foundational study by Clark et al. [25] showed that certain attention heads in BERT consistently focus on syntactic phenomena, such as attending to an entity's determiners or subjects attending to their verbs. However, fine-tuning on syntactic or semantic tasks had minimal effect on altering self-attention patterns. Vig and Belinkov [26] conducted a comprehensive analysis of attention head interpretability in GPT-2 using both visualization and quantitative measures. Their results indicate a layer-specific linguistic sensitivity, with different types of linguistic information, such as PoS and syntactic dependencies, being more salient in particular layers. They also found stronger alignment with syntactic dependencies in the model's middle layers. Htut et al. [27] directly evaluated the extent to which attention aligns with gold-standard dependency parses. By computing the correspondence between attention distributions and syntactic head-dependent pairs, they showed that BERT's attention does not systematically reflect syntactic dependency structures, particularly in deeper layers.</p>
          <p>Taken together, these studies suggest that while attention mechanisms can exhibit linguistically meaningful behavior in isolated cases, especially in specific layers or individual heads, they do not consistently encode syntactic or morpho-syntactic structure.</p>
        </sec>
        <sec id="sec-1-1-3">
          <title>2.3. Geometry of the embedding space</title>
          <p>Transformer models learn a high-dimensional embedding space in which every token is represented by a dense vector that encodes both meaning and syntax. A consistent finding is that these vectors occupy only a narrow cone of the space, an anisotropic layout sometimes called the representation degradation effect [28, 29, 30]. In NLP, such behaviour is often viewed as harmful because it can hide fine-grained linguistic cues [31, 32, 33]. Yet theory and broader machine-learning evidence show that anisotropy can arise naturally under stochastic gradient descent and may even aid generalization, especially when models project data onto low-dimensional manifolds [34, 35, 36, 37]. In this respect, studying the impact of various fine-tuning objectives and downstream tasks provides important insights into how they shape the geometry of the embedding space [34, 35, 36]. While still relatively limited, a growing body of work has begun to examine the relationship between embedding space properties and linguistic phenomena. For example, Hernandez and Andreas [38] show that linguistic features tend to be encoded in lower-dimensional subspaces in the early layers of both ELMo and BERT, and that relational features (like dependency relations between pairs of words) are encoded less compactly than categorical features like part of speech. More recently, Cheng et al. [39] analyzed representation compression in pre-trained language models from both geometric and information-theoretic perspectives. Their findings reveal a strong correlation between these two views and show that the intrinsic geometric dimension of linguistic data is predictive of its coding length under the language model.</p>
          <p>To the best of our knowledge, no systematic study has examined how eye-tracking fine-tuning affects attention patterns and the resulting embedding representations across different linguistic phenomena. Moreover, cross-linguistic analyses of these changes following cognitively motivated fine-tuning remain scarce.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Dataset</title>
      <p>For our analysis, we leverage two distinct datasets: the Multilingual Eye-tracking Corpus (MECO), used to fine-tune the model on human gaze modeling, and treebanks from the Universal Dependencies (UD) project, used to extract linguistically motivated features and to compute the model attention shifts and representation structure induced by fine-tuning on ET data.</p>
      <sec id="sec-2-1">
        <title>3.1. Eye-tracking data: The MECO Corpus</title>
        <p>MECO [16] is a multilingual collection featuring reading behavior from both native (L1) and second-language speakers across 13 languages. We focus on the L1 subsets for English and Italian, chosen for their typological diversity and data completeness, allowing for a controlled yet cross-linguistic perspective on gaze modeling.</p>
        <p>Each participant in MECO read 12 encyclopedic-style texts covering general knowledge topics. To ensure consistency and limit computational costs, we selected the largest subsets of users who had read the majority of sentences. For Italian, we included 9 participants who read all sentences. For English, since no participant completed the full set, we selected 25 participants who all read the same set of sentences, missing only two in common.</p>
        <p>We used five ET features intended to represent early, late and contextual signals of human reading processes: First Fixation Duration: the duration of the first fixation landing on the word; Gaze Duration: the summed duration of fixations on the word in the first pass, i.e., before the gaze leaves it for the first time; Total Reading Time: the cumulative amount of time spent reading a word, capturing both fixations and potential interruptions (e.g., regressions or pauses); First-run Number of Fixations: the number of fixations on a word during the first pass; Total Number of Fixations: the number of discrete fixations on areas of interest overall.</p>
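        <p>To make these measures concrete, the following is a minimal sketch of how the five word-level features can be derived from a chronological fixation log. The record layout (word index, fixation duration) and the helper name are illustrative assumptions and do not mirror the format of the released MECO files.</p>
        <preformat>
# Sketch (assumed input): fixations = [(word_idx, duration_ms), ...] in
# chronological order for one participant reading one sentence.
def word_level_et_features(fixations, n_words):
    feats = {w: {"ffd": 0.0, "gd": 0.0, "trt": 0.0,
                 "nfix_first": 0, "nfix_total": 0} for w in range(n_words)}
    first_pass_done = set()      # words whose first-pass reading has ended
    prev_word = None
    for word, dur in fixations:
        f = feats[word]
        f["trt"] += dur          # Total Reading Time: every fixation on the word
        f["nfix_total"] += 1     # Total Number of Fixations
        if word not in first_pass_done:
            if f["nfix_first"] == 0:
                f["ffd"] = dur   # First Fixation Duration
            f["gd"] += dur       # Gaze Duration: first-pass fixations only
            f["nfix_first"] += 1 # First-run Number of Fixations
        if prev_word is not None and prev_word != word:
            first_pass_done.add(prev_word)  # the gaze left it: first pass is over
        prev_word = word
    return feats
        </preformat>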
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Universal Dependencies Treebanks</title>
        <p>To analyze how model attention weights and the embedding space shift following fine-tuning on eye-tracking data, we relied on linguistically annotated corpora from the UD treebanks [40]. Specifically, for Italian, we employed the subsection corresponding to the training set of the Italian Stanford Dependency Treebank (ISDT), which contains ≈ 13,000 sentences drawn from a variety of textual genres. For English, we used the training set of the English Web Treebank (EWT) [41], including ≈ 12,000 sentences, also multi-genre. UD corpora were chosen due to their gold-standard syntactic and part-of-speech annotations, which provide a reliable foundation for our fine-grained linguistic analyses. Additionally, the cross-linguistically consistent annotation schema offered by UD enables meaningful comparisons across typologically distinct languages.</p>
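        <p>As an illustration of the feature extraction step, the sketch below reads a CoNLL-U file and derives, for every syntactic word, the four features used in our analyses (length in characters, PoS, sentence position, and signed distance from the syntactic head). The helper is hypothetical and is not the released code of this work.</p>
        <preformat>
def conllu_features(path):
    """Parse a CoNLL-U file (standard 10-column format) into per-word features."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:                      # blank line ends a sentence
                if current:
                    sentences.append(current)
                    current = []
            elif not line.startswith("#"):
                cols = line.split("\t")
                if "-" not in cols[0] and "." not in cols[0]:
                    current.append(cols)      # skip multi-word / empty tokens
    if current:
        sentences.append(current)
    feats = []
    for sent in sentences:
        for cols in sent:
            idx, form, upos, head = int(cols[0]), cols[1], cols[3], int(cols[6])
            feats.append({
                "form": form,
                "length": len(form),          # word length in characters
                "upos": upos,                 # part-of-speech category
                "position": idx,              # 1-based position in the sentence
                "head_dist": 0 if head == 0 else head - idx,  # positive: head follows
            })
    return feats
        </preformat>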
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Our Approach</title>
      <p>We propose a linguistically informed framework to investigate the impact of injecting human reading behaviour into a pre-trained NLM, focusing on its effects on attention and word representations. The approach consists of two main stages: first, we fine-tune the model on predicting several ET features; then, we compare the pre-trained and fine-tuned models along three axes: i) correlation between model attention and human attention; ii) attention distribution over input tokens; iii) sentence representations in the embedding space.</p>
      <p>To enable a more fine-grained analysis of how ET fine-tuning affects word representations, we condition our evaluations on the following linguistic features extracted from the UD treebanks: word length in characters, part-of-speech category, position in the sentence, and distance from the syntactic head.</p>
      <p>For our experiments we used XLM-RoBERTa-base, a 12-layer multilingual encoder-based model. In what follows, we outline the methodological choices and implementation details of our experimental setting.</p>
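      <p>For concreteness, one common way to realise the first stage is a token-level regression head on top of the encoder, with masked positions excluded from the loss. The following is a hedged sketch of such a setup; the exact head, pooling, and training loop used in our released code may differ.</p>
      <preformat>
import torch.nn as nn
from transformers import AutoModel

class ETRegressionModel(nn.Module):
    """xlm-roberta-base with a 5-dimensional regression head (one output per
    eye-tracking feature). Illustrative sketch, not the released model."""
    def __init__(self, model_name="xlm-roberta-base", n_features=5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, n_features)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        preds = self.head(hidden)              # (batch, seq_len, n_features)
        loss = None
        if labels is not None:
            mask = labels[..., 0] != -100.0    # score only first sub-tokens
            loss = nn.functional.mse_loss(preds[mask], labels[mask])
        return loss, preds
      </preformat>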
      <sec id="sec-3-1">
        <title>4.1. ET injection into the Model</title>
        <p>To inject reading-related information into the model, we leverage the set of eye-tracking features from MECO described in Section 3.1. Unlike most prior work, which typically aggregates eye-tracking data across participants (with few exceptions [42]), we treat each reader individually, conducting experiments separately for each subject. This design choice is motivated by the intrinsic variability observed in reading behavior, even among skilled readers [43, 44, 45], and enables a more accurate modeling of reader-specific dynamics.</p>
        <p>After a hyperparameter tuning phase using 5-fold cross-validation, we fine-tune the model to predict the five word-level eye-tracking features, training a separate model for each individual reader (the fine-tuning is run for 50 epochs, using a learning rate of 5e-05, a weight decay of 0.01, and a warm-up ratio of 0.05). Since the MECO dataset provides annotations at the word level, while the model's tokenizer splits some words into subword units, we follow standard practice [46] and assign eye-tracking features only to the first sub-token of each word, ignoring the rest during training.</p>
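        <p>A minimal sketch of this first-sub-token labelling, assuming a Hugging Face fast tokenizer for xlm-roberta-base: feature vectors are attached to the first piece of every word, and all other positions are masked so they can be ignored by the loss. Variable names are ours, not taken from the released code.</p>
        <preformat>
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

def align_et_labels(words, et_values):
    """words: list of word strings; et_values: one 5-dim feature vector per word."""
    enc = tokenizer(words, is_split_into_words=True, truncation=True)
    labels, prev_word = [], None
    for word_id in enc.word_ids():
        if word_id is None or word_id == prev_word:
            labels.append([-100.0] * 5)        # special token or continuation piece
        else:
            labels.append(et_values[word_id])  # first sub-token keeps the ET features
        prev_word = word_id
    enc["labels"] = labels
    return enc
        </preformat>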
        <p>To examine whether the fine-tuned model develops
a more human-like attention pattern, we compute the
correlation between model attention and human
attention before and after fine-tuning. For model
attention, we consider the attention weights received by
each word when computing the representation of the
beginning-of-sentence token (&lt;s&gt;), which is the only
token used during the eye-tracking prediction phase and
serves as a global summary of the sentence. To account
for subword tokenization, we follow the same approach
used during fine-tuning and associate attention scores to
the first sub-token of each word. As a proxy for human
attention, we choose the Total Reading Time feature (see
Section 3.1). For each reader, we thus compute the
correlation between their eye-tracking data and the attention
patterns of both the pre-trained and the fine-tuned model
across all layers, allowing us to assess whether the latter
aligns more closely with human reading behavior.</p>
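        <p>The sketch below mirrors this procedure for a single sentence and a single reader: it collects the attention that the beginning-of-sentence token &lt;s&gt; pays to the first sub-token of each word at every layer and correlates it with Total Reading Time. Averaging over heads and taking absolute Spearman coefficients are assumptions of a reasonable setup, not a verbatim description of the released code.</p>
        <preformat>
import torch
from scipy.stats import spearmanr
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base", output_attentions=True)

def attention_trt_correlation(words, trt):
    """words: list of word strings; trt: Total Reading Time per word."""
    enc = tok(words, is_split_into_words=True, return_tensors="pt")
    first_pos, prev = [], None
    for i, wid in enumerate(enc.word_ids()):
        if wid is not None and wid != prev:
            first_pos.append(i)                # first sub-token of each word
        prev = wid
    with torch.no_grad():
        attns = model(**enc).attentions        # one (1, heads, seq, seq) tensor per layer
    corrs = []
    for layer_attn in attns:
        # attention paid by the BOS query (position 0) to every key, averaged over heads
        to_words = layer_attn[0, :, 0, :].mean(dim=0)[first_pos]
        rho, _ = spearmanr(to_words.numpy(), trt)
        corrs.append(abs(rho))                 # strength, not direction
    return corrs
        </preformat>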
        <sec id="sec-3-1-1">
          <title>4.2. Assessing the Role of ET fine-tuning on Word Representations</title>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Assessing the Role of ET fine-tuning on Word Representations</title>
        <p>To assess how fine-tuning on ET affects the model's internal dynamics for attention and the embedding space, we leverage the linguistic features from the treebanks described in Section 4. Specifically, to compute the attention shifts, for each value of these features we analyse the amount of attention the corresponding words receive before and after fine-tuning. This allows us to characterize shifts in attention distribution across different linguistic phenomena and across all layers of the models. Firstly, we normalize the attention scores for each sentence (excluding BOS and EOS tokens) so that their sum is 1. Attention shifts are then quantified as the percentage change in the average attention received by tokens with a given feature value, relative to before fine-tuning. A positive shift indicates increased attention to these tokens after fine-tuning, while a negative shift reflects a decrease. This allows us to identify which linguistic categories gain or lose prominence after incorporating eye-tracking supervision.</p>
        <p>To analyze the shifts in the embedding space, we rely on two complementary metrics. (i) IsoScore [47] offers a scale-invariant measure of isotropy: lower scores indicate that the embedding variance is concentrated along fewer directions, pointing to a more anisotropic space. (ii) Linear Intrinsic Dimensionality (Linear-ID) [48] estimates the dimensionality of the smallest linear subspace that captures the embeddings, providing a proxy for their geometric complexity. Both metrics were computed on the first sub-token of each word in the UD treebanks. In line with the other analyses, we compare the embedding spaces of the pre-trained and fine-tuned models to assess whether ET fine-tuning leads to more compact or more isotropic representations, as reflected by changes in these metrics.</p>
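        <p>A compact sketch of the attention-shift computation follows. The per-token record layout is a hypothetical convenience, but the quantity computed is the percentage change in mean attention per feature class described above.</p>
        <preformat>
import numpy as np
from collections import defaultdict

def attention_shift_by_class(tokens):
    """tokens: records with 'att_pre' and 'att_post' (attention already normalised
    per sentence, BOS/EOS excluded) and a 'feature' class label (e.g. a PoS tag)."""
    pre, post = defaultdict(list), defaultdict(list)
    for t in tokens:
        pre[t["feature"]].append(t["att_pre"])
        post[t["feature"]].append(t["att_post"])
    return {c: 100.0 * (np.mean(post[c]) - np.mean(pre[c])) / np.mean(pre[c])
            for c in pre}
        </preformat>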
        <p>All reported scores are first computed for each user
individually and then averaged across all users.</p>
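        <p>For the geometric analysis, IsoScore has an implementation released by the authors of [47]; for Linear-ID, the sketch below shows a simple PCA-based proxy (the number of principal components needed to explain 95% of the variance). This proxy is our simplification and stands in for the estimator of [48].</p>
        <preformat>
import numpy as np
from sklearn.decomposition import PCA

def linear_id(embeddings, threshold=0.95):
    """embeddings: (n_tokens, hidden_size) matrix of first-sub-token vectors."""
    pca = PCA().fit(embeddings)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    return int(np.searchsorted(cumulative, threshold) + 1)
        </preformat>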
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Results</title>
      <sec id="sec-4-1">
        <title>5.1. Correlation between model and human attention</title>
        <sec id="sec-4-1-1">
          <title>To assess how fine-tuning on ET afects the model’s internal dynamics for attention and embedding space, we leverage the linguistic features from the treebanks described in Section 4.</title>
        <p>As a first evaluation step, we computed the correlation between human attention and model attention, both before and after fine-tuning on eye-tracking data. As we are interested in the strength rather than the direction of the association, we considered the absolute values of the correlation coefficients. For the fine-tuned models, we computed the correlation between the model's attention weights and the Total Reading Time of the specific user on which each model was fine-tuned. For the pre-trained model, which is not fine-tuned to any individual reader, we calculated the correlation between its attention weights and the Total Reading Time of each user independently, and subsequently averaged the resulting coefficients. Figure 1 reports the comparison of Spearman correlation coefficients, averaged across all users.</p>
        <p>Figure 1: Correlation between model attention and human attention (p-value &lt; 0.05).</p>
          <p>In line with results reported in [14, 49], fine-tuning
on ET data consistently leads to stronger correlation
coefficients between model and human attention,
particularly in the deeper layers of the model. This
effect is evident in both Italian and English. The overall
patterns are remarkably similar across the two languages,
although the correlation scores for Italian are slightly
higher on average.</p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Analysis of the Attention Shifts</title>
        <sec id="sec-4-2-1">
          <title>This section reports the analysis of the attention shifts in</title>
          <p>duced by fine-tuning on ET data. We grouped tokens into
classes for the values of the linguistic features detailed in
Section 4. To enhance readability and interpretability, for
each linguistic feature we visualised only the most
representative values. Rather than applying a strict frequency
threshold, we heuristically excluded rare or degenerate
cases (e.g., for token length, extremely long tokens such
as URLs), retaining typical and frequent values that
better reflect standard linguistic patterns. Each figure also
includes an “AVG” column summarizing the average shift
across all layers, ofering a high-level view of the
attention reallocation patterns.
and sentence interpretation. Additionally, a language- tive data from the used UD treebanks show that early
specific efect is visible in Italian, where coordinating sentence positions largely correspond to syntactically
conjunctions (CCONJ) gain notable attention across sev- central elements—particularly the root, which anchors
eral layers. While similar shifts occur sporadically in the the clause and governs the structure of major
compleEnglish model, they are less consistent and often ofset ments. The observed shift in attention may therefore
by decreases in other layers. reflect the model’s increased sensitivity to syntactic
or</p>
          <p>As regards the attention shifts based on the word’s ganization cues at sentence onset, especially in specific
position within the sentence (Figure 4), we noted that layers. This behavior is also well-documented in
psyfor both languages tokens appearing earlier in the cholinguistic studies and indicative of incremental
parssentence generally receive slightly more attention ing, where early elements guide syntactic and semantic
after fine-tuning , whereas those occurring later receive expectations during sentence comprehension.
less. An exception is observed for the first two tokens, Figure 5 shows the attention shifts for the
headwhich deviate from this trend. Layer-specific behaviors dependent distance parameter. A positive value indicates
also emerge: for instance, layers 2 and 9 tend to increase that the head follows the dependent, while a negative one
attention toward later tokens, while most other layers that the head precedes it. The special value 0 is assigned
show the opposite efect, emphasizing earlier positions. to the root of the sentence. On average, it emerged that
Notably, layer 2 and layer 11 both show sharp increases tokens that are syntactically closer to their head
in attention to the first token, suggesting a potential tend to receive more attention after fine-tuning,
reweighting of sentence-initial information after expo- particularly when the head follows the dependent.
sure to human reading patterns. Interestingly, quantita- This suggests that fine-tuning on ET data encourages</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <sec id="sec-5-1">
        <title>This work has been supported by:</title>
        <p>• FAIR - Future AI Research (PE00000013) project under the NRRP MUR program funded by the NextGenerationEU;</p>
        <p>• The project “XAI-CARE” funded by the European Union - Next Generation EU - NRRP M6C2 “Investment 2.1 Enhancement and strengthening of biomedical research in the NHS” (PNRR-MAD-2022-12376692_VADALA’ – CUP F83C22002470001);</p>
        <p>• The project “Human in Neural Language Models” (IsC93_HiNLM), funded by CINECA under the ISCRA initiative;</p>
        <p>• Language Of Dreams: the relationship between sleep mentation, neurophysiology, and neurological disorders - PRIN 2022 (2022BNE97C_SH4_PRIN2022).</p>
        <p>[28] K. Ethayarajh, How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings, in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, 2019, pp. 55–65. URL: https://aclanthology.org/D19-1006/. doi:10.18653/v1/D19-1006.</p>
        <p>[29] N. Godey, É. Clergerie, B. Sagot, Anisotropy is inherent to self-attention in transformers, in: Y. Graham, M. Purver (Eds.), Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, St. Julian’s, Malta, 2024, pp. 35–48. URL: https://aclanthology.org/2024.eacl-long.3/.</p>
        <p>[30] J. Gao, D. He, X. Tan, T. Qin, L. Wang, T. Liu, Representation degeneration problem in training natural language generation models, in: International Conference on Learning Representations, 2019. URL: https://openreview.net/forum?id=SkEYojRqtm.</p>
        <p>[31] X. Cai, J. Huang, Y. Bian, K. Church, Isotropy in the contextual embedding space: Clusters and manifolds, in: International Conference on Learning Representations, 2021.</p>
        <p>[32] Z. Zhang, C. Gao, C. Xu, R. Miao, Q. Yang, J. Shao, Revisiting representation degeneration problem in language modeling, in: Findings of the Association for Computational Linguistics: EMNLP 2020, 2020, pp. 518–527.</p>
        <p>[33] T. Mickus, D. Paperno, M. Constant, K. van Deemter, What do you mean, BERT? Assessing BERT as a distributional semantics model, in: A. Ettinger, G. Jarosz, J. Pater (Eds.), Proceedings of the Society for Computation in Linguistics 2020, Association for Computational Linguistics, New York, New York, 2020, pp. 279–290. URL: https://aclanthology.org/2020.scil-1.35/.</p>
        <p>[34] R. Diehl Martinez, Z. Goriely, A. Caines, P. Buttery, L. Beinborn, Mitigating frequency bias and anisotropy in language model pre-training with syntactic smoothing, in: Y. Al-Onaizan, M. Bansal, Y.-N. Chen (Eds.), Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Miami, Florida, USA, 2024, pp. 5999–6011. URL: https://aclanthology.org/2024.emnlp-main.344/. doi:10.18653/v1/2024.emnlp-main.344.</p>
        <p>[35] W. Rudman, C. Eickhoff, Stable anisotropic regularization, in: The Twelfth International Conference on Learning Representations, 2024. URL: https://openreview.net/forum?id=dbQH9AOVd5.</p>
        <p>[36] A. Machina, R. Mercer, Anisotropy is not inherent to transformers, in: K. Duh, H. Gomez, S. Bethard (Eds.), Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Association for Computational Linguistics, Mexico City, Mexico, 2024, pp. 4892–4907. URL: https://aclanthology.org/2024.naacl-long.274/. doi:10.18653/v1/2024.naacl-long.274.</p>
        <p>[37] A. Ansuini, A. Laio, J. H. Macke, D. Zoccolan, Intrinsic dimension of data representations in deep neural networks, Advances in Neural Information Processing Systems 32 (2019).</p>
        <p>[38] E. Hernandez, J. Andreas, The low-dimensional linear geometry of contextualized word representations, in: Conference on Computational Natural Language Learning, 2021. URL: https://api.semanticscholar.org/CorpusID:234742544.</p>
        <p>[39] E. Cheng, C. Kervadec, M. Baroni, Bridging information-theoretic and geometric compression in language models, in: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2023, pp. 12397–12420.</p>
        <p>[40] M.-C. de Marneffe, C. D. Manning, J. Nivre, D. Zeman, Universal dependencies, Computational Linguistics 47 (2021) 255–308. URL: https://doi.org/10.1162/coli_a_00402. doi:10.1162/coli_a_00402.</p>
        <p>[41] N. Silveira, T. Dozat, M.-C. de Marneffe, S. Bowman, M. Connor, J. Bauer, C. D. Manning, A gold standard dependency corpus for English, in: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), 2014.</p>
        <p>[42] S. Brandl, N. Hollenstein, Every word counts: A multilingual analysis of individual human alignment with model attention, in: Y. He, H. Ji, S. Li, Y. Liu, C.-H. Chang (Eds.), Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Association for Computational Linguistics, Online only, 2022, pp. 72–77. URL: https://aclanthology.org/2022.aacl-short.10/. doi:10.18653/v1/2022.aacl-short.10.</p>
        <p>[43] A. J. Parker, T. J. Slattery, Spelling ability influences early letter encoding during reading: Evidence from return-sweep eye movements, Quarterly Journal of Experimental Psychology 74 (2021) 135–149. URL: https://doi.org/10.1177/1747021820949150. doi:10.1177/1747021820949150. PMID: 32705948.</p>
        <p>[44] J. Ashby, K. Rayner, C. Clifton, Eye movements of highly skilled and average readers: Differential effects of frequency and predictability, The Quarterly Journal of Experimental Psychology Section A 58 (2005) 1065–1086. doi:10.1080/02724980443000476.</p>
        <p>[45] T. J. Slattery, M. Yates, Word skipping: Effects of word length, predictability, spelling and reading skill, Quarterly Journal of Experimental Psychology 71 (2018) 250–259. doi:10.1080/17470218.2017.1310264.</p>
        <p>[46] N. Hollenstein, F. Pirovano, C. Zhang, L. Jäger, L. Beinborn, Multilingual language models predict human reading behavior, in: K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, Y. Zhou (Eds.), Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Online, 2021, pp. 106–123. URL: https://aclanthology.org/2021.naacl-main.10/. doi:10.18653/v1/2021.naacl-main.10.</p>
        <p>[47] W. Rudman, N. Gillman, T. Rayne, C. Eickhoff, IsoScore: Measuring the uniformity of embedding space utilization, in: S. Muresan, P. Nakov, A. Villavicencio (Eds.), Findings of the Association for Computational Linguistics: ACL 2022, Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 3325–3339. URL: https://aclanthology.org/2022.findings-acl.262/. doi:10.18653/v1/2022.findings-acl.262.</p>
        <p>[48] J. H. Lee, T. Jiralerspong, L. Yu, Y. Bengio, E. Cheng, Geometric signatures of compositionality across a language model’s lifetime (2025). URL: https://arxiv.org/abs/2410.01444. arXiv:2410.01444.</p>
        <p>[49] L. Dini, L. Moroni, D. Brunato, F. Dell’Orletta, In the eyes of a language model: A comprehensive examination through eye-tracking data, Neurocomputing (2025). In press.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>A. Shift in the embeddings space</title>
    </sec>
    <sec id="sec-7">
      <title>A. Shift in the embedding space: Extra features</title>
      <sec id="sec-7-1">
        <p>This Appendix section contains the analysis of Section 5.3 conducted on the remaining linguistic features: word length (Figures A.1 and A.2) and word index in sentence (Figures A.3 and A.4). As in Section 5.3, a clear hierarchy emerges among the new feature classes. For word length, tokens 6–10 characters long retain the highest IsoScore and Linear-ID before collapsing, like all other bins, under fine-tuning.</p>
        <p>Figure A.2: Linear-ID before (left) and after (right) fine-tuning, shown for word length (up to 15 tokens).</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>B. Shift in the embedding space</title>
    </sec>
    <sec id="sec-9">
      <title>B. Shift in the embedding space: English dataset</title>
      <sec id="sec-9-1">
        <p>We report the scores on the English word embeddings. The results are comparable to those on the Italian dataset. Further exploration of parallels and differences will be the focus of future work.</p>
        <p>Figure B.3: Isotropy before (top) and after (bottom) fine-tuning, grouped by syntactic head distance (up to 7 words of distance).</p>
        <p>Figure B.4: Linear-ID before (top) and after (bottom) fine-tuning, grouped by syntactic head distance (up to 7 words of distance).</p>
        <p>Declaration on Generative AI: During the preparation of this work, the author(s) used ChatGPT (OpenAI) in order to improve writing style and for formatting assistance. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication's content.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Linguistics</surname>
          </string-name>
          (Volume
          <volume>2</volume>
          :
          <string-name>
            <surname>Short</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <year>2016</year>
          , pp.
          <fpage>579</fpage>
          -
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          584. [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kanojia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dey</surname>
          </string-name>
          , P. Bhat-
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>ceedings of the 20th SIGNLL Conference on Com-</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <year>2016</year>
          , pp.
          <fpage>156</fpage>
          -
          <lpage>166</lpage>
          . URL: https://aclanthology.org/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <fpage>K16</fpage>
          -
          <lpage>1016</lpage>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>K16</fpage>
          -1016. [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Berzak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Katz</surname>
          </string-name>
          ,
          <string-name>
            <surname>R. Levy</surname>
          </string-name>
          , Assessing language
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>2018 Conference of the North American Chapter</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Human Language</surname>
            <given-names>Technologies</given-names>
          </string-name>
          , Volume
          <volume>1</volume>
          (Long
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>tics</surname>
          </string-name>
          , New Orleans, Louisiana,
          <year>2018</year>
          , pp.
          <fpage>1986</fpage>
          -
          <lpage>1996</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>URL: https://aclanthology.org/N18-1180/. doi:10.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <volume>18653</volume>
          /v1/
          <fpage>N18</fpage>
          -1180. [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Malmaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Berzak</surname>
          </string-name>
          , Bridging
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>Linzen</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 24th Con-</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>tics</surname>
          </string-name>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>142</fpage>
          -
          <lpage>152</lpage>
          . URL: https://
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          aclanthology.org/
          <year>2020</year>
          .conll-
          <volume>1</volume>
          .11/. doi:
          <volume>10</volume>
          .18653/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          v1/
          <year>2020</year>
          .conll-
          <volume>1</volume>
          .
          <fpage>11</fpage>
          . [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hollenstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Troendle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bigiolli</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <article-title>language processing signals</article-title>
          , CoRR abs/
          <year>1904</year>
          .02682
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          (
          <year>2019</year>
          ). [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Evanson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lakretz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>King</surname>
          </string-name>
          , Language ac-
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <article-title>low similar learning stages?</article-title>
          , in: Annual Meet- [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Sood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tannert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Bulling,
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>tics</surname>
          </string-name>
          ,
          <year>2023</year>
          . URL: https://api.semanticscholar.
          <article-title>org/ with human gaze-guided neural attention</article-title>
          , in:
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <article-title>CorpusID:259089351</article-title>
          . H.
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Hadsell</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Balcan</surname>
            , [3]
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Yedetore</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Linzen</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Frank</surname>
          </string-name>
          , R. T.
          <string-name>
            <surname>McCoy</surname>
          </string-name>
          ,
          <string-name>
            <surname>How H. Lin</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information</source>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <article-title>poor is the stimulus? evaluating hierarchical gen-</article-title>
          <source>Processing Systems</source>
          , volume
          <volume>33</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <article-title>eralization in neural networks trained on child- Inc</article-title>
          .,
          <year>2020</year>
          , pp.
          <fpage>6327</fpage>
          -
          <lpage>6341</lpage>
          . URL: https://proceedings.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <article-title>directed speech</article-title>
          , in: A.
          <string-name>
            <surname>Rogers</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Boyd-Graber, neurips</article-title>
          .cc/paper_files/paper/2020/file/
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>N.</given-names>
            <surname>Okazaki</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 61st An- 460191c72f67e90150a093b4585e7eb4-Paper.pdf.</source>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <source>nual Meeting of the Association for Computational</source>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Eberle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pilot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Søgaard</surname>
          </string-name>
          , Do
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Linguistics</surname>
          </string-name>
          (Volume
          <volume>1</volume>
          :
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <article-title>Association transformer models show similar attention pat-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <year>2023</year>
          , pp.
          <fpage>9370</fpage>
          -
          <lpage>9393</lpage>
          . URL: https://aclanthology. san, P. Nakov,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Villavicencio (Eds.), Proceedings
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          org/
          <year>2023</year>
          .
          <article-title>acl-long</article-title>
          .
          <volume>521</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2023</year>
          .
          <article-title>of the 60th Annual Meeting of the Association</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <article-title>acl-long.521. for Computational Linguistics</article-title>
          (Volume
          <volume>1</volume>
          : Long [4]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Just</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Carpenter</surname>
          </string-name>
          , A theory of reading: Papers), Association for Computational Linguis-
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <article-title>from eye fixations to comprehension</article-title>
          ., Psychologi- tics, Dublin, Ireland,
          <year>2022</year>
          , pp.
          <fpage>4295</fpage>
          -
          <lpage>4309</lpage>
          . URL:
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <source>cal review 87</source>
          (
          <year>1980</year>
          )
          <article-title>329</article-title>
          . https://aclanthology.org/
          <year>2022</year>
          .
          <article-title>acl-long</article-title>
          .
          <volume>296</volume>
          . doi:10. [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Rayner</surname>
          </string-name>
          , Eye movements in reading and informa-
          <volume>18653</volume>
          /v1/
          <year>2022</year>
          .
          <article-title>acl-long</article-title>
          .
          <volume>296</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <source>tion processing: 20 years of research</source>
          ., Psychological [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bensemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Prado</surname>
          </string-name>
          , Y. Chen,
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <source>bulletin 124</source>
          (
          <year>1998</year>
          )
          <article-title>372</article-title>
          . N. Ö. Tan,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Corballis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Riddle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Witbrock</surname>
          </string-name>
          , [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bingel</surname>
          </string-name>
          , F. Keller, A. Søgaard,
          <article-title>Eye gaze and self-attention: How humans and</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <article-title>eye-tracking data</article-title>
          ,
          <source>in: Proceedings of the 54th An- ings of the Workshop on Cognitive Modeling and</source>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <article-title>nual Meeting of the Association for Computational Computational Linguistics (</article-title>
          <year>2022</year>
          ). URL: https://api.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          semanticscholar.org/CorpusID:248780077. [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xu</surname>
          </string-name>
          , Gaze- spective, in: T.
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Hinrichs</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>K</given-names>
          </string-name>
          . Liu,
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <source>put. Appl</source>
          .
          <volume>36</volume>
          (
          <year>2024</year>
          )
          <fpage>12461</fpage>
          -
          <lpage>12482</lpage>
          . URL: and
          <article-title>Symbols for Natural Language Processing</article-title>
          and
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          https://doi.org/10.1007/s00521-024-09725-8.
          <string-name>
            <given-names>Knowledge</given-names>
            <surname>Graphs Reasoning (NeusymBridge) @</surname>
          </string-name>
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          <source>doi:10.1007/s00521-024-09725-8</source>
          . LREC-COLING-
          <year>2024</year>
          ,
          <article-title>ELRA</article-title>
          and
          <string-name>
            <given-names>ICCL</given-names>
            ,
            <surname>Torino</surname>
          </string-name>
          , Italia, [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Dini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Domenichelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brunato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          ,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . URL: https://aclanthology.org/
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          <article-title>From human reading to NLM understanding:</article-title>
          <source>Eval- neusymbridge-1</source>
          .1/.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          <article-title>uating the role of eye-tracking data in encoder-</article-title>
          [21]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Wallace</surname>
          </string-name>
          , Attention is not Explanation,
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          <string-name>
            <surname>M. T. Pilehvar</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 63rd An- ings of the 2019 Conference of the North American</source>
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          <string-name>
            <surname>Linguistics</surname>
          </string-name>
          (Volume
          <volume>1</volume>
          :
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <source>Association guistics: Human Language Technologies</source>
          , Volume
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          <string-name>
            <surname>for Computational</surname>
            <given-names>Linguistics</given-names>
          </string-name>
          , Vienna, Austria,
          <volume>1</volume>
          (Long and Short Papers), Association for Com-
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          <year>2025</year>
          , pp.
          <fpage>17796</fpage>
          -
          <lpage>17813</lpage>
          . URL: https://aclanthology. putational Linguistics, Minneapolis, Minnesota,
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          org/
          <year>2025</year>
          .
          <article-title>acl-long</article-title>
          .
          <volume>870</volume>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <year>2025</year>
          . 2019, pp.
          <fpage>3543</fpage>
          -
          <lpage>3556</lpage>
          . URL: https://aclanthology.org/
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          <article-title>acl-long</article-title>
          .
          <volume>870</volume>
          .
          <fpage>N19</fpage>
          -1357/. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>N19</fpage>
          -1357. [15]
          <string-name>
            <given-names>U.</given-names>
            <surname>Cop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Dirix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Drieghe</surname>
          </string-name>
          , W. Duyck, Pre- [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Serrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          , Is attention interpretable?,
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          <article-title>gual and bilingual sentence reading, Behavior ceedings of the 57th Annual Meeting of the As-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          <source>Research Methods</source>
          <volume>49</volume>
          (
          <year>2017</year>
          )
          <fpage>602</fpage>
          -
          <lpage>615</lpage>
          . URL:
          <article-title>https: sociation for Computational Linguistics</article-title>
          , Associa-
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          //api.semanticscholar.org/CorpusID:11567309.
          <article-title>tion for Computational Linguistics</article-title>
          , Florence, Italy, [16]
          <string-name>
            <given-names>N.</given-names>
            <surname>Siegelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schroeder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Acartürk</surname>
          </string-name>
          , H.
          <string-name>
            <surname>-D. Ahn</surname>
          </string-name>
          ,
          <year>2019</year>
          , pp.
          <fpage>2931</fpage>
          -
          <lpage>2951</lpage>
          . URL: https://aclanthology.org/
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Alexeeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Amenta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bertram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bonandrini</surname>
          </string-name>
          ,
          <fpage>P19</fpage>
          -
          <lpage>1282</lpage>
          /. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>P19</fpage>
          -1282.
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Brysbaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chernova</surname>
          </string-name>
          , et al.,
          <string-name>
            <surname>Expanding</surname>
            hori- [23]
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Abnar</surname>
          </string-name>
          , W. Zuidema, Quantifying attention
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          <source>ior research methods</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          . 58th Annual Meeting of the Association for Com[17]
          <string-name>
            <given-names>O.</given-names>
            <surname>Raymond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Moldagali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. Al</given-names>
            <surname>Madi</surname>
          </string-name>
          ,
          <article-title>A dataset putational Linguistics</article-title>
          , Association for Computa-
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          <article-title>of underrepresented languages in eye tracking tional Linguistics</article-title>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>4190</fpage>
          -
          <lpage>4197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          research, in: Proceedings of the 2023 Sympo- URL: https://aclanthology.org/
          <year>2020</year>
          .acl-main.
          <volume>385</volume>
          /.
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          <source>sium on Eye Tracking Research and Applications</source>
          , doi:10.18653/v1/
          <year>2020</year>
          .acl-main.
          <volume>385</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          <source>ETRA '23</source>
          ,
          <string-name>
            <surname>Association</surname>
            for Computing Machinery, [24]
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Chefer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Gur</surname>
          </string-name>
          , L. Wolf, Transformer inter-
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          New York, NY, USA,
          <year>2023</year>
          . URL: https://doi.org/ pretability beyond attention visualization, in: Pro-
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          10.1145/3588015.3590128. doi:
          <volume>10</volume>
          .1145/3588015. ceedings of the IEEE/CVF Conference on Computer
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          3590128.
          <article-title>Vision and Pattern Recognition (CVPR</article-title>
          ),
          <year>2021</year>
          , pp. [18]
          <string-name>
            <given-names>E.</given-names>
            <surname>Sood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tannert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Frassinelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bulling</surname>
          </string-name>
          , N. T.
          <volume>782</volume>
          -
          <fpage>791</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          <string-name>
            <surname>Vu</surname>
          </string-name>
          ,
          <article-title>Interpreting attention models with human</article-title>
          [25]
          <string-name>
            <given-names>K.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Khandelwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          <source>of the 24th Conference on Computational Natural D. Hupkes (Eds.)</source>
          ,
          <source>Proceedings of the 2019 ACL</source>
        </mixed-citation>
      </ref>
      <ref id="ref62">
        <mixed-citation>
          <string-name>
            <surname>Linguistics</surname>
          </string-name>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>12</fpage>
          -
          <lpage>25</lpage>
          . URL:
          <article-title>https: ing Neural Networks for NLP, Association for Com-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref63">
        <mixed-citation>
          //aclanthology.org/
          <year>2020</year>
          .conll-
          <volume>1</volume>
          .2/. doi:
          <volume>10</volume>
          .18653/ putational Linguistics, Florence, Italy,
          <year>2019</year>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref64">
        <mixed-citation>
          v1/
          <year>2020</year>
          .conll-
          <volume>1</volume>
          .2.
          <fpage>276</fpage>
          -
          <lpage>286</lpage>
          . URL: https://aclanthology.org/W19-4828/. [19]
          <string-name>
            <given-names>F.</given-names>
            <surname>Morger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Beinborn</surname>
          </string-name>
          , N. Hollenstein, doi:10.18653/v1/
          <fpage>W19</fpage>
          -4828.
        </mixed-citation>
      </ref>
      <ref id="ref65">
        <mixed-citation>
          <article-title>A cross-lingual comparison of human</article-title>
          and model [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Vig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Belinkov</surname>
          </string-name>
          , Analyzing the structure
        </mixed-citation>
      </ref>
      <ref id="ref66">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Sayeed</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2022 CLASP in: BlackboxNLP@ACL</source>
          ,
          <year>2019</year>
          . URL: https://api.
        </mixed-citation>
      </ref>
      <ref id="ref67">
        <mixed-citation>
          <article-title>Conference on (Dis)embodiment, Association for semanticscholar</article-title>
          .org/CorpusID:184486755.
        </mixed-citation>
      </ref>
      <ref id="ref68">
        <mixed-citation>
          <string-name>
            <given-names>Computational</given-names>
            <surname>Linguistics</surname>
          </string-name>
          , Gothenburg, Sweden, [27]
          <string-name>
            <surname>P. M. Htut</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Phang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Bordia</surname>
            ,
            <given-names>S. R.</given-names>
          </string-name>
          <string-name>
            <surname>Bowman</surname>
          </string-name>
          , Do
        </mixed-citation>
      </ref>
      <ref id="ref69">
        <mixed-citation>
          <year>2022</year>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>23</lpage>
          . URL: https://aclanthology.org/
          <year>2022</year>
          .
          <article-title>attention heads in bert track syntactic dependen-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref70">
        <mixed-citation>
          <source>clasp-1</source>
          .2. cies? (
          <year>2019</year>
          ). URL: https://arxiv.org/abs/
          <year>1911</year>
          .12246. [20]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biemann</surname>
          </string-name>
          , Probing large arXiv:
          <year>1911</year>
          .12246.
        </mixed-citation>
      </ref>
      <ref id="ref71">
        <mixed-citation>
          <article-title>language models from a human behavioral per-</article-title>
          [28]
          <string-name>
            <given-names>K.</given-names>
            <surname>Ethayarajh</surname>
          </string-name>
          , How contextual are contextual-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>