<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.1080/00031305.1980.10483031</article-id>
      <title-group>
        <article-title>From 'It's All Greek to Me' to 'Nur Bahnhof verstehen': An Investigation of mBERT's Cross-Linguistic Capabilities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aria Rastegar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pegah Ramezani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FAU Erlangen-Nuremberg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1980</year>
      </pub-date>
      <abstract>
        <p>This study investigates the impact of cross-linguistic similarities on idiom representations in mBERT, focusing on English and German idioms categorized by different degrees of similarity. We aim to determine whether different degrees of cross-linguistic similarity significantly affect mBERT's representations and to observe how these representations change across its 12 layers. Contrary to our initial hypothesis, cross-linguistic similarity did not uniformly impact idiom representations across all layers. While early and middle layers showed no significant differences among idiom categories, higher layers (from Layer 8 onwards) revealed more nuanced processing. Specifically, we observed significant differences between the control category and idioms with similar meaning (SM), as well as between idioms with similar lexical items (SL) and those with similar semantics (SM). Our analysis revealed that early layers provided general representations, while higher layers showed increased differentiation between literal and figurative meanings. This was evidenced by a general decrease in cosine similarities from Layer 5 onwards, with Layer 8 demonstrating the lowest cosine similarities across all categories. Interestingly, a trend suggests that mBERT performs slightly better with more literal hints. The order of cosine similarity for the categorizations was: idioms with a degree of formal similarity, control idioms, idioms with both formal and semantic similarity, and finally idioms with only semantic similarity. These findings indicate that mBERT's processing of idioms evolves significantly across its layers, with cross-linguistic similarity appearing to have a stronger effect in higher layers, where more abstract semantic processing likely occurs.</p>
      </abstract>
      <kwd-group>
        <kwd>mBERT</kwd>
        <kwd>Multi-word Expressions</kwd>
        <kwd>Idioms</kwd>
        <kwd>Bertology</kwd>
        <kwd>computationally-aided cross-linguistic analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>However, their characteristics also make them a good case study in different experimental linguistics settings.</p>
        <p>Idioms are one of the most studied linguistic concepts. They can broadly be defined as multi-word expressions that are often fixed in terms of their syntactic and lexical aspects, while they usually carry meanings that cannot be directly deduced from the meanings of the individual words they contain [<xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>]. Given their syntactic and structural fixedness and non-compositional aspects, they were perceived as peripheral, supplementary, or appendixes to language grammars in earlier approaches to idioms [5, p. 504]. However, with the increasing interest in corpus studies of language, it has been observed that much of human linguistic production is routinized and prefabricated [<xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>]. Multi-word expressions with a high degree of conventionality do not seem to be marginal or limited linguistic constructions, as they play an important role in our everyday life [<xref ref-type="bibr" rid="ref10 ref11 ref9">9, 10, 11</xref>]. In addition, they seem to be used in communication across various contexts, from novels to political debates and therapeutic dialogues [<xref ref-type="bibr" rid="ref12">12</xref>]. Given their characteristics and their conventionalized meanings, they pose many challenges to language speakers, especially non-native speakers [<xref ref-type="bibr" rid="ref13">13</xref>].</p>
        <p>Recent advancements in Large Language Models (LLMs) and their widespread application have prompted linguists to investigate the performance of these models across various linguistic concepts, including idioms [14, 15, 16]. In addition, in the case of multi-lingual models, an interesting research area is how these models encode the different languages on which they are trained [17, 18]. In this study, a categorization of English and German idioms based on three degrees of cross-linguistic similarity is proposed. One category includes idioms that have similar formal and semantic aspects in these languages; the second includes idioms with formal similarities but different semantic aspects; and the third category includes idioms with similar semantic aspects but different formal aspects. The goal of our work is to consider how cross-linguistic similarities among idioms affect the representation of idioms in mBERT. More specifically, the questions underlying the following experiment were:</p>
        <p>1. Does cross-linguistic similarity have a significant impact on the representation of idioms in mBERT?</p>
        <p>2. Do the degree of cross-linguistic similarity and the representation of the model change across the 12 layers of mBERT?</p>
        <p>If mBERT draws on the languages in its training data as a collective pool, it should perform consistently across all cross-linguistic categories, similar to how it represents idioms from the language it has been given, that is, English in this case. However, if it primarily retrieves data from specific languages, we expect to observe significant performance differences among the categories, potentially mirroring some of the patterns seen in cross-linguistic studies with second language speakers. That is, identical cross-linguistic idioms should be represented almost similarly to the control idioms (in this case, English idioms), and idioms with formal and lexical correspondence could both be represented similarly and, in some cases, more differently from the control idioms. Finally, idioms with only corresponding semantics and different formal aspects should be the most differently represented idioms compared to the control group. Furthermore, given the proposed categorizations based on formal and semantic similarities, we anticipated varying performance across mBERT's 12 layers. In particular, in lower layers we expect less differentiation among categories, as these layers typically capture more surface-level features, while in higher layers, which represent more of the semantic aspects, we anticipate more varied trends and larger differences among the categories, mostly because we are primarily focused on the figurative meaning of idioms across the different categories.</p>
      </sec>
    </sec>
    <sec id="sec-related-works">
      <title>2. Related Works</title>
      <p>Studies on idiomatic expressions generally focus on two main comparisons: the understanding of idioms by participants (literal or figurative understanding of the phrases), and the difference between understanding idioms and non-idiomatic or novel phrases [<xref ref-type="bibr" rid="ref13 ref2">19, 13, 2</xref>]. The figurative meaning of an idiom is usually conventionalized and relatively fixed; therefore, native speakers seem to simply access it. However, its literal interpretation can be logical, nonsensical, or somewhere in between. For instance, as [<xref ref-type="bibr" rid="ref13">13</xref>] explains, while it is possible that someone is bathing in the example of being 'in hot water' (with an idiomatic or figurative interpretation denoting "in trouble"), in the idiom 'to be on cloud 9' (with a figurative interpretation of "being very happy") there is no likely, logical interpretation in the real world in which a person can be found on a cloud called "9". Furthermore, when considering the literal interpretation of an idiom, research can remain at the phrasal level or can consider access to the literal meaning of the constituent parts. When again considering the idiom 'in hot water', one can focus on access to the figurative interpretation, "in trouble"; on access to the interpretation of the whole literal phrase, "to be in heated water such as a bath or hot springs"; or on access to the meanings of the individual constituent words such as "hot" or "water".</p>
      <p>In cross-linguistic studies on idioms, one of the aspects that has been studied is the concept of cross-linguistic similarity or translatability. Among language speakers, the degree of translatability of an idiom in their L1 and L2 seems to play a significant role in how they interpret and understand idioms [<xref ref-type="bibr" rid="ref13">20, 21, 22, 23, 13, 24, 25</xref>]. In one of the earliest investigations of translatability's effect on L2 idiom comprehension, [20] examined how advanced Venezuelan learners of English understood and produced English idioms with varying degrees of translatability from Spanish. Using multiple tasks (multiple-choice recognition, open-ended definition, discourse completion, and translation), Irujo found that idioms with identical expressions in both languages (e.g., "point of view" / "punto de vista") were easiest to comprehend and produce. In contrast, idioms representing equivalent concepts without direct translations (e.g., "to pull his leg" vs. tomarle el pelo "to take to him the hair") posed the greatest challenge. The study also found negative interference in the form of transfer errors when participants produced partially matching idioms (e.g., "to catch him red-handed" vs. cogerle con las manos en la masa "to catch him with the hands in the dough"). Irujo [20] concluded that L1 knowledge can be both beneficial and detrimental to L2 idiom processing. For idioms with direct translations, L1 knowledge facilitates both comprehension and production in L2. However, for idioms with partial similarities between languages, L1 knowledge can lead to transfer errors. Additionally, a study by [21], which focused on 3rd-year learners of Spanish, French, and German, found that the translatability of idioms was a key factor in predicting the speed and accuracy of their production, both with and without context. Furthermore, [21] observed that translation is one of the most common strategies employed by L2 users to comprehend idioms, as indicated by learners' written reflections. Also, [23] discovered that idioms that could be translated literally from Latvian and Mandarin Chinese into English were better comprehended by participants. Furthermore, they observed that regardless of the overall similarity of the studied languages to English, if the idioms were similar or if they were decomposable, they would be understood by the participants. Although these studies focus on language learners and speakers, and they may include more variables, we can argue that such cross-linguistic similarities can affect how idioms are represented in multi-lingual contexts.</p>
      <p>In the case of large language models, the way they embed and encode idioms and multi-word expressions has been an ongoing debate [26, 27, 28, 16, 14]. Most studies focusing on how language models encode idioms examine the task of identifying idiomatic expressions in a text. In early works on this task, researchers developed expression-specific models that can capture the idiomatic expressions in a text [29], while more recent approaches have demonstrated that more generic models such as BERT and mBERT [30] are also able to capture idioms [26, 27, 28]. Studies on the internal mechanisms of how transformer-based language models process idioms demonstrated that BERT, Multilingual BERT, and DistilBERT represent idioms distinctively compared to literal language [16]. These studies also observed that the semantic meaning of idioms is captured more effectively in deeper layers of the models. They found that words within idioms receive less attention from other words in the sentence compared to words in literal contexts. However, [14] argue that LLMs capture MWE semantics inconsistently, as shown by their reliance on surface patterns and memorized information. MWE meaning is also strongly localized, predominantly in early layers of the architecture. They also discuss that representations benefit from specific linguistic properties, such as lower semantic idiosyncrasy and ambiguity of target expressions.</p>
      <p>Moving from LLMs and idioms, there are different arguments on how models such as BERT work [31] and, in the case of multi-lingual approaches, on how multilingual they are [17, 18, 32]. Works on the mechanisms of BERT demonstrate that it captures significant linguistic information, with lower layers focusing on local syntactic relationships and higher layers encoding more complex linguistic features. The self-attention heads in BERT show specialization for certain linguistic functions, though many exhibit redundant patterns, suggesting overparameterization. While BERT demonstrates some ability to capture world knowledge, its reasoning capabilities appear limited. Despite impressive performance on many NLP tasks, BERT shows limitations in handling negation, numerical reasoning, and complex inference, often relying on shallow heuristics [31]. Investigations of mBERT across 39 languages found that it performs well on high-resource languages but struggles with low-resource languages. For languages with limited Wikipedia data (which was used to train mBERT), performance drops significantly, especially for tasks like named entity recognition. This suggests that the quality of representations learned by mBERT is not uniform across all 104 languages it supports [32]. Additionally, [18] conducted a series of probing experiments to understand mBERT's cross-lingual abilities. They found that mBERT performs surprisingly well on zero-shot cross-lingual model transfer, even between languages with different scripts. Their analysis suggests that mBERT learns multilingual representations that go beyond simple vocabulary memorization. However, they also note that transfer works best between typologically similar languages, indicating some limitations in mBERT's ability to generalize across very different language structures.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Dataset</title>
      <p>To investigate our research questions concerning the impact of cross-linguistic similarity on the representation of idioms in mBERT, and how this representation changes across the model's 12 layers, a list of idiomatic expressions was compiled. The dataset consists of 72 idioms: 54 German and 18 English, the latter serving as a control group. The German idioms are classified based on their similarity to English idioms, using three categories of cross-linguistic correspondence. The first category includes idioms with the highest degree of formal and semantic similarity. These idioms, such as die Ruhe vor dem Sturm, have a corresponding form in English when translated word-for-word, e.g., the calm before the storm. In addition to the formal similarity, the meaning of the idiom in the target language is also similar to that of the originating language, in this case referring to a period of calmness before argument or trouble. The second category focuses on formal similarities without semantic correspondence. For instance, jemanden ausnehmen wie eine Weihnachtsgans ('to gut someone like a Christmas goose') refers to financially exploiting someone. In English, there is an idiom that contains the word "goose" - to cook one's goose - but it refers to sabotaging someone's plans, demonstrating some degree of formal and lexical similarity without semantic alignment. The third category encompasses idioms with semantic similarities but no formal correspondence. For example, the German idiom den Löffel abgeben ('to pass the spoon') and the English idiom to kick the bucket both convey the meaning of dying, while sharing no formal similarities. After categorizing the idioms, the German idioms were literally translated into English, to ensure that all expressions can be fed to the model in a single language. This approach allows us to control for the language space in which the idioms are presented, given that in more complex tasks different subsets of mBERT can affect how idioms are represented [33]. Additionally, for each idiom, a brief entity or description reflecting its figurative meaning is selected. For example, for "the calm before the storm", "episodic tranquility" is chosen, which refers to the figurative interpretation of the idiom. Table 1 summarizes the proposed categorizations and the original and translated idioms, along with their figuratively related entities.</p>
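      <p>To make the categorization concrete, the dataset layout described above can be sketched as follows. This is an illustrative Python sketch, not the authors' released data format: the field names and the figurative entities for the SL and SM examples are our own hypothetical choices (only "episodic tranquility" is given in the text), and the category labels SI/SL/SM follow the abbreviations used in the Discussion.</p>
      <preformat>
```python
# Hypothetical layout for the idiom dataset described in Section 3.
# One record per idiom; category labels follow Section 6 (SI, SL, SM).
IDIOMS = [
    {"category": "SI",  # formal and semantic similarity
     "german": "die Ruhe vor dem Sturm",
     "literal_translation": "the calm before the storm",
     "figurative_entity": "episodic tranquility"},
    {"category": "SL",  # formal similarity, different meaning (entity is our guess)
     "german": "jemanden ausnehmen wie eine Weihnachtsgans",
     "literal_translation": "to gut someone like a Christmas goose",
     "figurative_entity": "financial exploitation"},
    {"category": "SM",  # semantic similarity, different form (entity is our guess)
     "german": "den Löffel abgeben",
     "literal_translation": "to pass the spoon",
     "figurative_entity": "dying"},
    {"category": "control",  # English control idiom, no German source
     "german": None,
     "literal_translation": "to kick the bucket",
     "figurative_entity": "dying"},
]

def by_category(items):
    """Group idiom records by their cross-linguistic category."""
    groups = {}
    for item in items:
        groups.setdefault(item["category"], []).append(item)
    return groups

groups = by_category(IDIOMS)
print(sorted(groups))  # ['SI', 'SL', 'SM', 'control']
```
      </preformat>
      <p>In the full dataset each of the three German categories would hold 18 records and the control group a further 18, for 72 idioms in total.</p>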
      <sec id="sec-2-1">
        <title>4. Model and Experiment</title>
        <p>For analyzing the embeddings of the studied idioms and their figurative meanings, the dataset was processed using the "bert-base-multilingual-uncased" model [34] without any fine-tuning. The model is pretrained on the 102 languages with the largest Wikipedias, which include both German, the language from which our idioms are derived, and English, the target language of the translations and the language in which the embeddings are derived. The model consists of 12 hidden layers, each containing 768 neurons. For each sample, the embeddings of the [CLS] token were extracted from all 12 layers, for every idiom and its associated meaning. The [CLS] token was chosen because it is designed to capture sentence-level semantics in BERT models [35]; its embedding provides a convenient basis for semantic comparison of texts, which can then be carried out using similarity measures.</p>
        <p>Cosine Similarity = cos(θ) = (v_Idiom · v_Meaning) / (|v_Idiom| |v_Meaning|)   (1)</p>
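        <p>The layer-wise [CLS] extraction described above can be sketched as follows. The Hugging Face transformers calls shown in the comments are an assumption about tooling (the paper does not name its implementation), and random arrays of the right shape stand in for real hidden states so the bookkeeping can be shown without downloading the model.</p>
        <preformat>
```python
import numpy as np

def cls_per_layer(hidden_states):
    """Stack the [CLS] vector (sequence position 0) from each of the 12
    hidden layers; hidden_states[0] is the embedding output and is skipped."""
    return np.stack([layer[0, 0, :] for layer in hidden_states[1:]])

# In the real pipeline, hidden_states would come from something like:
#   tok = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")
#   model = AutoModel.from_pretrained("bert-base-multilingual-uncased",
#                                     output_hidden_states=True)
#   out = model(**tok("the calm before the storm", return_tensors="pt"))
#   hidden_states = [h.detach().numpy() for h in out.hidden_states]
# Here, random arrays of the same shape (batch 1, 6 tokens, 768 dims,
# 13 entries = embedding output + 12 layers) stand in for a tokenized idiom.
rng = np.random.default_rng(0)
hidden_states = [rng.normal(size=(1, 6, 768)) for _ in range(13)]
print(cls_per_layer(hidden_states).shape)  # (12, 768)
```
        </preformat>
        <p>Running the same extraction for an idiom and for its figurative entity yields two (12, 768) arrays whose rows can be compared layer by layer with Equation 1.</p>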
        <p>In Equation 1, v stands for a word embedding, a vector of length 768. To interpret the result of the cosine similarity in the context of word embeddings: a score of 1 means the vectors are identical, 0 means the vectors are orthogonal (no similarity), and -1 means the vectors are opposed.</p>
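        <p>Equation 1 and the 1 / 0 / -1 interpretation above can be written out directly; the following is a minimal numpy sketch with toy vectors, not the authors' code.</p>
        <preformat>
```python
import numpy as np

def cosine_similarity(v_idiom, v_meaning):
    """Equation 1: cos(theta) between two embedding vectors."""
    num = np.dot(v_idiom, v_meaning)
    den = np.linalg.norm(v_idiom) * np.linalg.norm(v_meaning)
    return num / den

v = np.ones(768)                 # any vector is perfectly similar to itself
w = np.zeros(768); w[0] = 1.0
u = np.zeros(768); u[1] = 1.0    # orthogonal to w
print(round(cosine_similarity(v, v), 6))   # 1.0
print(round(cosine_similarity(w, u), 6))   # 0.0
print(round(cosine_similarity(w, -w), 6))  # -1.0
```
        </preformat>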
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Results</title>
      <p>After deriving the [CLS] embeddings from all layers of mBERT for the translated idioms and their corresponding figurative meanings, the cosine similarities among the derived embeddings were calculated. Figure 1 illustrates the cosine similarities across the different layers of mBERT for each idiom category. As can be seen, the first layer of mBERT showed identical cosine similarities (equal to 1) for all idioms, representing the entry point of the model. Therefore, this layer is excluded from subsequent analyses to avoid skewing the results.</p>
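      <p>The per-category, per-layer aggregation behind Figure 1 can be sketched as follows, assuming a long-format table of per-idiom cosine similarities; the column names and values here are fabricated for illustration only.</p>
      <preformat>
```python
import pandas as pd

# Toy long-format table: one row per (idiom, layer) cosine similarity.
scores = pd.DataFrame({
    "category": ["control", "control", "SM", "SM"],
    "layer": [3, 8, 3, 8],
    "cosine": [0.92, 0.71, 0.90, 0.65],  # fabricated toy values
})

# Mean cosine similarity per category and layer: the quantity plotted
# per layer for each category in Figure 1.
curve = scores.groupby(["category", "layer"])["cosine"].mean().unstack("layer")
print(curve.loc["SM", 8])  # 0.65
```
      </preformat>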
      <p>[Figure 1: cosine similarity between idiom and meaning [CLS] embeddings, by mBERT layer (x-axis: Layer, 0-11), for each idiom category.]</p>
      <sec id="sec-3-1">
        <title>4.1. Similarity Calculation</title>
        <p>In the next step, to measure how similar mBERT's representation of each idiom is to its figurative meaning, the similarity of the embeddings for each idiom and its figurative meaning was calculated. Cosine similarity is used because of its effectiveness; it is a widely used method for determining how similar or related two words are based on their vector representations [36, 37].</p>
        <p>Additionally, as the graph in Figure 1 indicates, the cosine similarities exhibited notable variations across layers. Layer 3 demonstrated the highest cosine similarities across all categories, while Layer 8 showed the lowest cosine similarities for all four categories. In addition, from Layer 5 onwards we observed a general decrease in cosine similarities, suggesting increasing differentiation between the [CLS] representations of idioms and their corresponding figurative meanings in higher layers.</p>
        <p>To test our hypothesis on how the embeddings of mBERT would change given the proposed cross-linguistic categorizations, a linear mixed-effects model was fitted to the cosine similarities (see Appendix A for the full model summary).</p>
        <p>To examine the predicted cosine similarity of the figurative [CLS] representations for each combination of category and layer, the estimated marginal means were computed using the emmeans package [41] in R. In this analysis, the changes in cosine similarities were compared among the categories at the different layers. The results of the pair-wise comparisons indicate that, for Layers 1-7, there are no significant differences between the categories (all p-values &gt; 0.05); this can also be seen in Figure 3, in which the lines for all categories align with each other almost until the 7th layer. From Layer 8 onwards, however, a significant difference can be seen between the control category and category SM, which represents the idioms with cross-linguistically similar semantics (estimate = 0.0272, p = 0.0179).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Discussion and Conclusion</title>
      <p>Our study investigated how cross-linguistic similarities among idioms affect their representation in mBERT, with a focus on English and German idioms categorized by three degrees of cross-linguistic similarity. The study aims to answer two main research questions: whether cross-linguistic similarity has a significant impact on the representation of idioms in mBERT, and how the degree of cross-linguistic similarity and the representation of the model change across the 12 layers of mBERT.</p>
      <p>Our findings provide insights into these questions and our initial hypotheses. Contrary to our initial hypothesis, we found that cross-linguistic similarity does not have a uniformly significant impact on the representations of idioms across all layers of mBERT. The main effects of our translated idioms, categorized into the cross-linguistic categories (SI: formal and semantic similarity, SL: similar lexicon, SM: similar meaning), were not statistically significant when compared to the control category (English idioms) in the early and middle layers of the model. This result may suggest that mBERT utilizes knowledge from all languages in its training data as a collective pool, at least in the case of the studied idioms. This aligns with the idea that mBERT learns multilingual representations that go beyond simple vocabulary memorization, as suggested by Pires et al. [18]. However, the emergence of significant differences in higher layers (particularly from Layer 8 onwards) might indicate that mBERT's processing of idioms becomes more nuanced as information moves into the layers where more abstract semantic processing likely occurs.</p>
      <sec id="sec-4-1">
        <title>6.1. Limitations and future research</title>
        <p>This research also has limitations that can be tackled in future studies. One of the primary limitations of our study is the size of the dataset. Although the dataset has a good variety of samples, a bigger dataset may improve the generalizability and robustness of our findings. Future research should aim to include a more extensive dataset to confirm and extend these findings.</p>
        <p>Moreover, literally translating the idioms and their figuratively related entities can affect the representations of the model and the derived cosine similarities; therefore, in further studies it can be insightful to also compare how the representations of the model change if the original, untranslated idioms are used.</p>
      </sec>
</p>
    </sec>
    <sec id="sec-5">
      <title>Appendix A. LMER model full summary</title>
      <p>Fixed Effects</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pustejovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Batiukova</surname>
          </string-name>
          <article-title>The lexicon</article-title>
          , Cambridge University Press,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Libben</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Titone</surname>
          </string-name>
          ,
          <article-title>The multidetermined nature of idiom processing</article-title>
          ,
          <source>Memory &amp; cognition 36</source>
          (
          <year>2008</year>
          )
          <fpage>1103</fpage>
          -
          <lpage>1121</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Abel</surname>
          </string-name>
          ,
          <article-title>English idioms in the first language and second language lexicon: A dual representation approach</article-title>
          ,
          <source>Second language research 19</source>
          (
          <year>2003</year>
          )
          <fpage>329</fpage>
          -
          <lpage>358</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R. W. Gibbs</given-names>
            <surname>Jr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. P.</given-names>
            <surname>Nayak</surname>
          </string-name>
          ,
          <article-title>Psycholinguistic studies on the syntactic behavior of idioms</article-title>
          ,
          <source>Cognitive psychology 21</source>
          (
          <year>1989</year>
          )
          <fpage>100</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Fillmore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>O'Connor</surname>
          </string-name>
          ,
          <article-title>Regularity and idiomaticity in grammatical constructions: The case of let alone</article-title>
          ,
          <source>Language</source>
          (
          <year>1988</year>
          )
          <fpage>501</fpage>
          -
          <lpage>538</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Christiansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Arnon</surname>
          </string-name>
          ,
          <article-title>More than words: The role of multiword sequences in language learning and use</article-title>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Jackendoff</surname>
          </string-name>
          ,
          <article-title>Précis of Foundations of Language: Brain, meaning, grammar, evolution</article-title>
          ,
          <source>Behavioral and Brain Sciences</source>
          <volume>26</volume>
          (
          <year>2003</year>
          )
          <fpage>651</fpage>
          -
          <lpage>665</lpage>
          . doi:10.1017/S0140525X03000153.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sinclair</surname>
          </string-name>
          ,
          <source>Corpus, Concordance, Collocation</source>
          , Describing English Language, Oxford University Press,
          <year>1991</year>
          . URL: https://books.google.de/books?id=L8l4AAAAIAAJ.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Siyanova-Chanturia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Conklin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schmitt</surname>
          </string-name>
          ,
          <article-title>Adding more fuel to the fire: An eye-tracking study of idiom processing by native and non-native speakers</article-title>
          ,
          <source>Second Language Research</source>
          <volume>27</volume>
          (
          <year>2011</year>
          )
          <fpage>251</fpage>
          -
          <lpage>272</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wulff</surname>
          </string-name>
          ,
          <source>Rethinking Idiomaticity</source>
          (
          <year>2008</year>
          )
          <fpage>1</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Siyanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Schmitt</surname>
          </string-name>
          ,
          <article-title>Native and nonnative use of multi-word vs. one-word verbs</article-title>
          ,
          <source>International Review of Applied Linguistics in Language Teaching</source>
          <volume>45</volume>
          (
          <year>2007</year>
          )
          <fpage>119</fpage>
          -
          <lpage>139</lpage>
          . URL: https://doi.org/10.1515/IRAL.2007.005. doi:10.1515/IRAL.2007.005.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <article-title>Processing of idioms by L2 learners of English</article-title>
          ,
          <source>TESOL Quarterly</source>
          <volume>33</volume>
          (
          <year>1999</year>
          )
          <fpage>233</fpage>
          -
          <lpage>262</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Beck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <article-title>Bilingual and monolingual idiom processing is cut from the same cloth: The role of the L1 in literal and figurative meaning activation</article-title>
          ,
          <source>Frontiers in Psychology</source>
          <volume>7</volume>
          (
          <year>2016</year>
          )
          <fpage>1350</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>