Is It Still a Village? Tracing Grammaticalization with Word Embeddings

Joseph Larson

Patrícia Amaral

Department of Spanish and Portuguese, Indiana University, Bloomington, Indiana, USA

2025

Computational studies of language change tend to focus on predicting lexical semantic change that reflects cultural and societal changes. In this paper we focus instead on the syntactic and semantic shift from lexical to grammatical (grammaticalization), and we choose an understudied variety of Spanish. This paper investigates the grammaticalization of the noun caleta 'cove, village' to a degree expression (an intensifier) meaning 'a lot', as part of the system of degree words in Chilean Spanish. We use word embeddings trained on a corpus of tweets to show the ongoing syntactic and semantic change of caleta. Our distributional analysis also reveals how high degree is expressed in this variety of Spanish, showing the potential of these methods to explore lesser-known linguistic subsystems. Our study unveils degree expressions not previously studied in contemporary colloquial Chilean Spanish and also provides further evidence for an existing typology of degree modifiers across languages.

Keywords: grammaticalization, degree quantifiers, historical linguistics, Chilean Spanish, word embeddings

1. Introduction

Studies of language change using distributional methods have shown the potential of word embeddings to trace syntactic and semantic change over time [1, 2, a.o.]. However, such research tends to focus on predicting changes that affect sets of lexical items shifting from one semantic domain to another, which typically reflects cultural and societal changes. Fewer studies have explored both semantic and morphosyntactic change (but see Fonteyn et al. [3]). In this paper, we focus on the semantic and syntactic shift from lexical to grammatical, known as grammaticalization [4, 5], and the stages of this process. Specifically, we study the creation of degree expressions.

Traditionally, degree expressions have been associated with adjectives, considered the prototypical gradable category. However, degree modification is also compatible with nouns and verbs, which shows that gradability cuts across syntactic categories [6, 7, 8]. As a word becomes a degree expression over time, it typically expands its distribution along different categories: e.g. it first combines with nouns before co-occurring with verbs and adjectives. Hence, the grammaticalization of degree expressions provides insight into the semantics of degree and patterns in the distribution of degree words [9, 10]. This paper examines an understudied variety, Chilean Spanish, and uses word embeddings to investigate the emerging system of degree words to which one grammaticalized word shifts. We investigate the grammaticalization of caleta in Chilean Spanish, from a noun denoting ‘cove, hiding place (where merchandise can be stored)’, ‘village’, as in ex. (1), to a quantifier and degree adverb ‘much, a lot’, as in (2), where caleta modifies the verb and denotes high degree.

(1) Esta experiencia la realizamos en Zapallar, en la caleta de pescadores
    this experience cl.fem.sg.acc do.pst.1pl in Zapallar in the caleta of fishermen
    “We did this experience in Zapallar, in the fishermen’s cove”

(2) me gustó caleta
    cl.1sg.dat like.pst.3sg caleta
    “I liked it a lot.”

We use word embeddings to examine to what extent the grammaticalization of caleta has developed while also shedding light on the system of degree modifiers in Chilean Spanish. We ask, (i) how far along has caleta grammaticalized in Chilean Spanish, and (ii) what types of evidence do word embeddings provide of different stages of grammaticalization of degree words?

CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics, September 24-26, 2025, Cagliari, Italy.
joelarso@iu.edu (J. Larson); pamaral@iu.edu (P. Amaral)
ORCID: 0000-0001-6651-0319 (P. Amaral)
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2. Previous Work

Linguists have provided analyses of the gradual process by which lexical items acquire grammatical functions: for example, in this diachronic change, nouns lose their categorial properties like occurring after a determiner or being pluralized. The grammaticalization of nouns into degree adverbs (e.g. the development from lot ‘a set of objects’ to a lot ‘much’) is well attested cross-linguistically: other examples are French adverb beaucoup from un beau coup ‘a good strike’ and English a bit from ‘a bite, a portion that fits in the mouth’ [11, 12, 13, 14, 15].

This research has shown that a typical structure in which nouns occur, modification by a prepositional phrase, as in a lot [of chairs], a mountain [of books], provides a starting point for quantity and degree interpretations. This structure undergoes subsequent syntactic reanalysis, where the head noun (e.g. lot) loses nominal properties and a lot of becomes an adverb modifying the second noun. The development of so-called binominal structures Det N1 of N2, which may or may not further evolve to a fully adverbial category, plays a crucial role in the grammaticalization of degree words. In our study, we also include the structure (Det) caleta of N, hence we investigate the distribution of caleta de.

As argued by [8], degree words across languages show a systematic behavior in terms of classes of words they can modify. These well-attested patterns correspond to types along a continuum of word classes defined by their syntactic-semantic properties. For example, since French trop ‘too much’ can modify all word classes, within this typology it is considered to be a Type C modifier. On the other hand, English very can only modify gradable adjectives (“very kind” is possible, while expressions like “*I traveled very” or “*very water” are not grammatical), therefore very is classified as a Type A modifier. For a complete summary of the continuum of word classes and typology, see Figure 1. As words develop into one type, they are predicted to modify words in the order along the continuum; for instance, if a word co-occurs with words of category V, it is expected to co-occur with words of category IV before it appears with words of category III.¹ As we investigate whether caleta has grammaticalized into a degree word, we will examine its stage of development with respect to Doetjes’ continuum.

¹ Doetjes differentiates between ‘gradable’ and ‘eventive’ adjectives and verbs by whether or not the modifier is targeting the degree or is quantifying over events. The example she gives is from Dutch: Jan is veel ziek ‘Jan is sick a lot’ vs. Jan is erg ziek ‘Jan is very sick.’ In the former, veel as a quantifier targets eventive adjectives, thus it can only modify the quantity of sick events. In the latter, erg expresses the degree of sickness, i.e. the severity of his illness.

[Figure 1: Typology of degree expressions according to their distribution along a continuum of word classes. Table adapted with modifications from [8, p. 138]. Superscripts indicate language: R for Russian, D for Dutch, F for French, E for English, P for Portuguese, and I for Italian. Word-class categories: I gradable adjectives; IIa gradable nominal predicates; IIb gradable verbs; III eventive verbs, eventive adjectives, comparatives; IV mass nouns; V plural nouns. Modifier types: Type A to Type G. Expressions shown: very (E), erg (D), očen’ (R), trop (F), muito (P), beaucoup (F), molto (I), a lot (E), veel (D), mnogo (R), a mountain (E), many (E).]

While some computational studies of grammaticalization have adopted case-driven approaches similar to ours [16, 17, 18], we also investigate how a distributional analysis of caleta can provide insight on the set of degree expressions currently used in colloquial Chilean Spanish. In other words, we aim to examine not just the grammaticalization of caleta but also how this word fits in the system of degree words in Chilean Spanish and in types of degree expressions across languages.

3. Methodology

3.1. Corpus Creation

To ensure we had a good representation of colloquial Chilean Spanish, we created a subcorpus from an already existing corpus of online data [19]. The already existing corpus contained roughly 19GB of data, from diverse sources, including news, tweets, online reviews and other miscellaneous web content. We chose to create a subcorpus just from tweets to reduce the computational load for our later experiments and since we only wanted informal instances of language; caleta typically only occurs in less formal registers. The resulting subcorpus of 27,306,582 tweets consisted of exactly 342,979,307 tokens. The time span of these tweets is from 2010 to 2020.

3.2. Preprocessing

We first normalized the text in the corpus: we removed case, punctuation, diacritics, URLs, hashtags, and any repeated letters. For this last step, we only allowed double letters where they occur within normative Spanish orthography (i.e. < >, < >, < >), elsewhere only single letters were allowed. Then we input the corpus into a plain text file separated by newlines. The resulting file was then lemmatized using SpaCy’s Spanish lemmatizer [20].²
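The normalization substeps described in Section 3.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the set of permitted double letters is not recoverable from the text, so `ALLOWED_DOUBLES` (ll, rr, cc) is a placeholder, and the function names are hypothetical. Stripping diacritics via Unicode decomposition also maps ñ to n, which may or may not match the original procedure.

```python
import re
import unicodedata

# Assumption: the exact set of permitted double letters is not given in the
# text; "ll", "rr" and "cc" are used here purely for illustration.
ALLOWED_DOUBLES = {"l", "r", "c"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
PUNCT_RE = re.compile(r"[^\w\s]|_")
REPEAT_RE = re.compile(r"([a-z])\1+")


def strip_diacritics(text):
    """Remove combining marks (note: this also maps 'ñ' to 'n')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def collapse_repeats(text):
    """Reduce a run of one letter to a single letter, or to two if allowed."""
    def _sub(match):
        ch = match.group(1)
        return ch * 2 if ch in ALLOWED_DOUBLES else ch
    return REPEAT_RE.sub(_sub, text)


def normalize(tweet):
    """Apply the substeps in order: case, URLs, hashtags, diacritics,
    punctuation, repeated letters, then squeeze whitespace."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = strip_diacritics(text)
    text = PUNCT_RE.sub(" ", text)
    text = collapse_repeats(text)
    return " ".join(text.split())
```

URLs and hashtags are removed before punctuation so that their delimiters are still available for matching. Collapsing every other doubled letter follows the rule as stated in the text, even though it also shortens legitimate words with other doubled vowels.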

[Figure 2: Preprocessing pipeline. Text Normalization (substeps: Case, Punctuation, Diacritics, URLs, Hashtags, Repeated Letters), Text File Preparation, Lemmatization.]

² As an anonymous reviewer noted, our preprocessing might have worked better if we had normalized the text and lemmatized in one step. This is something we will consider for future experiments.

3.3. Model Selection

To represent the distributional patterns of words in our corpus, we decided to use static word embeddings over contextualized word embeddings. Non-contextualized embeddings allow us to compare our target word with other words in Chilean Spanish to examine the current stage of grammaticalization of caleta as determined by its closeness to different subsystems in the language.

The algorithm we use is Skip-Gram with Negative Sampling (SGNS) implemented in word2vec [21] to extract embeddings, based on previous research that showed good results for studies of semantic change [22, a.o.]. For this reason, we do not consider it necessary to use a more computationally expensive operation (e.g. dynamic word embeddings). We trained each model for five epochs, a minimum token count of 10 and the skip-gram algorithm. Initially, we experimented with several hyperparameters: the window size, the minimal word count and the vector size. The only hyperparameter that proved to be significant was the window size (see next section for more details). The resulting model used a vector length of 100 and a minimal word count of 10. To verify the validity of the model, we used analogy tests targeting gender-based morphological and semantic relations (see Table 1 for specifics). We performed the tests on both models we used for the embeddings (see following section for details). For both models, the analogy tests returned the expected word, except for the last pair with a window size of 1: where perra ‘dog (female)’ was expected, the most similar word embedding was for quiltra ‘mutt (female)’.

Table 1: Analogy tests (word pairs and accuracy).

Relationship | Word Pair 1, Word A | Word Pair 1, Word B | Word Pair 2, Word A | Word Pair 2, Word B | Accuracy
Age-based | Hombre ‘Man’ | Mujer ‘Woman’ | Niño ‘Boy’ | Niña ‘Girl’ | 1.0
Familial | Padre ‘Father’ | Madre ‘Mother’ | Hijo ‘Son’ | Hija ‘Daughter’ | 1.0
Feline | Niño ‘Boy’ | Gato ‘Cat (male)’ | Niña ‘Girl’ | Gata ‘Cat (female)’ | 1.0
Canine | Niño ‘Boy’ | Perro ‘Dog (male)’ | Niña ‘Girl’ | Perra ‘Dog (female)’ | 0.5

3.4. Window Size

As mentioned in the previous section, the only hyperparameter we adjusted for the model was the window size. We extracted models for window sizes in the range [1, 10].³ Although other authors have shown that small window sizes often produce noisy and unstable embeddings [23], for this project we expected small window sizes to be appropriate.

Our hypothesis was that in our case, lower window sizes would capture the grammaticalized meaning of caleta, since the scope of grammatical words like quantifiers lies within their immediate neighbors, whereas higher window sizes show neighbors within the same semantic field (therefore its lexical use). However, since we use a corpus of tweets, window size is fairly limited by the genre itself (a possible limitation we address later).

³ As a reviewer suggested, we experimented with other window sizes, e.g. a window size of 2. While we do not show the results for this window size, we note that there was not a significant difference between this window size and a window size of 1 for caleta de, but there was for caleta. For a window size of 2, caleta had almost no neighbors that were quantifiers. The other neighbors were ene, caleta de and then mostly toponyms, similar to the t-SNE plots we show here for both strings with a window size of 10. This demonstrates that instances of just caleta within our corpus are more lexical uses, whereas caleta de demonstrates more grammaticalized uses.

4. Results

4.1. Caleta

Here we display only the results of the experiments with a small (1) and a large (10) window size.⁴ This allows us to compare the information obtained by manipulating this parameter. In Figure 3, the word embeddings show both neighbors of the lexical noun and neighbors of the degree word. Nearest neighbors of the noun are toponyms (i.e. names of villages) and other nouns with related meanings (e.g. playa ‘beach’ and muelle ‘wharf’).

⁴ To generate the t-SNE graphs for both caleta and caleta de, we used the PCA (Principal Component Analysis) method since our data points were dense vectors, and we used a perplexity of 10.
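The two models compared here could be trained along the following lines. This is a hedged sketch that assumes the gensim implementation of word2vec (the text only says "word2vec") and a corpus file with one preprocessed tweet per line. The function names are hypothetical, and the values marked as assumptions in the comments (`negative`, `workers`) are not reported in the paper.

```python
from pathlib import Path


def sgns_params(window):
    """Hyperparameters reported in the text, plus two labeled assumptions."""
    return {
        "vector_size": 100,  # vector length of 100
        "window": window,    # the hyperparameter under study
        "min_count": 10,     # minimal word count of 10
        "sg": 1,             # skip-gram rather than CBOW
        "negative": 5,       # assumption: gensim's default negative samples
        "epochs": 5,         # five training epochs
        "workers": 4,        # assumption: arbitrary thread count
    }


def train_models(corpus_path, windows=(1, 10)):
    """Train one SGNS model per window size on a one-tweet-per-line file."""
    # Imported lazily so this module loads even where gensim is not installed.
    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    models = {}
    for window in windows:
        sentences = LineSentence(str(Path(corpus_path)))
        models[window] = Word2Vec(sentences=sentences, **sgns_params(window))
    return models


def top_neighbors(model, word, topn=10):
    """Return (neighbor, cosine similarity) pairs for `word`."""
    return model.wv.most_similar(word, topn=topn)
```

With models in hand, `top_neighbors(models[1], "caleta")` and `top_neighbors(models[10], "caleta")` would give the two neighbor lists to compare, provided caleta meets the minimum count in the corpus used.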

As for the neighbors of the degree word, we find degree expressions, both adverbs and quantifiers like mucho and ene, both meaning ‘a lot’. Caleta de also appears among the neighbors (please see subsequent section for these results).

The co-occurrence of neighbors of both meanings shows that caleta has partially grammaticalized; it still retains its lexical use as a noun. These findings provide evidence for a situation of layering [24], i.e. the synchronic co-existence of older and more recent functions of a form in a language.

If we now use a larger window size, the results are different, with more neighbors associated with the lexical item. In Figure 4 we find the plural noun (caletas); as mentioned in historical analyses, the ability to be pluralized is a syntactic property of nouns. This attests to the persistence of some nominal categorial properties of caleta. We also find the noun pescadores ‘fishermen’, as the noun caleta typically refers to a village of fishermen and hence the nouns often co-occur (in caleta de pescadores), and related nouns like muelle ‘pier’ and poza ‘puddle’.

4.2. Caleta de

We analyzed the results of caleta de separately from those of caleta since the former is the vestige of a binominal quantifier preceding the grammaticalization of the latter.
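The text does not say how caleta de was represented as a single unit in the embedding space. One common approach, assumed here purely for illustration, is to merge the bigram into one token before training; the joined form `caleta_de` and the function names are hypothetical.

```python
import re

# Hypothetical joined form; the underscore convention is an assumption.
BIGRAM_RE = re.compile(r"\bcaleta de\b")


def merge_caleta_de(line):
    """Rewrite the sequence 'caleta de' as the single token 'caleta_de'."""
    return BIGRAM_RE.sub("caleta_de", line)


def tokenize(line):
    """Whitespace tokenization after merging, so 'caleta_de' is one token."""
    return merge_caleta_de(line).split()
```

Under this choice, every occurrence of caleta followed by de contributes to the vector of the merged token rather than to that of bare caleta, so the two strings receive separate distributional profiles.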

Figure 5 and Figure 6 show the t-SNE representations of the nearest neighbors of caleta de. For the smaller window size, we see other quantifiers like ene (more in the next section), caleta, etc. The majority of neighbors here are quantifiers in their orthographical variants found in tweets (e.g. mucho, mxo, nucho, etc.). Two other words that form part of binominal quantifiers are also present, monton and montones, meaning ‘pile’ and ‘piles’ respectively, which have grammaticalized in the same fashion as caleta to denote a large quantity (un montón de N ‘a lot of N’). In this window size, only one proper noun is present, Chorromil, the name of a village. Lastly, we find other quantifiers, like cualquiers and cualesquiers, both orthographical variations of cualquier, ‘whichever’, and puras, a determiner in Chilean Spanish.

In the larger window size, we see caleta as its nearest neighbor. Other quantifiers like mucho, ene, harto, etc. are present, but they are much further away than semantically related nouns like pescadores ‘fishermen’, artesanales ‘craftsmen’, reinetas, a plural noun denoting a variety of white fish, as well as toponyms that are names of caletas. These results show once more how important the hyperparameter of window size is in capturing distributional properties of relatively newly grammaticalized words in a language.

In the following, we provide further analysis of the nearest neighbors of caleta and caleta de.

4.3. Ene

We decided to display the top 10 neighbors for the word ene, since ene always appeared as a top neighbor for caleta and caleta de. Ene comes from the Spanish pronunciation of the grapheme <n> and is used in Mathematics to denote an unspecified integer. Over time, in this variety of Spanish ene has grammaticalized like caleta to denote a large quantity and high degree. Our results show that ene is another example of a grammaticalized degree word, albeit in a different stage of grammaticalization. To the best of our knowledge, this has not been observed or studied.

Example (3) shows a lexical use of ene, taken from the Dictionary of the Royal Spanish Academy [25], since no such example could be found in our corpus. Example (4) shows the degree adverb (here, modifying a verb), i.e. the grammaticalized item. Lastly, example (5) shows ene in combination with ctm, a commonly used abbreviation of the phrase concha (de) tu madre (literally ‘your mother’s pussy’), which is used as a vulgar intensifier similar to fucking in English.

(3) El fenómeno se repite ene veces.
    The phenomenon cl.refl repeat.prs.3sg n times
    “The phenomenon is repeated n times.”

(4) me gustó ene
    cl.1sg.dat like.pst.3sg ene
    “I liked it a lot.”

(5) me gustó ene ctm
    cl.1sg.dat like.pst.3sg ene ctm
    “I fucking liked it a lot.”

Table 2 and 3 show the closest neighbors for ene in our corpus. For both window sizes, none of the neighbors are semantically related to Mathematics, which would be expected if ene still retained some of its original lexical meaning. For the smaller window size, all of the neighbors are degree words meaning ‘much’ (including the noun cantidad which can appear in a binominal structure cantidad de N ‘a large quantity of N’). For the larger window size, half of the neighbors are quantifiers. We also see the expressive puxis (an orthographical variation of pucha, meaning ‘darn’), spellings of laughter and the vulgar term autodelicioso. This is evidence for what has been previously described in the literature that degree modifiers, as highly volatile units of language, are subject to rapid change and become expressives [26].

4.4. Other Quantifiers

Lastly, we show word embeddings of other degree words, in this case ‘stable’ quantifiers in Chilean Spanish: harto ‘a lot’, mucho ‘a lot’, tanto ‘so many.’ It is worth mentioning that unlike caleta, caleta de and ene (which syntactically can be considered degree adverbs), these quantifiers inflect for gender and number when modifying a noun.

The purpose of using the lemmatizer was to control for this, but as the results show, some inflected tokens of these quantifiers were not properly lemmatized.

Tables 4, 5, 6, 7, 8 and 9 show the nearest neighbors for harto, mucho and tanto at the two window sizes. For harto, we see that the majority of its neighbors are other quantifiers for both window sizes, as well as orthographical variations (e.g. harrto, arto) and inflected versions of the lexeme, like the feminine form harta. Likewise, the neighbors of tanto for the smaller window size are mostly orthographical variations (e.g. tsnto, tabto), while for the larger window size we see results similar to those for ene, where nouns like ‘laughter’ are amongst the neighbors.

For mucho, we can see mostly orthographical variants for the smaller window size (e.g. muxo, muxho), and for the larger window size we see fewer orthographical variations and more of the other quantifiers, even its antonym poco, which also occurs with intensifying affixes: re-poco and poc-azo ‘very little’.

[Table: nearest neighbors; columns: Rank, Word (Gloss). Rank values not recoverable. Entries: muchisimo ‘mucho’ (superlative); harto ‘a lot’; tanto ‘so much’; poco ‘a little’; muchoy (mucho y as one word, ‘a lot and’); muccho ‘mucho’ (orthographical variation); bastante ‘quite’; muchopero (mucho pero as one word, ‘a lot but’); aunpero (aún pero as one word, ‘still but’); muchisisisismo ‘mucho’ (repeated superlative).]

5. Discussion

Our word embedding results for caleta show that nowadays the word is used to express high degree. In addition, in our results both the lexical noun and the degree modifier are present. The choice of hyperparameters, specifically window size, has important consequences: a small window size yields nearest neighbors for both forms, while a larger window size results in more neighbors of the lexical noun. We hypothesize that this is due to the fact that as a degree word, caleta is a modifier, and occurs in close adjacency to the modified word. Hence, a small window captures this distribution. On the other hand, as a lexical noun caleta is less syntactically constrained, with more positional freedom and semantic content.

While cosine similarity scores give us insight into a changing word’s distribution, they alone do not tell us about its syntactic properties in detail. To better understand caleta’s current status as a degree modifier, we performed a post-hoc analysis of the top 20 collocates of caleta and caleta de. We looked specifically at the top tokens that immediately precede and follow the two strings in our unlemmatized corpus. We were interested in the kinds of words that caleta and caleta de have come to modify, in accordance with Doetjes’s typology of degree modifiers (see Section 2).

Our analysis shows that caleta has evolved extensively beyond its original lexical usage, wherein it was only compatible with count nouns that were semantically related, e.g. pescadores ‘fishermen’, camarones ‘shrimp (plural)’, headed by the preposition de. The structure caleta de is now compatible with count nouns beyond the semantic domain of a fishing village: años ‘years’, veces ‘times/instances’ (see (6)), as well as mass nouns, e.g. plata ‘money (informal)’, tiempo ‘time’ (see (7)). It can also modify comparatives, e.g. mejor ‘better’, peor ‘worse’ (see (9)); eventive verbs, e.g. dormir ‘to sleep’, reír ‘to laugh’ (see (8)); gradable verbs, e.g. gustar ‘to like’, querer ‘to want’ (see (2)); and finally gradable nominal predicates⁵, e.g. hambre ‘hunger’, pena ‘sorrow’, as in (10).

(6) Hace caleta de años
    make.prs.3sg caleta of years
    “Many years ago”

(7) es caleta de plata
    be.prs.3sg caleta of money
    “it’s a lot of money.”

(8) Yo igual reí caleta.
    1sg.nom same laugh.pst.1sg caleta
    “I laughed a lot, anyway.”

(9) hay que cuidarse caleta mejor...
    be.exist.prs.3sg that care.inf.ref caleta better
    “one has to take care of themselves much better.”

(10) Hace caleta de frío.
     make.prs.3sg caleta of coldness
     “It’s really cold.”

⁵ Gradable nominal predicates, in Doetjes’s definition, are nouns which are generally the objects of light verb expressions. The examples she gives are from French, e.g. Elle a très soif ‘She is very thirsty.’ In Spanish, such light verb constructions also exist, so we consider cases like tener sed ‘to be thirsty (lit. to have thirst)’ to also be examples of nominal predicates.

There were no cases of caleta modifying either eventive adjectives or gradable adjectives within our corpus. This, according to Doetjes’s classification, indicates that caleta has evolved into a Type D degree modifier. Figure 7 shows caleta’s position in this typology, in comparison to the other degree expressions in Chilean Spanish that we have discussed in this paper. Our results align with claims in the literature that Type C and D are the most common in the Romance languages [8]. Lastly, within our results, caleta has no nearest neighbors with Type A modifiers (e.g. muy ‘very’), which combine exclusively with gradable adjectives. This is not surprising since Type A modifiers have no overlap in word classes with Type D modifiers; their distributions are disjoint. This highlights how embeddings capture syntactic properties of words, as opposed to just similarity of meaning.

[Figure 7: Position of caleta in the typology of degree modifiers (categories I, IIa, IIb, III, IV, V; Types A to G), compared with other degree expressions in Chilean Spanish. Expressions shown: caleta, ene, mucho, tanto, harto, bastante, demasiado, un montón, cantidad, montones, vario.]

Our study has two main findings, which answer the research questions above. First, we have shown that caleta is undergoing grammaticalization: both the older and the new meaning are captured by the word embeddings. Importantly, we see a difference in the results depending on the window size, when compared to other degree words which are grammatical items and not undergoing change, like mucho and harto. In the latter case, window size does not significantly impact the neighbors. Additionally, our post-hoc analysis provided insight on the properties of caleta as a degree word.

Second, our word embeddings have allowed us to reveal the inventory of degree words in colloquial Chilean Spanish, including a word that to date had never been investigated, ene. These words denote high degree (intensifiers), words that are known to change rapidly due to social and expressive pressure [26]. Since caleta and ene are not normative forms, they are left out of traditional studies. This entails that we may miss instances of change possibly of interest to current linguistic theory.

Hence, word embeddings can be a tool to study lesser-known subsystems of a language and capture ongoing changes in synchrony.
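The post-hoc collocate analysis described in this section, counting the tokens that immediately precede and follow a target string, can be sketched as below. This is an illustrative reconstruction, not the authors' script; the function name and the whitespace tokenization are assumptions.

```python
from collections import Counter


def collocates(lines, target, topn=20):
    """Count tokens immediately before and after each occurrence of `target`.

    `target` may span several tokens, e.g. "caleta de".
    Returns two lists of (token, count) pairs: preceding and following.
    """
    target_tokens = target.split()
    n = len(target_tokens)
    before, after = Counter(), Counter()
    for line in lines:
        tokens = line.split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] != target_tokens:
                continue
            if i > 0:
                before[tokens[i - 1]] += 1
            if i + n < len(tokens):
                after[tokens[i + n]] += 1
    return before.most_common(topn), after.most_common(topn)
```

Applied to the unlemmatized corpus, the following-token list for caleta de would show which nouns the structure combines with, and the preceding-token list for caleta would show which verbs it modifies. Note that a search for bare caleta also matches occurrences inside caleta de, so de will dominate its following-token counts unless those cases are filtered out.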

6. Conclusion

Our study contributes to studies of language change by analyzing intensifiers in colloquial Chilean Spanish (an understudied variety) from the past twenty years. We do not yet have data from multiple temporal slices to demonstrate direct evidence of changes in grammatical behavior. For this reason, we infer grammaticalization from synchronic distributional patterns. Nevertheless, we reveal an ongoing change that had not been previously studied. Using spontaneous speech from tweets, we gained access to informal speech where speakers communicate in an unedited way, which has allowed us to study the use of older and more recent degree expressions. In the future, we plan on expanding the time span of the data, depending on the availability of more text reflecting spontaneous speech in this variety of Spanish.

We have shown that static word embeddings provide evidence for this change and can reveal meaning relations not previously studied. Moreover, we show that different choices of hyperparameters have an effect on which meaning of the word undergoing change (the lexical vs. the grammatical) is represented. Nevertheless, comparing our results with dynamic embeddings in the future could prove interesting.

Some limitations of our study are due to the genre itself. One such limitation is the difficulty with lemmatization: as we have mentioned, these are tweets, so we find strings that do not conform to normative orthography (for example, typos, abbreviations, etc.), and the lemmatizer therefore has difficulty detecting words of the same lexeme. In addition, Twitter users tend to adopt orthographical forms that reflect pronunciation and sometimes are intended to be expressive, like repeating vowels in a word to express a very high degree. Furthermore, using a corpus of tweets means that the character limit has an impact on the possible window sizes. To obviate this problem, further studies on caleta could use longer texts that have the same register as tweets, e.g. blog posts.

Lastly, the only hyperparameters we significantly experimented with were the window size and the minimal word count. More hyperparameter fine-tuning (e.g. adjustment of negative sampling and vector size) could potentially yield more robust results.

Acknowledgments

This research was supported in part by Lilly Endowment, Inc., through its support for the Indiana University Pervasive Technology Institute.

Declaration on Generative AI

During the preparation of this work, the author(s) did not use any generative AI tools or services.

References

[1] W. L. Hamilton, Leskovec, Jurafsky, Cultural shift or linguistic drift? Comparing two computational measures of semantic change, in: J. Su, K. Duh, X. Carreras (Eds.), Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Austin, Texas, 2016, pp. 2116-2121. URL: https://aclanthology.org/D16-1229/. doi:10.18653/v1/D16-1229.

[2] Kutuzov, Øvrelid, Szymanski, E. Velldal, Diachronic word embeddings and semantic shifts: a survey, in: E. M. Bender, L. Derczynski, P. Isabelle (Eds.), Proceedings of the 27th International Conference on Computational Linguistics, Association for Computational Linguistics, Santa Fe, New Mexico, USA, 2018, pp. 1384-1397. URL: https://aclanthology.org/C18-1117/.

[3] Fonteyn, Manjavacas, Budts, Exploring morphosyntactic variation and change with distributional semantic models, Journal of Historical Syntax 6 (2022) 1-41.

[4] Meillet, L'évolution des formes grammaticales, Scientia 12 (1912) 130-148.

[5] P. J. Hopper, E. C. Traugott, Grammaticalization, Cambridge Textbooks in Linguistics, 2 ed., Cambridge University Press, 2003.

[6] Bolinger, Degree Words, De Gruyter Mouton, Berlin, Boston, 1972. URL: https://doi.org/10.1515/9783110877786. doi:10.1515/9783110877786.

[7] Neeleman, H. Van de Koot, J. Doetjes, Degree expressions, The Linguistic Review 21 (2004) 1-66. doi:10.1515/tlir.2004.001.

[8] Doetjes, Adjectives and Degree Modification, in: L. McNally, C. Kennedy (Eds.), Adjectives and Adverbs: Syntax, Semantics, and Discourse, Oxford University Press, 2008, pp. 123-155. doi:10.1093/oso/9780199211616.003.0006.

[9] Amaral, When Something Becomes a Bit, Diachronica 33 (2016) 151-186. doi:10.1075/dia.33.2.01ama.

[10] Luo, Jurafsky, Levin, From insanely jealous to insanely delicious: Computational models for the semantic bleaching of English intensifiers, in: N. Tahmasebi, L. Borin, A. Jatowt, Y. Xu (Eds.), Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, Association for Computational Linguistics, Florence, Italy, 2019, pp. 1-13. URL: https://aclanthology.org/W19-4701/. doi:10.18653/v1/W19-4701.

[11] Abeillé, Bonami, Godard, J. Tseng, The Syntax of French de-N' Phrases, Proceedings of the International Conference on Head-Driven Phrase Structure Grammar (2004) 6-26. doi:10.21248/hpsg.2004.1.

[12] Marchello-Nizia, Grammaticalisation et changement linguistique, De Boeck, 2006.

[13] Verveckken, Towards a Constructional Account of High and Low Frequency binominal Quantifiers in Spanish, Cognitive Linguistics 23 (2012). doi:10.1515/cog-2012-0013.

[14] Traugott, Grammaticalization, Constructions and the Incremental Development of Language: Suggestions from the Development of Degree Modifiers in English, Variation, Selection, Development: Probing the Evolutionary Model of Language Change (2008) 219-250.

[15] Amaral, Bocado: Scalar Semantics and Polarity Sensitivity, Zeitschrift für romanische Philologie 136 (2020) 1114-1136.

[16] Fonteyn, E. Manjavacas, Adjusting scope: a computational approach to case-driven research on semantic change, in: Proceedings of the Workshop on Computational Humanities Research (CHR 2021), volume 2989 of CEUR Workshop Proceedings, 2021, pp. 280-298. URL: http://ceur-ws.org/Vol-2989/long_paper26.pdf.

[17] Amaral, Hu, Kübler, Tracing semantic change with distributional methods: The contexts of algo, Diachronica 40 (2023) 153-194.

[18] Nagata, Kawasaki, Otani, Takamura, A Computational Approach to Quantifying Grammaticization of English Deverbal Prepositions, in: N. Calzolari, M.-Y. Kan, V. Hoste, A. Lenci, S. Sakti, N. Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA and ICCL, Torino, Italia, 2024, pp. 211-220. URL: https://aclanthology.org/2024.lrec-main.19.

[19] Ortiz-Fuentes, Chilean Spanish Corpus, 2023. URL: https://huggingface.co/datasets/jorgeortizfuentes/chilean-spanish-corpus. doi:10.57967/hf/3181.

[20] Honnibal, I. Montani, S. Van Landeghem, A. Boyd, spaCy: Industrial-strength natural language processing in Python, The Journal of Open Source Software 5 (2020) 2914. doi:10.5281/zenodo.1212303.

[21] Mikolov, Chen, G. Corrado, Dean, Efficient Estimation of Word Representations in Vector Space, Proceedings of Workshop at ICLR 2013 (2013) 1-12.

[22] Hu, Amaral, Kübler, Word embeddings and semantic shifts in historical Spanish: Methodological considerations, Digital Scholarship in the Humanities 37 (2022) 441-461.

[23] Levy, Goldberg, Dependency-based word embeddings, in: K. Toutanova, H. Wu (Eds.), Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Baltimore, Maryland, 2014, pp. 302-308. URL: https://aclanthology.org/P14-2050/. doi:10.3115/v1/P14-2050.

[24] Hopper, On some principles of grammaticization, in: Approaches to Grammaticalization, Benjamins, 1991, pp. 17-35.

[25] Real Academia Española, Diccionario de la lengua española, 2025. URL: https://dle.rae.es [6/1/2025].

[26] Ito, Tagliamonte, Well weird, right dodgy, very strange, really cool: Layering and recycling in English intensifiers, Language in Society 32 (2003) 257-279. doi:10.1017/S0047404503322055.