<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Is It Still a Village? Tracing Grammaticalization with Word Embeddings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Joseph Larson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patrícia Amaral</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Spanish and Portuguese, Indiana University</institution>
          ,
          <addr-line>Bloomington Indiana</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Computational studies of language change tend to focus on predicting lexical semantic change that reflects cultural and societal changes. In this paper we focus instead on the syntactic and semantic shift from lexical to grammatical (grammaticalization), and we choose an understudied variety of Spanish. This paper investigates the grammaticalization of the noun caleta 'cove, village' to a degree expression (an intensifier) meaning 'a lot', as part of the system of degree words in Chilean Spanish. We use word embeddings trained on a corpus of tweets to show the ongoing syntactic and semantic change of caleta. Our distributional analysis also reveals how high degree is expressed in this variety of Spanish, showing the potential of these methods to explore lesser-known linguistic subsystems. Our study unveils degree expressions not previously studied in contemporary colloquial Chilean Spanish and also provides further evidence for an existing typology of degree modifiers across languages.</p>
      </abstract>
      <kwd-group>
        <kwd>grammaticalization</kwd>
        <kwd>degree</kwd>
        <kwd>quantifiers</kwd>
        <kwd>historical linguistics</kwd>
        <kwd>Chilean Spanish</kwd>
        <kwd>word embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Studies of language change using distributional methods have shown the potential of word embeddings to trace syntactic and semantic change over time [1, 2, a.o.]. However, such research tends to focus on predicting changes that affect sets of lexical items shifting from one semantic domain to another, which typically reflects cultural and societal changes. Fewer studies have explored both semantic and morphosyntactic change (but see Fonteyn et al. [3]). In this paper, we focus on the semantic and syntactic shift from lexical to grammatical, known as grammaticalization [4, 5], and the stages of this process. Specifically, we study the creation of degree expressions.</p>
      <p>Traditionally, degree expressions have been associated with adjectives, considered the prototypical gradable category. However, degree modification is also compatible with nouns and verbs, which shows that gradability cuts across syntactic categories [6, 7, 8]. As a word becomes a degree expression over time, it typically expands its distribution along different categories: e.g. it first combines with nouns before co-occurring with verbs and adjectives. Hence, the grammaticalization of degree expressions provides insight into the semantics of degree and patterns in the distribution of degree words [9, 10]. This paper examines an understudied variety, Chilean Spanish, and uses word embeddings to investigate the emerging system of degree words to which one grammaticalized word shifts. We investigate the grammaticalization of caleta in Chilean Spanish, from a noun denoting ‘cove, hiding place (where merchandise can be stored)’, ‘village’, as in ex. (1), to a quantifier and degree adverb ‘much, a lot’, as in (2), where caleta modifies the verb and denotes high degree.</p>
      <p>(1) Esta experiencia la realizamos en Zapallar, en la caleta de pescadores
this experience cl.fem.sg.acc do.pst.1pl in Zapallar in the caleta of fishermen
“We did this experience in Zapallar, in the fishermen’s cove”</p>
      <p>(2) me gustó caleta
cl.1sg.dat like.pst.3sg caleta
“I liked it a lot.”</p>
      <p>We use word embeddings to examine to what extent the grammaticalization of caleta has developed while also shedding light on the system of degree modifiers in Chilean Spanish. We ask, (i) how far along has caleta grammaticalized in Chilean Spanish, and (ii) what types of evidence do word embeddings provide of different stages of grammaticalization of degree words?</p>
      <p>CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics, September 24-26, 2025, Cagliari, Italy. Contact: joelarso@iu.edu (J. Larson); pamaral@iu.edu (P. Amaral). ORCID: 0000-0001-6651-0319 (P. Amaral). © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>2. Previous Work</p>
      <p>Linguists have provided analyses of the gradual process by which lexical items acquire grammatical functions: for example, in this diachronic change, nouns lose their categorial properties like occurring after a determiner or being pluralized. The grammaticalization of nouns into degree adverbs (e.g. the development from lot ‘a set of objects’ to a lot ‘much’) is well attested cross-linguistically: other examples are French adverb beaucoup from un beau coup ‘a good strike’ and English a bit from ‘a bite, a portion that fits in the mouth’ [11, 12, 13, 14, 15].</p>
      <p>This research has shown that a typical structure in which nouns occur - modification by a prepositional phrase, as in a lot [PP of chairs], a mountain [PP of books] - provides a starting point for quantity and degree interpretations. This structure undergoes subsequent syntactic reanalysis, where the head noun (e.g. lot) loses nominal properties and a lot of becomes an adverb modifying the second noun. The development of so-called binominal structures Det N1 of N2, which may or may not further evolve to a fully adverbial category, plays a crucial role in the grammaticalization of degree words. In our study, we also include the structure (Det) caleta of N, hence we investigate the distribution of caleta de.</p>
      <p>As argued by Doetjes [8], degree words across languages show a systematic behavior in terms of classes of words they can modify. These well-attested patterns correspond to types along a continuum of word classes defined by their syntactic-semantic properties. For example, since French trop ‘too much’ can modify all word classes, within this typology it is considered to be a Type C modifier. On the other hand, English very can only modify gradable adjectives (“very kind” is possible, while expressions like “*I traveled very” or “*very water” are not grammatical), therefore very is classified as a Type A modifier. For a complete summary of the continuum of word classes and typology, see Figure 1. As words develop into one type, they are predicted to modify words in the order along the continuum; for instance, if a word co-occurs with words of category V, it is expected to co-occur with words of category IV before it appears with words of category III.<sup>1</sup> As we investigate whether caleta has grammaticalized into a degree word, we will examine its stage of development with respect to Doetjes’s continuum.</p>
      <p>Figure 1: Typology of degree expressions according to their distribution along a continuum of word classes. Table adapted with modifications from [8, 138]. Superscripts indicate language: R for Russian, D for Dutch, F for French, E for English, P for Portuguese, and I for Italian.
Word classes by category: I gradable adjectives; IIa gradable nominal predicates; IIb gradable verbs; III eventive verbs, eventive adjectives, comparatives; IV mass nouns; V plural nouns.
Modifiers shown across Types A to G: very<sup>E</sup>, erg<sup>D</sup>, očen’<sup>R</sup>, trop<sup>F</sup>, muito<sup>P</sup>, beaucoup<sup>F</sup>, molto<sup>I</sup>, a lot<sup>E</sup>, veel<sup>D</sup>, mnogo<sup>R</sup>, a mountain<sup>E</sup>, many<sup>E</sup>.</p>
      <p><sup>1</sup> Doetjes differentiates between ‘gradable’ and ‘eventive’ adjectives and verbs by whether the modifier is targeting the degree or is quantifying over events. The example she gives is from Dutch: Jan is veel ziek ‘Jan is sick a lot’ vs. Jan is erg ziek ‘Jan is very sick.’ In the former, veel as a quantifier targets eventive adjectives, thus it can only modify the quantity of sick events. In the latter, erg expresses the degree of sickness, i.e. the severity of his illness.</p>
      <p>While some computational studies of grammaticalization have adopted case-driven approaches similar to ours [16, 17, 18], we also investigate how a distributional analysis of caleta can provide insight on the set of degree expressions currently used in colloquial Chilean Spanish. In other words, we aim to examine not just the grammaticalization of caleta but also how this word fits in the system of degree words in Chilean Spanish and in types of degree expressions across languages.</p>
      <p>3. Methodology</p>
      <p>3.1. Corpus Creation</p>
      <p>To ensure we had a good representation of colloquial Chilean Spanish, we created a subcorpus from an already existing corpus of online data [19]. The already existing corpus contained roughly 19GB of data, from diverse sources, including news, tweets, online reviews and other miscellaneous web content. We chose to create a subcorpus just from tweets to reduce the computational load for our later experiments and since we only wanted informal instances of language; caleta typically only occurs in less formal registers. The resulting subcorpus of 27,306,582 tweets consisted of exactly 342,979,307 tokens. The time span of these tweets is from 2010 to 2020.</p>
      <p>3.2. Preprocessing</p>
      <p>We first normalized the text in the corpus: we removed case, punctuation, diacritics, URLs, hashtags, and any repeated letters. For this last step, we only allowed double letters where they occur within normative Spanish orthography (i.e. &lt;  &gt;, &lt;  &gt;, &lt;  &gt;); elsewhere only single letters were allowed. Then we input the corpus into a plain text file separated by newlines. The resulting file was then lemmatized using SpaCy’s Spanish lemmatizer [20].<sup>2</sup></p>
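      <p>The normalization steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors’ script. Several details are assumptions: the set of permitted double letters is not legible in the text, so ll, rr and cc are used purely as placeholders; hashtags are removed whole rather than only stripped of the # sign; and diacritic removal is applied to every combining mark, which also maps ñ to n.</p>
      <preformat><![CDATA[
```python
import re
import unicodedata

# Assumption: placeholder set of letters that may stay doubled (ll, rr, cc).
ALLOWED_DOUBLES = {"l", "r", "c"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")
PUNCT_RE = re.compile(r"[^\w\s]")
# A letter (no digits, no underscore) followed by one or more copies of itself.
REPEAT_RE = re.compile(r"([^\W\d_])\1+")


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def collapse_repeats(text: str) -> str:
    """Reduce a run of one letter to a single letter, or to a double if allowed."""
    def repl(match):
        letter = match.group(1)
        return letter * 2 if letter in ALLOWED_DOUBLES else letter
    return REPEAT_RE.sub(repl, text)


def normalize(tweet: str) -> str:
    """Apply the normalization substeps in a fixed order."""
    text = tweet.lower()                 # case
    text = URL_RE.sub(" ", text)         # URLs (before punctuation is removed)
    text = HASHTAG_RE.sub(" ", text)     # hashtags
    text = strip_diacritics(text)        # diacritics
    text = PUNCT_RE.sub(" ", text)       # punctuation
    text = collapse_repeats(text)        # repeated letters
    return " ".join(text.split())
```
]]></preformat>
      <p>For instance, normalize("Me gustó CALEEETA!!!") returns "me gusto caleta". URLs are removed before punctuation so that their slashes and dots do not leave stray fragments behind.</p>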
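      <p>[Figure: preprocessing pipeline. Steps: Text Normalization, Text File Preparation, Lemmatization. Normalization substeps: Case, Punctuation, Diacritics, URLs, Hashtags, Repeated Letters.]</p>
      <p>For both models, the analogy tests returned the expected word, except for the last pair with a window size of 1: where perra ‘dog (female)’ was expected, the most similar word embedding was for quiltra ‘mutt (female)’.</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th>Relationship</th>
              <th>Word Pair 1, Word A</th>
              <th>Word Pair 1, Word B</th>
              <th>Word Pair 2, Word A</th>
              <th>Word Pair 2, Word B</th>
              <th>Accuracy</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Age-based</td>
              <td>Hombre ‘Man’</td>
              <td>Mujer ‘Woman’</td>
              <td>Niño ‘Boy’</td>
              <td>Niña ‘Girl’</td>
              <td>1.0</td>
            </tr>
            <tr>
              <td>Familial</td>
              <td>Padre ‘Father’</td>
              <td>Madre ‘Mother’</td>
              <td>Hijo ‘Son’</td>
              <td>Hija ‘Daughter’</td>
              <td>1.0</td>
            </tr>
            <tr>
              <td>Feline</td>
              <td>Niño ‘Boy’</td>
              <td>Gato ‘Cat (male)’</td>
              <td>Niña ‘Girl’</td>
              <td>Gata ‘Cat (female)’</td>
              <td>1.0</td>
            </tr>
            <tr>
              <td>Canine</td>
              <td>Niño ‘Boy’</td>
              <td>Perro ‘Dog (male)’</td>
              <td>Niña ‘Girl’</td>
              <td>Perra ‘Dog (female)’</td>
              <td>0.5</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>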
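      <p>An analogy test of the kind summarized in Table 1 can be sketched with the usual vector-offset formulation. The code below is an illustration under stated assumptions, not the authors’ evaluation script: the two-dimensional vectors are invented toy values, the helper names are hypothetical, and the Accuracy column is read here as the share of the two models that return the expected word.</p>
      <preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Return the word nearest to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def accuracy(models, a, b, c, expected):
    """Share of models whose top answer matches the expected word."""
    hits = sum(1 for m in models if solve_analogy(m, a, b, c) == expected)
    return hits / len(models)


# Invented toy vectors: axis 0 separates adult/child, axis 1 masculine/feminine.
MODEL_A = {
    "hombre": [1.0, 1.0],
    "mujer": [1.0, -1.0],
    "nino": [-1.0, 1.0],
    "nina": [-1.0, -1.0],
    "padre": [0.8, 1.0],
}
# A second toy model in which "nina" is misplaced, so the test fails there.
MODEL_B = dict(MODEL_A, nina=[1.0, 1.0])
```
]]></preformat>
      <p>With these toy values, solve_analogy(MODEL_A, "hombre", "mujer", "nino") returns "nina", and accuracy([MODEL_A, MODEL_B], "hombre", "mujer", "nino", "nina") is 0.5, mirroring the case in which only one of two models returns the expected word.</p>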
      <sec id="sec-1-1">
        <title>3.3. Model Selection</title>
        <p>To represent the distributional patterns of words in our corpus, we decided to use static word embeddings over contextualized word embeddings. Non-contextualized embeddings allow us to compare our target word with other words in Chilean Spanish to examine the current stage of grammaticalization of caleta as determined by its closeness to different subsystems in the language.</p>
        <p>The algorithm we use is Skip-Gram with Negative Sampling (SGNS) implemented in word2vec [21] to extract embeddings, based on previous research that showed good results for studies of semantic change [22, a.o.]. For this reason, we do not consider it necessary to use a more computationally expensive operation (e.g. dynamic word embeddings). We trained each model for five epochs, with a minimum token count of 10, using the skip-gram algorithm. Initially, we experimented with several hyperparameters: the window size, the minimal word count and the vector size. The only hyperparameter that proved to be significant was the window size (see next section for more details). The resulting model used a vector length of 100 and a minimal word count of 10. To verify the validity of the model, we used analogy tests targeting gender-based morphological and semantic relations (see Table 1 for specifics). We performed the tests on both models we used for the embeddings (see following section for details).</p>
      </sec>
      <sec id="sec-1-2">
        <title>3.4. Window Size</title>
        <p>As mentioned in the previous section, the only hyperparameter we adjusted for the model was the window size. We extracted models for window sizes [1, 10].<sup>3</sup> Although other authors have shown that small window sizes often produce noisy and unstable embeddings [23], for this project we expected small window sizes to be appropriate.</p>
        <p>Our hypothesis was that in our case, lower window sizes would capture the grammaticalized meaning of caleta, since the scope of grammatical words like quantifiers lies within their immediate neighbors, whereas higher window sizes show neighbors within the same semantic field (and therefore the lexical use). However, since we use a corpus of tweets, window size is fairly limited by the genre itself (a possible limitation we address later).</p>
        <p><sup>2</sup> As an anonymous reviewer noted, our preprocessing might have worked better if we had normalized the text and lemmatized in one step. This is something we will consider for future experiments.</p>
        <p><sup>3</sup> As a reviewer suggested, we experimented with other window sizes, e.g. a window size of 2. While we do not show the results for this window size, we note that there was not a significant difference between this window size and a window size of 1 for caleta de, but there was for caleta. With a window size of 2, caleta had almost no neighbors that were quantifiers. The other neighbors were ene, caleta de and then mostly toponyms, similar to the t-SNE plots we show here for both strings with a window size of 10. This demonstrates that instances of just caleta within our corpus are more lexical uses, whereas caleta de demonstrates more grammaticalized uses.</p>
        <p><sup>4</sup> To generate the t-SNE graphs for both caleta and caleta de, we used the PCA (Principal Component Analysis) method since our data points were dense vectors, and we used a perplexity of 10.</p>
        <p>4. Results</p>
        <p>4.1. Caleta</p>
        <p>Here we display only the results of the experiments with a small (1) and a large (10) window size.<sup>4</sup> This allows us to compare the information obtained by manipulating this parameter. In Figure 3, the word embeddings show both neighbors of the lexical noun and neighbors of the degree word. Nearest neighbors of the noun are toponyms (i.e. names of villages) and other nouns with related meanings (e.g. playa ‘beach’ and muelle ‘wharf’).</p>
        <p>As for the neighbors of the degree word, we find degree
expressions, both adverbs and quantifiers like mucho and
ene, both meaning ‘a lot’. Caleta de also appears among
the neighbors (please see subsequent section for these
results).</p>
        <p>The co-occurrence of neighbors of both meanings
shows that caleta has partially grammaticalized; it still
retains its lexical use as a noun. These findings provide
evidence for a situation of layering [24], i.e. the
synchronic co-existence of older and more recent functions
of a form in a language.</p>
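        <p>One simple way to summarize a mixed neighborhood of this kind is to label each nearest neighbor by sense and report the share of each label. The sketch below is hypothetical and is not part of the procedure described in this paper: the labels are hand-assigned for illustration, using only words mentioned in the text above, and the function name is an invention.</p>
        <preformat><![CDATA[
```python
from collections import Counter

# Hand-made labels for illustration only; the words come from the text above.
SENSE_LABELS = {
    "playa": "lexical",
    "muelle": "lexical",
    "mucho": "degree",
    "ene": "degree",
    "caleta de": "degree",
}


def sense_profile(neighbors, labels=SENSE_LABELS):
    """Return the share of neighbors per label; unlabeled words count as 'other'."""
    if not neighbors:
        return {}
    counts = Counter(labels.get(word, "other") for word in neighbors)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}
```
]]></preformat>
        <p>A profile in which both the lexical and the degree labels have a non-zero share is what a layering situation would look like under this toy summary.</p>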
        <p>If we now use a larger window size, the results are different, with more neighbors associated with the lexical item. In Figure 4 we find the plural noun (caletas); as mentioned in historical analyses, the ability to be pluralized is a syntactic property of nouns. This attests to the persistence of some nominal categorial properties of caleta. We also find the noun pescadores ‘fishermen’, as the noun caleta typically refers to a village of fishermen and hence the nouns often co-occur (in caleta de pescadores), and related nouns like muelle ‘pier’ and poza ‘puddle’.</p>
        <p>4.2. Caleta de</p>
        <p>We analyzed the results of caleta de separately from those of caleta since the former is the vestige of a binominal quantifier preceding the grammaticalization of the latter.</p>
        <p>Figure 5 and Figure 6 show the t-SNE representations of the nearest neighbors of caleta de. For the smaller window size, we see other quantifiers like ene (more in the next section), caleta, etc. The majority of neighbors here are quantifiers in their orthographical variants found in tweets (e.g. mucho, mxo, nucho, etc). Two other words that form part of binominal quantifiers are also present, monton and montones, meaning ‘pile’ and ‘piles’ respectively, which have grammaticalized in the same fashion as caleta to denote a large quantity (un montón de N ‘a lot of N’). In this window size, only one proper noun is present, Chorromil, the name of a village. Lastly, we find other quantifiers, like cualquiers and cualesquiers, both orthographical variations of cualquier, ‘whichever’, and puras, a determiner in Chilean Spanish.</p>
        <p>In the larger window size, we see caleta as its
nearest neighbor. Other quantifiers like mucho, ene, harto,
etc. are present, but they are much further away than
semantically related nouns like pescadores ‘fishermen’,
artesanales ‘craftsmen’, reinetas, a plural noun denoting a
variety of white fish, as well as toponyms that are names
of caletas. These results show once more how important
the hyperparameter of window size is in capturing
distributional properties of relatively newly grammaticalized
words in a language.</p>
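        <p>The effect of window size on what a skip-gram model sees can be illustrated by listing the (target, context) pairs it would train on. This is a simplified sketch under stated assumptions: it uses a fixed symmetric window and omits negative sampling and any subsampling that an actual word2vec implementation may apply. The two token lists are the normalized forms of examples (1) and (2), and the function names are hypothetical.</p>
        <preformat><![CDATA[
```python
def skipgram_pairs(tokens, window):
    """Return (target, context) pairs for all tokens within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def contexts_of(tokens, word, window):
    """Sorted set of context words observed for `word` at a given window size."""
    return sorted({c for t, c in skipgram_pairs(tokens, window) if t == word})


LEXICAL = "esta experiencia la realizamos en zapallar en la caleta de pescadores".split()
DEGREE = "me gusto caleta".split()
```
]]></preformat>
        <p>With a window of 1, contexts_of(LEXICAL, "caleta", 1) yields only "de" and "la", and contexts_of(DEGREE, "caleta", 1) yields only "gusto", the modified verb. With a window of 10, the lexical example also contributes "zapallar" and "pescadores", i.e. words from the semantic field of the noun.</p>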
        <p>In the following, we provide further analysis of the nearest neighbors of caleta and caleta de.</p>
        <p>4.3. Ene</p>
        <p>We decided to display the top 10 neighbors for the word ene, since ene always appeared as a top neighbor for caleta and caleta de. Ene comes from the Spanish pronunciation of the grapheme &lt;n&gt; and is used in Mathematics to denote an unspecified integer. Over time, in this variety of Spanish ene has grammaticalized like caleta to denote a large quantity and high degree. Our results show that ene is another example of a grammaticalized degree word, albeit in a different stage of grammaticalization. To the best of our knowledge, this has not been observed or studied.</p>
        <p>Example (3) shows a lexical use of ene, taken from the Dictionary of the Royal Spanish Academy [25], since no such example could be found in our corpus. Example (4) shows the degree adverb (here, modifying a verb), i.e. the grammaticalized item. Lastly, example (5) shows ene in combination with ctm, a commonly used abbreviation of the phrase concha (de) tu madre (literally ‘your mother’s pussy’), which is used as a vulgar intensifier similar to fucking in English.</p>
        <p>(3) El fenómeno se repite ene veces.
the phenomenon cl.refl repeat.prs.3sg n times
“The phenomenon is repeated n times.”</p>
        <p>(4) me gustó ene
cl.1sg.dat like.pst.3sg ene
“I liked it a lot.”</p>
        <p>(5) me gustó ene ctm
cl.1sg.dat like.pst.3sg ene ctm
“I fucking liked it a lot.”</p>
        <p>Tables 2 and 3 show the closest neighbors for ene in our corpus. For both window sizes, none of the neighbors are semantically related to Mathematics, which would be expected if ene still retained some of its original lexical meaning. For the smaller window size, all of the neighbors are degree words meaning ‘much’ (including the noun cantidad, which can appear in a binominal structure cantidad de N ‘a large quantity of N’). For the larger window size, half of the neighbors are quantifiers. We also see the expressive puxis (an orthographical variation of pucha, meaning ‘darn’), spellings of laughter and the vulgar term autodelicioso. This supports earlier descriptions in the literature of degree modifiers as highly volatile units of language that are subject to rapid change and become expressives [26].</p>
        <p>4.4. Other Quantifiers</p>
        <p>Lastly, we show word embeddings of other degree words, in this case ‘stable’ quantifiers in Chilean Spanish: harto ‘a lot’, mucho ‘a lot’, tanto ‘so many.’ It is worth mentioning that unlike caleta, caleta de and ene (which syntactically can be considered degree adverbs), these quantifiers inflect for gender and number when modifying a noun. The purpose of using the lemmatizer was to control for this, but as the results show, some inflected tokens of these quantifiers were not properly lemmatized.</p>
        <p>Tables 4, 5, 6, 7, 8 and 9 show the nearest neighbors for harto, mucho and tanto at the two window sizes. For harto, we see that the majority of its neighbors are other quantifiers for both window sizes, as well as orthographical variations (e.g. harrto, arto) and inflected versions of the lexeme, like the feminine form harta. Likewise, the neighbors of tanto for the smaller window size are mostly orthographical variations (e.g. tsnto, tabto), while for the larger window size we see results similar to ene, where nouns like ‘laughter’ are amongst the neighbors.</p>
        <p>For mucho, we see mostly orthographical variants for the smaller window size (e.g. muxo, muxho), and for the larger window size we see fewer orthographical variations and more of other quantifiers, even its antonym poco, which also occurs with intensifying affixes: re-poco and poc-azo ‘very little’.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>5. Discussion</title>
      <p>Our word embedding results for caleta show that nowadays the word is used to express high degree. In addition, in our results both the lexical noun and the degree modifier are present. The choice of hyperparameters, specifically window size, has important consequences: a small window size yields nearest neighbors for both forms, while a larger window size results in more neighbors of the lexical noun. We hypothesize that this is due to the fact that as a degree word, caleta is a modifier, and occurs in close adjacency to the modified word. Hence, a small window captures this distribution. On the other hand, as a lexical noun caleta is less syntactically constrained, with more positional freedom and semantic content.</p>
      <p>While cosine similarity scores give us insight into a changing word’s distribution, they alone do not tell us about its syntactic properties in detail. To better understand caleta’s current status as a degree modifier, we performed a post-hoc analysis of the top 20 collocates of caleta and caleta de. We looked specifically at the top tokens that immediately precede and follow the two strings in our unlemmatized corpus. We were interested in the kinds of words that caleta and caleta de have come to modify, in accordance with Doetjes’s typology of degree modifiers (see Section 2).</p>
      <p>Our analysis shows that caleta has evolved extensively beyond its original lexical usage, wherein it was only compatible with count nouns that were semantically related, e.g. pescadores ‘fishermen’, camarones ‘shrimp (plural)’, headed by the preposition de. The structure caleta de is now compatible with count nouns beyond the semantic domain of a fishing village: años ‘years’, veces ‘times/instances’ (see (6)), as well as mass nouns, e.g. plata ‘money (informal)’, tiempo ‘time’ (see (7)). It can also modify comparatives, e.g. mejor ‘better’, peor ‘worse’ (see (9)); eventive verbs, e.g. dormir ‘to sleep’, reír ‘to laugh’ (see (8)); gradable verbs, e.g. gustar ‘to like’, querer ‘to want’ (see (2)); and finally gradable nominal predicates,<sup>5</sup> e.g. hambre ‘hunger’, pena ‘sorrow’, as in (10).</p>
      <p>(6) Hace caleta de años
make.prs.3sg caleta of years
“Many years ago”</p>
      <p>(7) es caleta de plata
be.prs.3sg caleta of money
“it’s a lot of money.”</p>
      <p>(8) Yo igual reí caleta.
1sg.nom same laugh.pst.1sg caleta
“I laughed a lot, anyway.”</p>
      <p>(9) hay que cuidarse caleta mejor...
be.exist.prs.3sg that care.inf.ref caleta better
“one has to take care of themselves much better.”</p>
      <p>(10) Hace caleta de frío.
make.prs.3sg caleta of coldness
“It’s really cold.”</p>
      <p><sup>5</sup> Gradable nominal predicates, in Doetjes’s definition, are nouns which are generally the objects of light verb expressions. The examples she gives are from French, e.g. Elle a très soif ‘She is very thirsty.’ In Spanish, such light verb constructions also exist, so we consider cases like tener sed ‘to be thirsty (lit. to have thirst)’ to also be examples of nominal predicates.</p>
      <p>There were no cases of caleta modifying either eventive adjectives or gradable adjectives within our corpus. This, according to Doetjes’s classification, indicates that caleta has evolved into a Type D degree modifier. Figure 7 shows caleta’s position in this typology, in comparison to the other degree expressions in Chilean Spanish that we have discussed in this paper. Our results align with claims in the literature that Type C and D are the most common in the Romance languages [8]. Lastly, within our results, caleta has no nearest neighbors with Type A modifiers (e.g. muy ‘very’), which combine exclusively with gradable adjectives. This is not surprising since Type A modifiers have no overlap in word classes with Type D modifiers; their distributions are disjoint. This highlights how embeddings capture syntactic properties of words, as opposed to just similarity of meaning.</p>
      <p>[Figure 7: position of caleta in the typology of degree modifiers (Types A to G over word-class categories I, IIa, IIb, III, IV, V), shown alongside the other Chilean Spanish degree expressions discussed: ene, mucho, tanto, harto, bastante, demasiado, un montón, cantidad, montones, vario.]</p>
      <p>Our study has two main findings, which answer the research questions above. First, we have shown that caleta is undergoing grammaticalization: both the older and the new meaning are captured by the word embeddings. Importantly, we see a difference in the results depending on the window size, when compared to other degree words which are grammatical items and not undergoing change, like mucho and harto. In the latter case, window size does not significantly impact the neighbors. Additionally, our post-hoc analysis provided insight on the properties of caleta as a degree word.</p>
      <p>Second, our word embeddings have allowed us to reveal the inventory of degree words in colloquial Chilean Spanish, including a word that to date had never been investigated, ene. These words denote high degree (intensifiers), words that are known to change rapidly due to social and expressive pressure [26]. Since caleta and ene are not normative forms, they are left out of traditional studies. This entails that we may miss instances of change possibly of interest to current linguistic theory. Hence, word embeddings can be a tool to study lesser-known subsystems of a language and capture ongoing changes in synchrony.</p>
      <table-wrap>
        <table>
          <thead>
            <tr>
              <th>Word (Gloss)</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>muchisimo ‘mucho’ (superlative)</td></tr>
            <tr><td>harto ‘a lot’</td></tr>
            <tr><td>tanto ‘so much’</td></tr>
            <tr><td>poco ‘a little’</td></tr>
            <tr><td>muchoy (mucho y as one word, ‘a lot and’)</td></tr>
            <tr><td>muccho ‘mucho’ (orthographical variation)</td></tr>
            <tr><td>bastante ‘quite’</td></tr>
            <tr><td>muchopero (mucho pero as one word, ‘a lot but’)</td></tr>
            <tr><td>aunpero (aún pero as one word, ‘still but’)</td></tr>
            <tr><td>muchisisisismo ‘mucho’ (repeated superlative)</td></tr>
          </tbody>
        </table>
      </table-wrap>
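      <p>The post-hoc collocate analysis described in this section, counting the tokens that immediately precede and follow caleta and caleta de, can be sketched as follows. This is an illustrative reconstruction, not the authors’ script: whitespace tokenization is an assumption, the function name is hypothetical, and the four sample strings are lowercased versions of examples (6) to (9).</p>
      <preformat><![CDATA[
```python
from collections import Counter


def collocates(tweets, target, top_n=20):
    """Count tokens directly before and after each occurrence of `target`.

    `target` may span several tokens (e.g. "caleta de"). Returns two ranked
    lists of (token, count): left neighbors and right neighbors.
    """
    target_tokens = target.split()
    n = len(target_tokens)
    before, after = Counter(), Counter()
    for tweet in tweets:
        tokens = tweet.split()
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target_tokens:
                if i > 0:
                    before[tokens[i - 1]] += 1
                if i + n < len(tokens):
                    after[tokens[i + n]] += 1
    return before.most_common(top_n), after.most_common(top_n)


SAMPLE = [
    "hace caleta de años",
    "es caleta de plata",
    "yo igual reí caleta",
    "hay que cuidarse caleta mejor",
]
```
]]></preformat>
      <p>On this sample, the right-hand collocates of caleta de are años and plata (a count noun and a mass noun), while for bare caleta the most frequent right-hand token is de. Note that a search for caleta also matches every occurrence of caleta de, so the two strings would need to be separated if they are to be compared.</p>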
    </sec>
    <sec id="sec-3">
      <title>6. Conclusion</title>
      <p>Our study contributes to studies of language change by
analyzing intensifiers in colloquial Chilean Spanish (an
understudied variety) from the past twenty years. We
do not yet have data from multiple temporal slices to
demonstrate direct evidence of changes in grammatical
behavior. For this reason, we infer grammaticalization
from synchronic distributional patterns. Nevertheless,
we reveal an ongoing change that had not been
previously studied. Using spontaneous speech from tweets,
we gained access to informal speech where speakers
communicate in an unedited way, which has allowed us to
study the use of older and more recent degree
expressions. In the future, we plan on expanding the time span
of the data, depending on the availability of more text
reflecting spontaneous speech in this variety of Spanish.</p>
      <p>We have shown that static word embeddings provide evidence for this change and can reveal meaning relations not previously studied. Moreover, we show that different choices of hyperparameters have an effect on which meaning of the word undergoing change (the lexical vs. the grammatical) is represented. Nevertheless, comparing our results with dynamic embeddings in the future could prove interesting.</p>
      <p>Some limitations of our study are due to the genre itself. One such limitation is the difficulty with lemmatization: as we have mentioned, these are tweets, so we find strings that do not conform to normative orthography (for example, typos, abbreviations etc.), and the lemmatizer therefore has difficulty detecting words of the same lexeme. In addition, Twitter users tend to adopt orthographical forms that reflect pronunciation and sometimes are intended to be expressive, like repeating vowels in a word to express a very high degree. Furthermore, using a corpus of tweets means that the character limit has an impact on the possible window sizes. To obviate this problem, further studies on caleta could use longer texts that have the same register as tweets, e.g. blog posts.</p>
      <p>Lastly, the only hyperparameters we significantly experimented with were the window size and the minimal word count. More hyperparameter fine-tuning (e.g. adjustment of negative sampling and vector size) could potentially yield more robust results.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This research was supported in part by Lilly Endowment,
Inc., through its support for the Indiana University
Pervasive Technology Institute.</p>
      <p>Declaration on Generative AI
During the preparation of this work, the author(s) did not use any generative AI tools or services.</p>
    </sec>
  </body>
  <back>
  </back>
</article>