Detecting linguistic change based on word co-occurrence patterns

Carmen Klaussner, Trinity College Dublin, klaussnc@tcd.ie
Carl Vogel, Trinity College Dublin, vogel@tcd.ie
Arnab Bhattacharya, Trinity College Dublin, bhattaca@tcd.ie

ABSTRACT
Diachronic linguistic analysis focuses on detecting elements of language change over time. This change can take different forms; for instance, certain words could show a slow increase or decrease in frequency over time as they become more popular or obsolete. We are interested in sudden change in words that are attested in every time slice of the overall examined time period. In particular, we are trying to relate the change in frequency of words that are always there to words that emerge around certain points in time and only remain frequent for shorter periods of time, suggesting they are more prone to sudden changes in popular topics or could be influenced by historical events. This addresses the question of how the frequencies of the more regular word expressions are influenced by new clusters of words appearing and disappearing. Although there might be links to collocation analysis, words that occur frequently next to each other are not of primary interest here, but rather words that are conceptually related, where one is causing or affecting the frequency of the other, which causes them to vary in a similar fashion. We use statistical change point analysis for identification of significant change over time and seek to validate our findings by randomly extracting example sentences from the data.

KEYWORDS
diachronic analysis, language change, change point analysis

1 INTRODUCTION
Findings from temporal studies offer an important source of enrichment and validation for non-temporal studies, especially when one needs to disentangle (general) temporal effects from non-temporal ones, as for instance in the realm of stylometry and authorship attribution [2]. Change in linguistic variables can occur in different shapes and forms: slow gradual change as opposed to sudden and abrupt change, and short-term as well as long-term effects. Differences could be rooted in levels of linguistic abstraction; for instance, individual words are likely to show more variation over time than entire word classes, where smaller fluctuations would be averaged out and only larger trends pervading the entire group would be easily discernible. Features are usually classified along one or more different dimensions, such as membership of either an open or a closed class, or according to the frequency stratum they belong to, i.e. frequent, medium-frequent or rare. However, even though these classifications are often represented as categories, suggesting there is a clear boundary between, for instance, what is frequent and infrequent, a continuous representation would occasionally seem more suitable, especially when temporal effects are considered. This is especially true in the realm of diachronic analyses, the analysis of texts over time, as most items might appear across different frequency strata depending on the exact time period examined.

We aim to show that even the open-class/closed-class view of features might be insufficient when observed through the lens of temporal text representation. While something might bear the label of common noun, it could in fact be closer to a function word, behaving like one and being affected by similar factors, such as regular occurrence in different contexts.

In this work, we consider the analysis of the more regular and possibly also frequent items, in particular those appearing in all years of the time period examined. Typically, these features are more general in meaning (e.g. temporal expressions), rendering them suitable for a variety of language contexts, unlike strongly topic-related words, such as hurricane or computer. Our main hypothesis is that these items change through other less frequent items that are more prone to topic change over time, such as concepts relating to the outbreak of a war or natural disaster. The type of change we are looking for is a change in mean, whereby a feature changes its relative frequency fairly abruptly at time t, rising or falling to a new level and remaining there for at least some time. If one compares the mean over the samples before time t to the mean taken over the samples after time t, one obtains significantly different means. This type of research has to be distinguished from two related areas of research, i.e. semantic change in the form of neologisms and collocation analysis. Semantic change analysis is different in that it considers cases whereby a word acquires a new sense and possibly also a second part-of-speech class and could subsequently be used in different syntactic contexts, whereas here we consider changes in word frequencies and their possible causes that are not related to semantic change, as for instance a particular temporal expression used for contrasting different situations, e.g. ‘If I had only known then, what I know now.’ Also, conceptually, these regularly and irregularly appearing words could be relatable through collocations or otherwise longer n-gram sequences. While the method presented here could be used for their detection as well, it is not limited to relationships between words that occur close to each other, but also covers words or expressions that only share a conceptual rather than spatial relationship, such as the first (‘Detecting’) and last (‘patterns’) word of the title of this work. As 8-grams are rarely computed, this relationship is less likely to be captured by collocation analysis, although the terms are clearly conceptually related.

The remainder of this work is organized as follows: section 2 discusses related research in the field; section 3 provides information about the data and pre-processing steps; section 4 presents the methods we employed. Before moving on to the experiments in section 6, section 5 considers trends in larger groups of word expressions to anchor our findings based on individual words. Section 6 then presents our change point experiments, and empirical
validation of these through the actual data. We discuss these results in section 7 and conclude the work in section 8.

2 PREVIOUS RESEARCH
Different areas of linguistic research consider the change of broader categories of words, such as frequency effects in syntax [1], largely distinguishing between type and token frequency of a particular variable or category. Bybee and Thompson [1] discuss three frequency effects that are important not only in shaping phonology and morphology, but also syntax; two effects are caused by high token frequency and have opposing tendencies that can only be explained by considering the influence of the third frequency effect, that of high type frequency. A high token frequency of an item promotes its reduction, as visible in conventionalized contractions in English (I’m, can’t). In contrast, the ‘Conserving Effect’ is visible with high token frequency items, where the more the form is used the more it is strengthened: compare the normalization of the English past tense of ‘weep’ from wept to weeped with high frequency items, such as ‘sleep’ (slept), which retain the irregular form. A syntactic example of this is the fact that pronouns, although derived from full noun phrases, show much more conservative behaviour (e.g. case marking) due to their higher frequency. The type of change that is resisted in the high token frequency items is change on the basis of combinatorial patterns or constructions that are productive. “The more lexical items that are heard in a certain position in a construction, the less likely it is that the construction will be associated with a particular lexical item” [1, p.384]. This is observable in the ditransitive construction, which is only acceptable with very specific lexical verbs of high frequency, compare: “He told the woman the news” vs. “He whispered the woman the news” [1, p.385], where the verb tell is a lot more frequent than whisper. To a limited extent, this is also productive in that the construction can apply to a few new high frequency verbs, such as e-mailed or telephoned.

Hamilton et al. [4] consider the function aspect of diachronic change by taking a closer look at global and local shifts in a word’s distributional semantics in historical texts from English, French and German.¹ For the local or cultural shifts, they use a local neighbourhood measure and for the global measure they compute the cosine distance between two word vectors capturing the co-occurrence statistics at consecutive time points t and t+1. Based on previous results in the literature, they predict that nouns are more likely to undergo change because of cultural shifts, whereas verbs are more likely to change because of regular semantic change. As predicted, across all languages the local neighbourhood measure assigns higher rates of semantic change to nouns than to verbs, with the opposite applying to the global measure. This also remains the case when adverbs and adjectives are included among the verbs, supporting previous results in the literature suggesting that adverbial and adjectival modifiers are often the target of regular or global linguistic change [4].

The research presented by Kulkarni et al. [7] considers change point analysis in the context of investigating statistically significant shifts of semantic change. They consider three different approaches. The first is frequency based, whereby sudden changes in word usage are captured. The second involves a syntactic time-series analysis, analyzing a word’s part-of-speech tag distributions. Finally, they construct a distributional time-series by considering contextual cues from word co-occurrence statistics. Using human evaluators to assess the performance of their models, they find that the highest agreement between annotators and method, with respect to words that have undergone change, is achieved by the distributional method with c. 53% average agreement, compared to c. 22% (syntactic) and c. 13% (frequency). Another change point oriented analysis is presented by Riba and Ginebra [9], who investigate a possible change in authorship of Tirant lo Blanc, identifying a clear single sudden change point that is supported by cluster analysis. Our work examines possible changes in features that are both regular in occurrence and highly frequent, caused by features that are only highly frequent over a short period of time. This is similar to semantic cultural shifts, with the difference that these words would not necessarily take on a new meaning.

¹ Local or cultural shifts are deemed less regular and stable than global shifts, as they are caused by more changeable factors, such as new technologies, whereas global shifts are associated with regular semantic change, such as grammaticalization.

3 DATA
For this analysis, we consider a 100-year long extract from The Corpus of Historical American English (COHA) [3].² This is a 400-million word corpus, which contains samples of American English from 1810–2009 balanced in size, genre and sub-genre in each decade (1000–2500 files each). It therefore contains balanced language samples from fiction, popular magazines, newspapers and non-fiction books, which are again balanced across sub-genres, such as drama and poetry.³

For this study, we selected all data from the years 1880–1979, covering all four genres: news, magazine, fiction and non-fiction. For most of the experiments, we only use the news section of the data, as it is most likely to contain the types of change we are targeting, though we occasionally compare it to the other three genres. In order to arrive at a relative frequency count for each feature, we combine the individual files on a per-year basis and relativize by the overall token count for that year.⁴ As features, we consider the set of word bigrams marked for syntactic context. For example, the word like has different meanings depending on its context: it can be used as both a verb and a preposition, and these uses should be treated as two separate items. We chose bigram size as it provides more context and is richer in meaning, allowing us to discern more specific items of change than with unigram size. We decided against analyzing items of higher rank and abstraction, such as part-of-speech sequences, as these are more difficult to evaluate, while word sequences offer more possibilities for human evaluation.

² Free version accessible on: http://corpus.byu.edu/coha (last verified July 2017).
³ There is an Excel file with a detailed list of sources available on: http://corpus.byu.edu/coha/ (last verified July 2017).
⁴ In the case of higher sequence features, such as word bigrams, the unigram token count is replaced by the unique bigram token count.
⁵ In this, the difference between the original word in context and the lemma of the word would primarily be reflected in verbs.

In order to extract the part-of-speech (POS) features needed for syntactic word features, we used the TreeTagger POS tagger [8, 10]. Our new syntactic word features were then created by using the tag sequence as a suffix to the original word in context that gave rise to it. Thus, “He likes her” becomes “he.PP likes.VBZ her.PP”.⁵ Items
are then joined to bigram sequences and each two-word syntactic sequence is relativized by the total number of bigram sequences in that year. For this work, we are primarily interested in changes in common nouns, requiring us to extract these from all other types. We only retain adjective-noun or noun-noun combinations, as we expect the other types that can occur in noun phrases, i.e. determiners, proper nouns and pronouns, to follow a different frequency distribution that might introduce noise.

Figure 1: PCA results for the 10 highest associated bigrams.
[Plot: pc1 axis against pc2 axis, each ranging from −1.0 to 1.0. Labelled bigrams: first.JJ time.NN, young.JJ man.NN, many.JJ people.NNS, next.JJ year.NN, few.JJ moments.NNS, young.JJ lady.NN, last.JJ week.NN, last.JJ year.NN, good.JJ deal.NN, great.JJ deal.NN.]

4 METHODS
In this section, we describe the methods used for initial detection of interesting constant features, our data exploration and the change point analysis.

4.1 Detecting changing features
As we are interested in change in variables appearing in all time instances of a temporally-ordered data series, we consider only those adjective-noun/noun-noun bigram types that appear in all time slices and discard all others. Even when reducing the set of features to these constant noun types, some 350 sequences remain for examination. In order to discover interesting (and possibly related) features more easily, we first order them according to mean relative frequency and then use principal component analysis (PCA) on sets of 50 bigram features, as we have found estimation and later interpretation of components to be better when the document-feature ratio is in favour of more samples. PCA is an unsupervised statistical technique that converts a set of possibly related variables to a new uncorrelated representation, the principal components. This type of analysis groups features according to common variance patterns and can help to detect features that vary in a similar way. The results of running PCA on the 50 most frequent noun-noun/adjective-noun sequences are 50 new components that group related features together.⁶ A feature can be negatively or positively related to a new component. The components themselves account for decreasing proportions of variance, e.g. in this case the first component accounts for 25% and the second component for 12% of the variance, with the rest being more broadly spread out. Inspection of the first principal component allows for discovery of the three highest associated items: last week, last year and next year, with very similar weights: 0.259558004, 0.256880159 and 0.252325884 respectively (Figure 1).

Figure 2: 19th/20th century corpus: word bigrams ‘last year’ and ‘last week’ shown over different genre types: news, magazines, fiction and non-fiction.
[Two panels, titled "All genre: 'last year'" and "All genre: 'last week'". x-axis: Time (1880 to 1970); y-axis: Relative Frequency; legend: fiction, nfiction, magazine, news.]

Figure 2 shows two of the features: last year and last week. Both display rather sudden changes around 1920, albeit for different genres: ‘last year’ becomes more frequent first in the news domain and after that in magazines, while for ‘last week’ the order is reversed, first magazines and then news.

⁶ For this experiment only, we took logarithms of relative frequencies before applying PCA.

4.2 Type-token analysis
In order to understand the underlying development in the data better, yielding a more informed analysis, we explore some methods
inspired from data analysis in the financial sector, Rate of Change             detection in cloud data in the presence of anomalies.7 The proposed
(RoC). The methods described here are used in section 5. In par-                approach (‘E-divisive with Medians’(EDM)) is a non-parametric
ticular, we are interested in what way different groups of features             technique using medians and estimating the statistical significance
change over time with respect to different quantities. For this, we             of a change point through a permutation test. We found this tech-
consider two basic linguistic measures, type-token ratio (TTR) cal-             nique to return fewer change points that were more spread out than
culated as the number of types in a particular category divided                 distribution based change point methods, rendering it even more
by all the number of tokens and a type-vs-all-types ratio (TYPR),               desirable as our data is not always normally-distributed. We found
whereby we compare the number of types in a particular category                 that using transformations occasionally smooths over interesting
to all types. Further we combine these two ideas with the RoC in                developments making these less desirable to use in this context.
order to gain insights in how types of features not only change
over time, but are related to the respective previous time instance.            5                 DATA EXPLORATION
   The log RoC is defined as, the value of the variable Vt today                In the following section, we look at unigram and bigram instances of
divided by the value yesterday Vt −1 as shown in eq. 1. Thus, this              both function and content types to gain an intuition about general
trends in the data. We begin by exploring changes in type and token relations in unigrams of both a function (determiner) and a content (noun) category. For the determiner category, we considered both instances corresponding to ⟨DT⟩ and ⟨WDT⟩, thus both the.DT and which.WDT in contexts such as ‘Which/The book...’.

Figure 3: News corpus: determiner types divided by total tokens (TTR) or types (TYPR). [Plot not reproduced: proportion (%) against time, 1880-1970; series TTR and TYPR.]

is showing how a particular value changes, for instance from one year to the next.

$$\mathrm{RoC}_{\ln} = \ln\left(\frac{V_t}{V_{t-1}}\right) \tag{1}$$

   For most of the analysis, we use simple TTR and TYPR, except for figure 5, for which a slightly modified version of the RoC is used. Rather than having the current value at time t in the numerator, we consider the set of linguistic types that two time points have in common, as shown in Eq. 2. Thus, $|\mathit{types}_{y_t} \cap \mathit{types}_{y_{t-1}}|$ refers to the size of the group of features found in both year $y_t$ and year $y_{t-1}$, taken with respect to the number of tokens in $y_{t-1}$. The second version, shown in Eq. 3, relativizes with respect to the total number of types in $y_{t-1}$. In the following, we refer to these two measures as TTR’ and TYPR’ to distinguish them from static type-token ratios. As is common with financial data, we take natural logarithms to achieve symmetry between decrease and increase.

$$\mathrm{TTR}'_{y_t} = \ln\left(\frac{|\mathit{types}_{y_t} \cap \mathit{types}_{y_{t-1}}|}{|\mathit{tokens}_{y_{t-1}}|}\right) \tag{2}$$

$$\mathrm{TYPR}'_{y_t} = \ln\left(\frac{|\mathit{types}_{y_t} \cap \mathit{types}_{y_{t-1}}|}{|\mathit{types}_{y_{t-1}}|}\right) \tag{3}$$
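   As an illustration of Eqs. 1-3, the following minimal Python sketch computes the log rate of change and the two shared-type ratios for a pair of consecutive years. It assumes that each year is available as a flat list of tokens; the function names and the toy token lists are hypothetical and are not taken from the implementation used for the experiments.

```python
import math


def rate_of_change(v_prev, v_curr):
    """Eq. 1: natural log of the ratio of the current to the previous value."""
    return math.log(v_curr / v_prev)


def shared_type_ratios(tokens_prev, tokens_curr):
    """Eqs. 2 and 3: log of the number of types shared by two consecutive
    years, relative to the previous year's token count (TTR') and to the
    previous year's type count (TYPR')."""
    types_prev = set(tokens_prev)
    types_curr = set(tokens_curr)
    shared = len(types_curr & types_prev)
    if shared == 0:
        # ln(0) is undefined, so this case has to be handled by the caller.
        raise ValueError("no shared types between the two years")
    ttr_prime = math.log(shared / len(tokens_prev))
    typr_prime = math.log(shared / len(types_prev))
    return ttr_prime, typr_prime


# Hypothetical toy data: two very small "years" of tokens.
year_prev = ["the", "last", "year", "the", "market", "rose"]
year_curr = ["the", "last", "week", "a", "market", "fell", "the"]

ttr_p, typr_p = shared_type_ratios(year_prev, year_curr)
print(f"TTR' = {ttr_p:.3f}, TYPR' = {typr_p:.3f}")

# Symmetry of the log ratio: doubling and halving differ only in sign.
print(rate_of_change(50, 100), rate_of_change(100, 50))
```

   In the toy data, three types are shared, the earlier year has six tokens and five types, so the two ratios are ln(3/6) and ln(3/5). The last line shows the symmetry that motivates taking logarithms.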
   These two measures allow us to observe how the broader categories behave with respect to feature types and what proportion these take of either all types or all tokens.

4.3    Change-point Detection
Change point analysis is the analysis of a time series with the aim of detecting specific points t in time that separate the points before and after them with respect to some criterion. More formally, change point analysis can be defined as follows: given a time series $\{y_t : t \in 1, \ldots, n\}$, a change point occurs if there exists a time k, where $1 \le k \le n - 1$, such that the distributions of $\{y_1, \ldots, y_k\}$ and $\{y_{k+1}, \ldots, y_n\}$ differ with respect to some criterion, such as a change in mean, in regression or in variance. For this analysis, we are primarily interested in changes in mean, as these would signal a higher or lower average usage of a feature with respect to an earlier time period, while a change in variance, for instance, would indicate greater or lesser variability in how a feature is used. As we are interested in long-term change that lasts at least 10 years or so, we require a change point detection technique that is less sensitive to short-term fluctuations in the data. For our experiments, we chose the approach by James et al. [5], originally used for breakout

7 This is implemented in the R package ecp [6].
8 By ‘universally’ the span of our entire data set is meant rather than any data and time space that could be examined in this way.

   Figure 3 shows the number of determiner types with respect to all types/all tokens over time. The top line shows a sharp decrease in 1920 for the determiner vs. all types ratio, with this also being visible but less pronounced for the type-to-token ratio. One aspect that needs to be taken into account in this context is the influence of types attested in all time instances. We refer to these as the ‘universally’ constant types to distinguish them from the types that are ‘partially’ constant, appearing in a few consecutive years but not in all.8 The shortest span of constancy is two instances (years), which we refer to here as ‘pairwise’ constancy. The concept of constancy in itself has to be distinguished from possible associated frequency distributions. A feature could appear in all time instances and therefore be constant, but might vary considerably with respect to its relative frequency. With respect to determiners, the proportion of constant determiners of all occurring determiners is relatively high, indicating that other constancy types shared less
of the variation observed in figure 3. Thus, as would be expected with a true function category, most of its types account for a high proportion of all of its types as well as variation in frequency over time. With the noun category, in this case only considering singular and plural common nouns, the situation is somewhat reversed. All common noun types account for c. 38% of the entire token variety, but only 0.05% of these are types that are universally constant. Similarly, the non-universally constant types also account for most of the tokens, indicating that variety rather than constancy of types is predominant here. So we expect the proportions of universally constant features to behave differently for nouns and determiners. We test this by subtracting the proportion of constant determiner types of all types from the proportion of all determiner types of all types for each year, and compare these differences to the same quantities for nouns using the Wilcoxon signed rank test. The difference in means over these yearly differences is significant, meaning that universally constant types behave differently in each group. Conversely, the comparison based on the complement of those universally constant features is also significant. In terms of frequency changes, one can observe a sharp drop in tokens (TTR) and a slightly more temperate downward curve in types (TYPR) after 1920.

Figure 4: News corpus: adjective-noun types divided by all unigram types/tokens. [Plot not reproduced: proportion (%) against time, 1880-1970; series TTR and TYPR.]

   As a final step, we consider word bigram sequences, in particular the group of bigrams comprised of either adjective + common noun (plural/singular) or bigrams of only common nouns (plural/singular). Figure 4 shows the adjective-noun bigram types with respect to all types/all tokens. Interestingly, although the common noun unigram types are decreasing over time, these bigram types are increasing in proportion to both all types and tokens. The universally constant features are very limited in this subgroup and almost all variation is accounted for by partially constant features.
   Figure 5 shows the pairwise ‘appearing’ features, i.e. those features found at time t that are not found at time t − 1, with respect to all types (TYPR’)/tokens (TTR’) at time t − 1. As is observable, there is a comparatively large increase of new features in 1920 and a somewhat smaller increase again at c. 1939. This indicates that there might be new concepts emerging for this feature type that have not been there previously. The variation found with respect to these groups might only be found in this particular domain of news articles, where language might be more variable than in other domains, such as fiction or non-fiction. Thus, one would not necessarily expect these findings to translate to other genres.

Figure 5: News corpus: pairwise appearing adjective-noun types against all types/tokens. [Plot not reproduced: log(proportion (%)) against time, 1880-1970; series TTR’ and TYPR’.]

6    CHANGE-POINT EXPERIMENTS
Based on the exploratory analysis described in section 4.1, we chose ‘last year’ as our variable of interest and determine by change point analysis whether there is an abrupt sort of change as opposed to a more gradual trend. Figure 2 shows two adjective-noun phrase bigrams, ‘last week’ and ‘last year’. Both are universally constant over both the news and magazine corpus, but not constant for fiction and non-fiction.

6.1    Discovery of Related Variables
Given our variable of interest, we seek to find variables that display similar change or, as we hypothesise, are somewhat responsible for the change observed in that universally constant feature.
   Our set of suitable candidate variables comprises the set of bigram adjective-noun/noun-noun combinations that, in contrast to our main variable, need not be universally constant, but might only turn out to be partially constant over the entire time span. Each variable’s frequency in the group is relativized with respect to the entire token count for the respective year.
   The very first step in this is to run a change point analysis for ‘last year’ over the entire 100-year span in order to ascertain the exact point of change. As could be observed from figure 2, a change happened a little after 1920, with the period afterwards giving rise to a higher frequency pattern than the years leading up to it. One then chooses an interval of a certain length after the change point to limit the number of candidate features for examination. The rationale is that, given a rise in frequency after the change point, one would expect correlated features to be partially constant for at least a certain period of time afterwards, e.g. 10 years. We therefore extract the partially constant features for this time period only. We then take the remaining features and calculate their individual change points over the entire 100-year period. Given all change points over all features, these are divided into three different
Table 1: Correlation between ‘last year’ and chosen features. Universally constant features are marked in italics.

No.  1920-1930                    1930-1940                    1940-1950                        1950-1960                        1960-1970
     Feature               Corr.  Feature               Corr.  Feature                   Corr.  Feature                   Corr.  Feature                Corr.
 1.  first half             0.88  first quarter          0.85  first year                  0.8  automobile industry       −0.87  other countries        −0.76
 2.  floor leader           0.85  second quarter         0.83  international law         −0.76  european countries        −0.82  government officials   −0.75
 3.  first quarter          0.84  common stock           0.82  other words               −0.72  democratic leaders         0.81  political parties       −0.7
 4.  recent weeks           0.55  current year           0.82  public utility            −0.66  british government        −0.74  european countries     −0.69
 5.  current year           0.77  first time            −0.74  international relations   −0.65  overwhelming majority       0.7  political leaders      −0.66
 6.  farm relief            0.76  tomorrow morning       0.67  national policy           −0.61  last week                  0.69  other words            −0.64
 7.  same period            0.75  first half             0.65  late today                 −0.6  british empire            −0.62  open market             0.59
 8.  automobile industry    0.75  same period            0.64  near future               −0.57  floor leader              −0.62  low prices              0.57
 9.  weather conditions     0.73  stock market           0.63  european countries         0.53  american people            −0.6  american government    −0.57
10.  same time              0.72  low prices             0.63  oil production             0.53  other countries           −0.55  vice president          0.55
11.  executive session      0.72  american people       −0.62  american people           −0.51  next year                  0.54  other nations          −0.53
12.  whole world           −0.71  business conditions    0.61  political parties         −0.51  next month                 0.54  present conditions     −0.53
13.  second quarter         0.69  present time            0.6  low prices                  0.5  current year              −0.54  third quarter           0.51
14.  motion picture         0.67  whole world           −0.58  last week                  0.49  last summer                0.53  several occasions      −0.51
15.  vice president         0.66  last week              0.58  vice president             0.45  disarmament conference     0.53  war debt                −0.5
16.  american people       −0.65  law enforcement        0.56  crude oil                  0.45  first year                 0.52  past week               −0.5
17.  first year             0.65  good business          0.55  next year                  0.45  crude oil                  0.51  american people         0.46
18.  recent years           0.63  present indications    0.54  next few                  −0.45  present time              −0.46  farm products           0.46
19.  american government   −0.62  past year              0.53  present conditions        −0.45  recent years               0.45  large majority         −0.45
20.  public interest        −0.6  near future            0.52  recent months              0.43  political leaders         −0.45  present time           −0.44
21.  important factor        0.6  income tax             0.48  stock market              −0.42  international relations   −0.45  foreign countries      −0.44
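
   As a hedged illustration of the kind of correlation reported in Table 1, the sketch below computes a Spearman rank coefficient between two relative-frequency series over a fixed window of years. The experiments relied on the R implementation (footnote 10); this Python version is a stand-in with tie handling by average ranks. The function names, the inclusive window boundaries and the toy frequencies are assumptions made purely for illustration.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def windowed_spearman(series_a, series_b, start, end):
    """Correlate two {year: relative frequency} series over start..end
    (inclusive), using only the years present in both series."""
    years = [y for y in range(start, end + 1)
             if y in series_a and y in series_b]
    return spearman([series_a[y] for y in years],
                    [series_b[y] for y in years])


# Hypothetical relative frequencies, not taken from the corpus.
last_year = {1920 + i: 1.0e-5 * (i + 1) for i in range(11)}
rising = {year: 2.0 * freq for year, freq in last_year.items()}
falling = {1920 + i: 1.0e-5 * (11 - i) for i in range(11)}

print(windowed_spearman(last_year, rising, 1920, 1930))
print(windowed_spearman(last_year, falling, 1920, 1930))
```

   Because the coefficient depends only on ranks, a feature that rises monotonically alongside ‘last year’ is perfectly positively correlated regardless of its absolute frequency, while a monotonically falling feature is perfectly negatively correlated.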


groups of features, those whose change points occur before the                                      after that, so somewhat of a transition period where different con-
main feature’s change point (in this case ‘last year’), those whose                                 cepts have similar trends to ‘last year’. There are a few temporal
occur exactly at the same time and those whose occur after. Only                                    expressions and politically/industry-related terms, such floor leader,
those features that change significantly with respect to their mean                                 executive session,vice president and automobile industry and a few
within 10 years before or after the main feature’ change are retained,                              expressions (possibly temporal), that would probably be anchored
the reason being that we deem it unlikely that those changes more                                   more strongly in the business context, such as first quarter and
remote in time would be related. Using the present method for                                       second quarter. The second time window spanning 1930-1940, fea-
detection, features usually do not have more than one change point                                  tures various concepts related to the stock exchange and business,
and in the cases that they have two, these are separated by a time                                  such common stock, stock market, business conditions and income
span of at least 20 years. A change point indicates a change in                                     tax as well as a few temporal expressions possibly used in this
mean and what follows could either be an increase or a decrease in                                  context, such as first quarter and second quarter. Interestingly, over
frequency.9 As we only focus on features with similar trends, we                                    the next time span covering 1940-1950, for instance stock market
discard those with an opposing trend to our candidate feature, by                                   goes from being reasonably positively correlated (0.63) to being
calculating the correlation between ‘last year’ and each feature over                               negatively correlated (−0.42). and other concepts, such as european
the interval covering 15 years on either side of a feature’s change                                 countries and oil production take precedence instead. In the next
point and only retaining those features for which this correlation is                               time window (1950-60), the highest rated concepts are negatively
positive.10 In the present case, the specifications were set as follows:                            correlated with ‘last year’, this effect becoming even stronger in the
the change point for ‘last year’ was estimated at 1923, so we choose                                very last time frame of 1960-70. Overall, we interpret this to mean
the interval spanning the years 1924-1934 to look for features that                                 that very different concepts come to be used with ‘last year’ than
are constant over this period of time. We would not expect the                                      provided the basis for this set of correlated features. Certain events,
exact time frame to be of high importance, as one would expect                                      such the surprising wall street crash in 1929 could have caused
most features to level off more gradually over time. After discarding                               temporal expressions to gain more prominence and created an at-
features not constant over this interval, 103 features are left, where                              mosphere of immediacy that at least in the news world made the
at least 22 of these are also temporal expressions. In fact, when                                   use of temporal expressions more likely. With WWII and the cold
we consider the universally constant adjective-noun combinations                                    war shortly following, this might have kept the temporal dimension
that are constant over the entire 100-span, the majority of these                                   palpable. When we examine the list of change points, including the
turn out to be temporal expressions (12/16). The fact that not more                                 ones more than 10 years after ‘last year’, it is noticeable that a few
features are constant over the entire span hints at the domain being                                expressions’ points of change lie very close together, for instance
somewhat volatile with respect to content sequences.                                                stock market, preferred stock, financial position, first quarter and third
   Table 1 shows the highest pairwise correlations (either nega-                                    quarter all change in either the year 1915 or 1916 and in 1945 or
tive or positive) between ‘last year’ and each of the 104 features                                  1946. Figure 6 depicts this overlap in increase after the first change
over smaller intervals of 10 years from 1920 to 1970, where the                                     point and return to initial mean frequency pattern after the second
universally constant features are marked in italics. The first inter-                               change point.
val covers a few years before the change point and a few years                                         Another aspect that is noticeable in the results is that various
                                                                                                    temporal expressions appear in the list of features highly correlated
9 We focus on synonymous changes and causes here, i.e. the parallel increase of two                 with ‘last year’. This suggests that temporal expressions in general
features together, rather than assuming that a decrease in one feature causes an increase           increased in usage over time with respect to this genre. Figure 7
in the other feature, although this would also be a valid scenario.                                 shows a few of the expressions from Table 1. All seem to increase
10 We used the Spearman rank coefficient for this, as available from the core R package.
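
The filtering and correlation step described in this section can be sketched as follows. This is a minimal illustration in Python, not the code used for the paper (the analysis itself relied on R). It assumes that "constant over an interval" means attested with non-zero relative frequency in every year of that interval, and that each feature is stored as a mapping from year to relative frequency. The function names and the toy series are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one series has no variation
    return cov / sqrt(vx * vy)


def is_constant(series, start, end):
    """Assumption: 'constant' means attested in every year of [start, end]."""
    return all(series.get(year, 0.0) > 0 for year in range(start, end + 1))


def windowed_correlations(target, features, start, end):
    """Correlate the target with every feature that is constant over the window."""
    years = range(start, end + 1)
    t = [target[y] for y in years]
    return {
        name: spearman(t, [series[y] for y in years])
        for name, series in features.items()
        if is_constant(series, start, end)
    }


# Toy relative frequencies (invented for illustration, not corpus data).
years = range(1924, 1935)
target = {y: 1e-4 + 1e-5 * (y - 1924) for y in years}
features = {
    "rising": {y: 2e-5 + 3e-6 * (y - 1924) for y in years},
    "falling": {y: 9e-5 - 4e-6 * (y - 1924) for y in years},
    "gappy": {y: 5e-5 for y in years if y != 1929},  # missing one year
}

result = windowed_correlations(target, features, 1924, 1934)
# Keep only the features that are positively correlated with the target.
positive = {name: rho for name, rho in result.items() if rho > 0}
```

In this toy setting the feature with a missing year is discarded before any correlation is computed, and only the rising feature survives the positivity filter. The same function can be called on successive 10-year windows to obtain per-interval correlations.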

Figure 6: News corpus: relative frequency of items with change points around 1915-16 and 1945-46.
[Plot omitted. Series: stock market, first quarter, preferred stock, third quarter, financial position. x-axis: Time (1880-1970); y-axis: Relative Frequency.]

Figure 8: Fiction corpus: relative frequency of highest ‘last year’ correlated features in the fiction genre.
[Plot omitted. Series: last year, good evening, little girl, good night, good time. x-axis: Time (1880-1970); y-axis: Relative Frequency.]

Figure 7: News corpus: relative frequency of temporal expressions.
[Plot omitted. Series: last year, last week, current year, past year, next year. x-axis: Time (1880-1970); y-axis: Relative Frequency.]

in frequency over time. However, correlation analysis might be a little volatile in that
smaller spans of the entire period are not representative of the overall correlation. For
this reason, we seek to validate our results further, which will be done in the next section.

6.2 Validation of Results
As the final part of this analysis, we seek to further validate our results. One part of this
is to see whether this effect also exists in a less changeable genre, such as fiction. Thus,
we repeat the exact same experiment, but using the fiction corpus as a basis rather than the
news corpus. We first estimate possible change points on the basis of the new corpus.
Interestingly, the change point for ‘last year’ in the fiction genre happens earlier, around
1917. There seems to be a lot less variety in adjective-noun combinations as the partially
constant features over 1918-1928 only add up to 27. Of these 27, only 9 are positively
correlated to ‘last year’ based on a span of ± 15 years around their individual change point.
   However, only good evening, little girl, good night, good time and very well are actually
positively correlated with ‘last year’ over an interval of ± 15 years around its own change
point, with the highest correlation being around 0.4. Figure 8 shows them side-by-side with
‘last year’. This suggests that the temporal aspect has not grown as much in importance in
this genre and is less closely linked to adjective-noun types than seems to be the case in
the news domain.
   In order to validate this in the news data, we consider actual language samples for
co-occurrence of items highly correlated with ‘last year’. We randomly extract sentences
containing ‘last year’ and observe what concepts co-occur in the same sentences. We take ten
samples each from before 1923 (1910-1920), immediately after (1924-1934) and again at a later
stage (1950-1960).
   Table 2 shows salient concepts occurring in the same sentence as ‘last year’ for all three
time periods. The number in brackets indicates in how many of the ten selected sentences the
term occurred. The terms occurring in the first time span are mostly related to elections and
governments with some more general political topics, such as company and wages entering into
it as well. The second time span, set around the change in ‘last year’, seems to contain
almost exclusively stock exchange related news items. The final period, set after the end of
WWII, contains very mixed samples, from sports to international politics, companies and space
programs. Although extracting a few random samples from a large set of texts cannot provide
firm conclusions, these results seem to support our earlier findings of a strong correlation
between stock exchange related items and the temporal expression ‘last year’ during a
particular time period, where this seems to have dominated the news. In order to see to what
extent this effect generalizes to other temporal expressions, we need to analyze these
separately.

7 DISCUSSION
We have reported an exploratory analysis to investigate the relationship between temporal
expressions, such as ‘last year’, and temporally less stable word expressions that appear and
disappear over time. We hypothesized that these fluctuating words that are more strongly
connected to current events would somewhat influence the rise in frequency of more stable
concepts, such as
Table 2: News corpus: salient words occurring with ‘last year’ in 10 randomly selected sentences for each of the time periods:
1910-1920, 1924-1934, 1950-1960.

     News corpus   salient words occurring with ‘last year’
     1910-1920     (primary) election(2), party(2), board of education(2), mexican bullets(1), company(3),
                   director(s)(2), railroad(1), wages(1), shareholders(1), submarine(1), national committee(1)
     1924-1934     adjustment bond(1), common stock(2), stock (dividend)(2), (cash) investment(2), congress(1), sales(1), share(1),
                   corporation(1), net profit(1), minor purchases(1), liquidation(1), dividend rate(1), president(1), preferred dividends(1)
     1950-1960     tournament(1), basketball coach(1), chicago medical society(1), tax bill(1), (space) administration(2), international agreement(1),
                   wage(1), arbitration(1), net income(1), auto companies(1), production schedules(1), national aeronautics(1), russians pioneer(1)
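
The sampling procedure behind Table 2 can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the procedure as implemented for the paper: the corpus is assumed to be a list of (year, sentence) pairs, terms are matched by plain lowercase substring search (so "share" would also match "shareholders"), and the function names, the fixed random seed, and the example sentences are all hypothetical and invented for the illustration.

```python
import random
from collections import Counter


def sample_sentences(corpus, phrase, start, end, k, seed=0):
    """Randomly draw up to k sentences from [start, end] that contain the phrase."""
    pool = [
        sentence
        for year, sentence in corpus
        if start <= year <= end and phrase in sentence.lower()
    ]
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(pool, min(k, len(pool)))


def sentence_counts(sentences, terms):
    """Count, for each term, the number of sampled sentences it occurs in."""
    counts = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        for term in terms:
            if term in lowered:
                counts[term] += 1
    return counts


# Invented toy sentences, not taken from the news corpus.
corpus = [
    (1926, "Net profit rose above the figure reported last year."),
    (1929, "Common stock closed lower than at this time last year."),
    (1931, "The dividend rate on common stock was cut last year."),
    (1931, "The board met on Tuesday."),
    (1955, "The tournament drew a larger crowd than last year."),
]

sample = sample_sentences(corpus, "last year", 1924, 1934, k=10)
counts = sentence_counts(
    sample, ["common stock", "net profit", "dividend rate", "tournament"]
)
```

Each count corresponds to the bracketed number in Table 2, that is, the number of sampled sentences in which a term appears rather than its total number of occurrences.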


temporal expressions. Our results suggest that there might indeed                 words or expressions that are stable in occurrence, might be rather
be a connection between ‘last year’ and clusters of words linked to               volatile with respect to their relative frequency distribution. As
historical events, such as the stock market crash. However, while                 temporal expressions have fewer semantic associations, they might
stock market related words are only constant and very frequent                    depend more strongly on features that do.
for a limited time frame, ‘last year’ and other temporal expressions                 The results we have obtained are tentative and in order to claim
remain frequent. We believe that this could be due to the temporal                an increase of temporal expressions possibly related to certain
aspect in news language having become more important after 1923,                  historical events, one needs to show this effect to hold for other
having gathered momentum through events such as the stock                         temporal expressions as well as exclude any possible semantic shift.
market crash, and having come to stay. Our parallel analysis of fiction           We also need validation from historians to interpret and relate our
data at the same time seems to confirm this insofar as this effect                results to historical and cultural changes in or around 1923. Partic-
is not found with the same strength in fiction data. Based on our                 ular language usage and change therein can reflect shifts in society
language sample analysis that appears to support our change point                 and general opinion, adding a more subtle basis for interpretation
and correlation analysis, ‘last year’ continues to be used with                   of past events.
various different concepts, possibly more varied than before 1923. Our
analysis using change points adds to a simpler relative frequency                 Acknowledgement
detection approach by considering the uncertainties associated with               We would like to thank our anonymous reviewers for their helpful
our predictions. Without having conducted a semantic change                       suggestions on how to improve the earlier version of this paper. This
analysis, we cannot be entirely certain that this change is                       research is supported by Science Foundation Ireland (SFI) through
not caused by a shift in semantics; however, the possible semantic                the CNGL Programme (Grant 12/CE/I2267 and 13/RC/2106) in the
space of temporal expressions could be seen as more limited than                  ADAPT Centre (www.adaptcentre.ie)
that for regular common nouns or adjectives. In fact, these temporal
expressions might semantically be closer to function types than to                REFERENCES
content types, in spite of belonging to the latter word class. In a                [1] Joan Bybee and Sandra Thompson. 1997. Three frequency effects in syntax. In
                                                                                       Annual Meeting of the Berkeley Linguistics Society, Vol. 23.
sense, temporal adverbs are similar to prepositions, only anchored                 [2] Walter Daelemans. 2013. Explanation in computational stylometry. In Computa-
in time rather than in space and consequently there might be less                      tional Linguistics and Intelligent Text Processing. Springer, 451–462.
room for reinterpretation of their meaning.                                        [3] Mark Davies. 2010. The Corpus of Historical American English: 400 million
                                                                                       words, 1810-2009. http://corpus. byu. edu/coha/. 24 (2010), 2011. (last verified:
   Our results also need validation from historians, especially with                   24.08.2015).
respect to events in 1923 that could have caused temporal expres-                  [4] William L Hamilton, Jure Leskovec, and Dan Jurafsky. 2016. Cultural Shift or
sions to become more frequent. The type of analysis we have done                       Linguistic Drift? Comparing Two Computational Measures of Semantic Change.
                                                                                       In Empirical Methods in Natural Language Processing (EMNLP).
here shows changes in words’ relative frequency patterns that could                [5] Nicholas A James, Arun Kejariwal, and David S Matteson. 2016. Leveraging cloud
reflect political or cultural changes. In this, we are at the mercy                    data to mitigate user experience from Breaking Bad. In Big Data (Big Data), 2016
                                                                                       IEEE International Conference on. IEEE, 3499–3508.
of the sampling of our newspaper corpus that although balanced                     [6] Nicholas A James and David S Matteson. 2013. ecp: An R package for non-
over different sources is not impervious to other external factors                     parametric multiple change point analysis of multivariate data. arXiv preprint
that could influence the language samples. For instance, by the                        arXiv:1309.3295 (2013).
                                                                                   [7] Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, and Steven Skiena. 2015. Sta-
mid-1920s, the businessman William Randolph Hearst had acquired                        tistically Significant Detection of Linguistic Change. In Proceedings of the 24th
28 newspapers, that consequently have been subject to same edi-                        International Conference on World Wide Web (WWW ’15). International World
torial decisions, distorting our perception of what language was                       Wide Web Conferences Steering Committee, Republic and Canton of Geneva,
                                                                                       Switzerland, 625–635. https://doi.org/10.1145/2736277.2741627
representative for that time.                                                      [8] Meik Michalke. 2014. koRpus: An R Package for Text Analysis. http://reaktanz.de/
                                                                                       ?c=hacking&s=koRpus (Version 0.05-4).
                                                                                   [9] Alex Riba and Josep Ginebra. 2006. Diversity of vocabulary and homogeneity of
8   CONCLUSION AND FUTURE WORK                                                         literary style. Journal of Applied Statistics 33, 7 (2006), 729–741.
                                                                                  [10] Helmut Schmid. 1994. Probabilistic part-of-speech tagging using decision trees.
In essence, this work has been exploratory trying to connect groups                    In Proceedings of international conference on new methods in language processing,
of words that might not occur close to each other in space making                      Vol. 12. Manchester, UK, 44–49.
their relatedness less tangible. Although, additional work is needed
to further support our findings, our results tentatively suggest that