Detecting linguistic change based on word co-occurrence patterns

Carmen Klaussner, Trinity College Dublin, klaussnc@tcd.ie
Carl Vogel, Trinity College Dublin, vogel@tcd.ie
Arnab Bhattacharya, Trinity College Dublin, bhattaca@tcd.ie

ABSTRACT

Diachronic linguistic analysis focuses on detecting elements of language change over time. This change can take different forms: certain words, for instance, could show a slow increase or decrease in frequency over time as they become more popular or obsolete. We are interested in sudden change in words that are attested in every time slice of the overall examined time period. In particular, we try to relate the change in frequency of words that are always present to words that emerge around certain points in time and only remain frequent for shorter periods of time, suggesting that they are more prone to sudden changes in popular topics or could be influenced by historical events. This addresses the question of how the frequencies of the more regular word expressions are influenced by new clusters of words appearing and disappearing. Although there might be links to collocation analysis, words that occur frequently next to each other are not of primary interest here, but rather words that are conceptually related, where one causes or affects the frequency of the other, which causes them to vary in a similar fashion. We use statistical change point analysis to identify significant change over time and seek to validate our findings by randomly extracting example sentences from the data.

KEYWORDS

diachronic analysis, language change, change point analysis

1 INTRODUCTION

Findings from temporal studies offer an important source of enrichment and validation for non-temporal studies, especially when one needs to disentangle (general) temporal effects from non-temporal ones, as for instance in the realm of stylometry and authorship attribution [2]. Change in linguistic variables can occur in different shapes and forms: slow gradual change as opposed to sudden and abrupt change, short-term as well as long-term effects. Differences could be rooted in levels of linguistic abstraction, as for instance individual words are likely to show more variation over time than entire word classes, where smaller fluctuations would be averaged out and only larger trends pervading the entire group would be easily discernible. Features are usually classified along one or more dimensions, such as membership of either an open or a closed class, or according to the frequency stratum they belong to, i.e. frequent, medium-frequent and rare. However, even though these classifications are often represented as categories, suggesting there is a clear boundary between, for instance, what is frequent and infrequent, a continuous representation, especially considering temporal effects, would occasionally seem more suited. This is especially true in the realm of diachronic analyses, the analysis of texts over time, as most items might appear across different frequency strata depending on the exact time period examined.

We aim to show that even the open-class/closed-class view of features might be insufficient when observed through the lens of temporal text representation. While something might bear the label of common noun, it could in fact be closer to a function word, behaving like one and being affected by similar factors, such as regular occurrence in different contexts.

In this work, we consider the analysis of the more regular and possibly also frequent items, in particular those appearing in all years of the time period examined. Typically, these features are more general in meaning (e.g. temporal expressions), rendering them suitable for a wider variety of language contexts than strongly topic-related words, such as hurricane or computer. Our main hypothesis is that these items change through other, less frequent items that are more prone to topic change over time, such as concepts relating to the outbreak of a war or a natural disaster. The type of change we are looking for is a change in mean, whereby a feature changes its relative frequency fairly abruptly at time t, rising or falling to a new level and remaining there for at least some time. If one compares the mean over the samples before time t to the mean taken over the samples after time t, one obtains significantly different means.

This type of research has to be distinguished from two related areas of research, i.e. semantic change in the form of neologisms and collocation analysis. Semantic change analysis is different in that it considers cases whereby a word acquires a new sense and possibly also a second part-of-speech class and could subsequently be used in different syntactic contexts, whereas here we consider changes of word frequencies and their possible causes unrelated to semantic change, as for instance a particular temporal expression used for contrasting different situations, e.g. ‘If I had only known then, what I know now.’ Also, conceptually, these regularly and irregularly appearing words could be relatable through collocations or otherwise longer n-gram sequences. While the method presented here could be used for their detection as well, it is not limited to relationships between words that occur close to each other, but also covers words or expressions that only share a conceptual rather than spatial relationship, such as the first (‘Detecting’) and last (‘patterns’) word of the title of this work. As 8-grams are rarely computed, this pair is less likely to be captured by collocation analysis, while the terms are clearly conceptually related.

The remainder of this work is organized as follows: section 2 discusses related research in the field; section 3 provides information about the data and pre-processing steps; section 4 presents the methods we employed. Before moving on to the experiments in section 6, section 5 considers trends in larger groups of word expressions to anchor our findings based on individual words. Section 6 then presents our change point experiments and the empirical validation of these through the actual data. We discuss these results in section 7 and conclude the work in section 8.

2 PREVIOUS RESEARCH

Different areas of linguistic research consider the change of broader categories of words, such as frequency effects in syntax [1], largely distinguishing between type and token frequency of a particular variable or category. Bybee and Thompson [1] discuss three frequency effects that are important not only in shaping phonology and morphology, but also syntax; two effects are caused by high token frequency and have opposing tendencies that can only be explained by considering the influence of the third frequency effect, that of high type frequency. A high token frequency of an item promotes its reduction, as visible in conventionalized contractions in English (I’m, can’t). In contrast, the ‘Conserving Effect’ is visible with high token items, where the more the form is used the more it is strengthened; compare the regularization of the English past tense of ‘weep’ from wept to weeped with high frequency items, such as ‘sleep’ (slept). A syntactic example of this is the fact that pronouns, although derived from full noun phrases, show much more conservative behaviour (e.g. case marking) due to their higher frequency. The type of change that is resisted in the high token frequency items is change on the basis of combinatorial patterns or constructions that are productive. “The more lexical items that are heard in a certain position in a construction, the less likely it is that the construction will be associated with a particular lexical item” [1, p.384]. This is observable in the ditransitive construction, which is only acceptable with very specific lexical verbs of high frequency, compare: “He told the woman the news” vs. “He whispered the woman the news” [1, p.385], where the verb tell is a lot more frequent than whisper. To a limited extent, this is also productive in that the construction can apply to a few new high frequency verbs, such as e-mailed or telephoned.

Hamilton et al. [4] consider the function aspect of diachronic change by taking a closer look at global and local shifts in a word’s distributional semantics in historical texts from English, French and German.¹ For the local or cultural shifts, they use a local neighbourhood measure and for the global measure they compute the cosine distance between two word vectors capturing the co-occurrence statistics at consecutive time points t and t+1. Based on previous results in the literature, they predict that nouns are more likely to undergo change because of cultural shifts, whereas verbs are more likely to change because of regular semantic change. Across all languages, as predicted, the local neighbourhood measure assigns higher rates of semantic change to nouns than verbs, with the opposite applying to the global measure. This also remains the case when adverbs and adjectives are included among the verbs, supporting previous results in the literature suggesting that adverbial and adjectival modifiers are often the target of regular or global linguistic change [4].

¹ Local or cultural shifts are deemed less regular and stable than global shifts, as they are caused by more changeable factors, such as new technologies, whereas global shifts are associated with regular semantic change, such as grammaticalization.

The research presented by Kulkarni et al. [7] considers change point analysis in the context of investigating statistically significant shifts of semantic change. They consider three different approaches: one is frequency based, whereby sudden changes in word usage are captured; the second involves a syntactic time-series analysis, analyzing a word’s part-of-speech tag distributions; and finally they construct a distributional time-series by considering contextual cues from word co-occurrence statistics. Using human evaluators to assess the performance of their models, they find that the highest amount of agreement between annotators and method with respect to words that have undergone change is achieved by the distributional method, with c. 53% average agreement compared to c. 22% (syntactic) and c. 13% (frequency). Another change point oriented analysis is that of Riba and Ginebra [9], which investigates a possible change in authorship of Tirant lo Blanc, identifying a clear single sudden change point that is supported by cluster analysis. Our work examines possible changes in features that are both regular in occurrence and highly frequent, caused by features that are only highly frequent over a short period of time, similar to semantic cultural shifts, but with the difference that these words would not necessarily take on a new meaning.

3 DATA

For this analysis, we consider a 100-year long extract from The Corpus of Historical American English (COHA) [3].² This is a 400-million word corpus, which contains samples of American English from 1810–2009 balanced in size, genre and sub-genre in each decade (1000–2500 files each). It therefore contains balanced language samples from fiction, popular magazines, newspapers and non-fiction books, which are again balanced across sub-genre, such as drama and poetry.³

² Free version accessible at: http://corpus.byu.edu/coha (last verified July 2017).
³ There is an Excel file with a detailed list of sources available at: http://corpus.byu.edu/coha/ (last verified July 2017).

For this study, we selected all data from the years 1880-1979, covering all genres: news, magazine, fiction and non-fiction. For most of the experiments, we only use the news section of the data, as it is most likely to contain the types of change we are targeting, though we occasionally compare it to the other three genres. In order to arrive at a relative frequency count for each feature, we combine the individual files on a per year basis and relativize by the overall token count for that year.⁴ As features, we consider the set of word bigrams marked for syntactic context, e.g. the word like has different meanings depending on its context. It can be used as both a verb and a preposition, which should subsequently be treated as two separate items. We chose bigram size as it provides more context and is richer in meaning, allowing us to discern more specific items of change than with unigram size. We decided against analyzing items of higher rank and abstraction, such as part-of-speech sequences, as these are more difficult to evaluate, while word sequences offer more possibilities for human evaluation.

⁴ In the case of higher sequence features, such as word bigrams, the unigram token count is replaced by the unique bigram token count.

In order to extract the part-of-speech (POS) features needed for syntactic word features, we used the TreeTagger POS tagger [8, 10]. Our new syntactic word features were then created by using the tag sequence as a suffix to the original word in context that gave rise to it. Thus, “He likes her” becomes “he.PP likes.VBZ her.PP”.⁵

⁵ In this, the difference between the original word in context and the lemma of the word would primarily be reflected in verbs.
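As a rough illustration of the feature construction described in this section, the following Python sketch turns already tagged sentences into ‘word.TAG’ items, joins neighbouring items into bigrams and relativizes the bigram counts per year. It is only a sketch under stated assumptions: the input is a hypothetical dictionary mapping a year to lists of (word, tag) pairs (the TreeTagger call itself is not shown), words are lowercased as in the “he.PP” example, and the yearly total is taken to be the number of bigram tokens in that year. The function names are illustrative and not part of the original study.

```python
from collections import Counter


def syntactic_tokens(tagged_sentence):
    """Turn (word, tag) pairs into lowercased 'word.TAG' items."""
    return [f"{word.lower()}.{tag}" for word, tag in tagged_sentence]


def bigrams(tokens):
    """Return all pairs of directly neighbouring items."""
    return list(zip(tokens, tokens[1:]))


def relative_bigram_frequencies(sentences_by_year):
    """Map each year to {bigram: count / total bigram tokens in that year}."""
    freqs = {}
    for year, sentences in sentences_by_year.items():
        counts = Counter()
        for sentence in sentences:
            counts.update(bigrams(syntactic_tokens(sentence)))
        total = sum(counts.values())
        # Assumption: relativize by the number of bigram tokens of the year.
        freqs[year] = {bg: c / total for bg, c in counts.items()} if total else {}
    return freqs


# Hypothetical toy input; real input would come from the tagged COHA files.
corpus = {
    1880: [[("He", "PP"), ("likes", "VBZ"), ("her", "PP")]],
    1881: [
        [("Last", "JJ"), ("year", "NN"), ("was", "VBD"), ("cold", "JJ")],
        [("Last", "JJ"), ("year", "NN")],
    ],
}
rel = relative_bigram_frequencies(corpus)
```

Keeping the tag as part of the item means that, for example, like as a verb and like as a preposition end up as two separate features without any further bookkeeping.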
Items are then joined to bigram sequences and each sequence of two syntactic words is relativized by the total number of bigram sequences in that year. For this work, we are primarily interested in changes in common nouns, requiring us to extract these from all other types. We only retain adjective-noun or noun-noun combinations, as we expect the other types that can occur in noun phrases, i.e. determiners, proper nouns and pronouns, to follow a different frequency distribution that might introduce noise.

4 METHODS

In this section, we describe the methods used for the initial detection of interesting constant features, our data exploration and the change point analysis.

4.1 Detecting changing features

As we are interested in change in variables appearing in all time instances of a temporally-ordered data series, we consider only those bigram adjective-noun/noun-noun types that appear in all time slices and discard all others. Even when reducing the set of features to these constant noun types, some 350 sequences remain for examination. In order to discover interesting (and possibly related) features more easily, we first order them according to mean relative frequency and then use principal component analysis (PCA) on sets of 50 bigram features, as we have found estimation and later interpretation of components to be better when the document-feature ratio is in favour of more samples. PCA is an unsupervised statistical technique that converts a set of possibly related variables to a new uncorrelated representation, the principal components. This type of analysis groups features according to common variance patterns and can help to detect features that vary in a similar way. The results of running PCA on the 50 most frequent noun-noun/adjective-noun sequences are 50 new components that group related features together.⁶ A feature can be negatively or positively related to a new component. The components themselves account for decreasing proportions of variance, e.g. in this case the first component accounts for 25% and the second component for 12% of the variance, with the rest being more broadly spread out. Inspection of the first principal component allows for discovery of the three highest associated items: last week, last year and next year with very similar weights: 0.259558004, 0.256880159 and 0.252325884 respectively (Figure 1).

⁶ For this experiment only, we took logarithms of relative frequencies before applying PCA.

Figure 1: PCA results for the 10 highest associated bigrams.
[Plot: pc1 axis against pc2 axis, both from −1.0 to 1.0. Bigrams shown: first.JJ time.NN, young.JJ man.NN, many.JJ people.NNS, next.JJ year.NN, few.JJ moments.NNS, young.JJ lady.NN, last.JJ week.NN, last.JJ year.NN, good.JJ deal.NN, great.JJ deal.NN.]

Figure 2: 19th/20th century corpus: word bigrams ‘last year’ and ‘last week’ shown over different genre types: news, magazines, fiction and non-fiction.
[Two panels, “All genre: 'last year'” and “All genre: 'last week'”. x-axis: Time (1880-1970); y-axis: Relative Frequency; series: fiction, nfiction, magazine, news.]

Figure 2 shows two of the features: last year and last week. Both display rather sudden changes around 1920, albeit for different genres: ‘last year’ becomes more frequent in the news domain and after that in magazines, while for ‘last week’ the order is reversed, first magazines and then news.

4.2 Type-token analysis

In order to understand the underlying development in the data better, yielding a more informed analysis, we explore a method inspired by data analysis in the financial sector, the Rate of Change (RoC). The methods described here are used in section 5. In particular, we are interested in the way different groups of features change over time with respect to different quantities. For this, we consider two basic linguistic measures: the type-token ratio (TTR), calculated as the number of types in a particular category divided by the number of all tokens, and a type-vs-all-types ratio (TYPR), whereby we compare the number of types in a particular category to all types. Further, we combine these two ideas with the RoC in order to gain insight into how types of features not only change over time, but are related to the respective previous time instance.

The log RoC is defined as the natural logarithm of the value of a variable today, $V_t$, divided by its value yesterday, $V_{t-1}$, as shown in eq. 1. Thus, it shows how a particular value changes, for instance from one year to the next.

\[ \mathrm{RoC}_{\ln} = \ln\left(\frac{V_t}{V_{t-1}}\right) \tag{1} \]

For most of the analysis, we use simple TTR and TYPR, except for figure 5, for which a slightly modified version of the RoC is used. Rather than having the current value at time t in the numerator, we consider the set of linguistic types that two time points have in common, as shown in eq. 2. Thus, $|\mathrm{types}_{y_t} \cap \mathrm{types}_{y_{t-1}}|$ refers to the size of the group of features found in both year $y_t$ and year $y_{t-1}$, taken with respect to the number of tokens in $y_{t-1}$. The second version, shown in eq. 3, relativizes with respect to the number of total types in $y_{t-1}$. In the following, we refer to these two measures as TTR’ and TYPR’ to distinguish them from static type-token ratios. As is common with financial data, we take natural logarithms to achieve symmetry between decrease and increase.

\[ \mathrm{TTR}'_{y_t} = \ln\left(\frac{|\mathrm{types}_{y_t} \cap \mathrm{types}_{y_{t-1}}|}{|\mathrm{tokens}_{y_{t-1}}|}\right) \tag{2} \]

\[ \mathrm{TYPR}'_{y_t} = \ln\left(\frac{|\mathrm{types}_{y_t} \cap \mathrm{types}_{y_{t-1}}|}{|\mathrm{types}_{y_{t-1}}|}\right) \tag{3} \]

These two measures allow us to observe how the broader categories behave with respect to feature types and what proportion these take of either all types or tokens.

4.3 Change-point Detection

Change point analysis is the analysis of a time-series with the aim of detecting specific points t in time that separate the points before and after them with respect to some criterion. More formally, aspects of change point analysis can be defined as follows: given a time-series $\{y_t : t \in 1, \ldots, n\}$, a change point occurs if there exists a time k, where $1 \le k \le n - 1$, such that the distributions of $\{y_1, \ldots, y_k\}$ and $\{y_{k+1}, \ldots, y_n\}$ are different with respect to some criterion, i.e. change in mean, change in regression or change in variance. For this analysis, we are primarily interested in changes in mean, as these would signal a higher or lower average usage of a feature with respect to an earlier time period, while for instance a change in variance would indicate greater or lesser variability in how a feature is used. As we are interested in long-term change that lasts at least 10 years or so, we require a change point detection technique that is less sensitive to short-term fluctuations in the data. For our experiments, we chose the approach by James et al. [5], originally used for breakout detection in cloud data in the presence of anomalies.⁷ The proposed approach (‘E-divisive with Medians’ (EDM)) is a non-parametric technique that uses medians and estimates the statistical significance of a change point through a permutation test. We found this technique to return fewer change points that were more spread out than distribution based change point methods, rendering it even more desirable as our data is not always normally distributed. We found that using transformations occasionally smooths over interesting developments, making these less desirable to use in this context.

⁷ This is implemented in the R package ecp [6].

5 DATA EXPLORATION

In the following section, we look at unigram and bigram instances of both function and content types to gain an intuition about general trends in the data. We begin by exploring changes in type and token relations in unigrams of both a function (determiner) and a content (noun) category. For the determiner category, we considered instances corresponding to both ⟨DT⟩ and ⟨WDT⟩, thus both the.DT and which.WDT in contexts such as ‘Which/The book...’.

Figure 3: News corpus: determiner types divided by total tokens (TTR) or types (TYPR).
[Plot: x-axis: Time (1880-1970); y-axis: Proportion (%); series: TTR, TYPR.]

Figure 3 shows the number of determiner types with respect to all types/all tokens over time. The top line shows a sharp decrease in 1920 for the determiner vs. all types ratio, with this being also visible but less pronounced for the type-to-token ratio. One aspect that needs to be taken into account in this context is the influence of types attested in all time instances. We refer to these as the ‘universally’ constant types to distinguish between these and the types that are ‘partially’ constant, appearing in a few consecutive years but not in all.⁸ The shortest span of constancy is two instances (years), which we refer to here as ‘pairwise’ constancy. The concept of constancy in itself has to be distinguished from possible associated frequency distributions. A feature could appear in all time instances and therefore be constant, but might vary considerably with respect to its relative frequency. With respect to determiners, the proportion of constant determiners among all occurring determiners is relatively high, indicating that other constancy types shared less of the variation observed in figure 3. Thus, as would be expected of a true function category, the constant types account for a high proportion of all of its types as well as of the variation in frequency over time.

⁸ By ‘universally’ the span of our entire data set is meant rather than any data and time space that could be examined in this way.

With the noun category, in this case only considering singular and plural common nouns, the situation is somewhat reversed. All common noun types account for c. 38% of the entire token variety, but only 0.05% of these are types that are universally constant. Similarly, the non-universally constant types also account for most of the tokens, indicating that variety rather than constancy of types is predominant here. So we expect the proportions of universally constant features to behave differently for nouns and determiners. We test this by subtracting the proportion of constant determiner types of all types from the proportion of all determiner types of all types for each year and comparing these differences, using the Wilcoxon signed rank test, to the same quantities for nouns. The difference in means over these yearly differences is significant, meaning that universally constant types behave differently in each group. Conversely, the comparison based on the complement of those universally constant features is also significant. In terms of frequency changes, one can observe a sharp drop in tokens (TTR) and a slightly more temperate downward curve in types (TYPR) after 1920.

Figure 4: News corpus: adjective-noun types divided by all unigram types/tokens.
[Plot: x-axis: Time (1880-1970); y-axis: Proportion (%); series: TTR, TYPR.]

As a final step, we consider word bigram sequences, in particular the group of bigrams comprised of either adjective + common noun (plural/singular) or bigrams of only common nouns (plural/singular). Figure 4 shows the adjective-noun bigram types with respect to all types/all tokens. Interestingly, although the common noun unigram types are decreasing over time, these bigram types are increasing in proportion to both all types and tokens. The universally constant features are very limited in this subgroup and largely all variation is accounted for by partially constant features.

Figure 5: News corpus: pairwise appearing adjective-noun types against all types/tokens.
[Plot: x-axis: Time (1880-1970); y-axis: log (Proportion (%)); series: TTR', TYPR'.]

Figure 5 shows the pairwise ‘appearing’ features, i.e. those features found at time t that are not found at time t − 1, with respect to all types (TYPR’)/tokens (TTR’) at time t − 1. As is observable, there is a comparatively large increase of new features in 1920 and a somewhat smaller increase again at c. 1939. This indicates that there might be new concepts emerging for this feature type that have not been there previously. The variation found with respect to these groups might only be found in this particular domain of news articles, where language might be more variable than in other domains, such as fiction or non-fiction. Thus, one would not necessarily expect these findings to translate to other genres.

6 CHANGE-POINT EXPERIMENTS

Based on the exploratory analysis described in section 4.1, we chose ‘last year’ as our variable of interest and determine by change point analysis whether there is an abrupt sort of change as opposed to a more gradual trend. Figure 2 shows two adjective-noun phrase bigrams, ‘last week’ and ‘last year’. Both are universally constant over both the news and magazine corpus, but not constant for fiction and non-fiction.

6.1 Discovery of Related Variables

Given our variable of interest, we seek to find variables that display similar change or, as we hypothesise, are somewhat responsible for the change observed in that universally constant feature.

Our set of suitable candidate variables comprises the set of bigram adjective-noun/noun-noun combinations that, in contrast to our main variable, need not be universally constant, but might only turn out to be partially constant over the entire time span. Each variable’s frequency in the group is relativized with respect to the entire token count for the respective year.

The very first step in this is to run a change point analysis for ‘last year’ over the entire 100-year span in order to ascertain the exact point of change. As could be observed from figure 2, a change happened a little after 1920, with the period afterwards giving rise to a higher frequency pattern than the years leading up to it. One then chooses an interval of a certain length after the change point to limit the number of candidate features for examination, the rationale in this case being that, given a rise in frequency after the change point, one would expect correlated features to be partially constant for at least a certain period of time afterwards, e.g. 10 years. We therefore extract the partially constant features for this time period only. We then take the remaining features and calculate their individual change points over the entire 100-year period. Given all change points over all features, these are divided into three different groups of features: those whose change points occur before the main feature’s change point (in this case that of ‘last year’), those whose change points occur at exactly the same time and those whose change points occur after. Only those features that change significantly with respect to their mean within 10 years before or after the main feature’s change are retained, the reason being that we deem it unlikely that changes more remote in time would be related. Using the present method for detection, features usually do not have more than one change point and in the cases where they have two, these are separated by a time span of at least 20 years. A change point indicates a change in mean and what follows could either be an increase or a decrease in frequency.⁹ As we only focus on features with similar trends, we discard those with an opposing trend to our candidate feature by calculating the correlation between ‘last year’ and each feature over the interval covering 15 years on either side of a feature’s change point and only retaining those features for which this correlation is positive.¹⁰ In the present case, the specifications were set as follows: the change point for ‘last year’ was estimated at 1923, so we choose the interval spanning the years 1924-1934 to look for features that are constant over this period of time. We would not expect the exact time frame to be of high importance, as one would expect most features to level off more gradually over time. After discarding features not constant over this interval, 103 features are left, where at least 22 of these are also temporal expressions. In fact, when we consider the universally constant adjective-noun combinations that are constant over the entire 100-year span, the majority of these turn out to be temporal expressions (12/16). The fact that not more features are constant over the entire span hints at the domain being somewhat volatile with respect to content sequences.

⁹ We focus on synonymous changes and causes here, i.e. the parallel increase of two features together, rather than assuming that a decrease in one feature causes an increase in the other feature, although this would also be a valid scenario.
¹⁰ We used the Spearman rank coefficient for this, as available from the core R package.

Table 1: Correlation between ‘last year’ and chosen features. Universally constant features are marked in italics.

No. | 1920-1930 | 1930-1940 | 1940-1950 | 1950-1960 | 1960-1970
1. | first half 0.88 | first quarter 0.85 | first year 0.8 | automobile industry −0.87 | other countries −0.76
2. | floor leader 0.85 | second quarter 0.83 | international law −0.76 | european countries −0.82 | government officials −0.75
3. | first quarter 0.84 | common stock 0.82 | other words −0.72 | democratic leaders 0.81 | political parties −0.7
4. | recent weeks 0.55 | current year 0.82 | public utility −0.66 | british government −0.74 | european countries −0.69
5. | current year 0.77 | first time −0.74 | international relations −0.65 | overwhelming majority 0.7 | political leaders −0.66
6. | farm relief 0.76 | tomorrow morning 0.67 | national policy −0.61 | last week 0.69 | other words −0.64
7. | same period 0.75 | first half 0.65 | late today −0.6 | british empire −0.62 | open market 0.59
8. | automobile industry 0.75 | same period 0.64 | near future −0.57 | floor leader −0.62 | low prices 0.57
9. | weather conditions 0.73 | stock market 0.63 | european countries 0.53 | american people −0.6 | american government −0.57
10. | same time 0.72 | low prices 0.63 | oil production 0.53 | other countries −0.55 | vice president 0.55
11. | executive session 0.72 | american people −0.62 | american people −0.51 | next year 0.54 | other nations −0.53
12. | whole world −0.71 | business conditions 0.61 | political parties −0.51 | next month 0.54 | present conditions −0.53
13. | second quarter 0.69 | present time 0.6 | low prices 0.5 | current year −0.54 | third quarter 0.51
14. | motion picture 0.67 | whole world −0.58 | last week 0.49 | last summer 0.53 | several occasions −0.51
15. | vice president 0.66 | last week 0.58 | vice president 0.45 | disarmament conference 0.53 | war debt −0.5
16. | american people −0.65 | law enforcement 0.56 | crude oil 0.45 | first year 0.52 | past week −0.5
17. | first year 0.65 | good business 0.55 | next year 0.45 | crude oil 0.51 | american people 0.46
18. | recent years 0.63 | present indications 0.54 | next few −0.45 | present time −0.46 | farm products 0.46
19. | american government −0.62 | past year 0.53 | present conditions −0.45 | recent years 0.45 | large majority −0.45
20. | public interest −0.6 | near future 0.52 | recent months 0.43 | political leaders −0.45 | present time −0.44
21. | important factor 0.6 | income tax 0.48 | stock market −0.42 | international relations −0.45 | foreign countries −0.44

Table 1 shows the highest pairwise correlations (either negative or positive) between ‘last year’ and each of the 104 features over smaller intervals of 10 years from 1920 to 1970, where the universally constant features are marked in italics. The first interval covers a few years before the change point and a few years after that, so somewhat of a transition period where different concepts have similar trends to ‘last year’. There are a few temporal expressions and politically/industry-related terms, such as floor leader, executive session, vice president and automobile industry, and a few expressions (possibly temporal) that would probably be anchored more strongly in the business context, such as first quarter and second quarter. The second time window, spanning 1930-1940, features various concepts related to the stock exchange and business, such as common stock, stock market, business conditions and income tax, as well as a few temporal expressions possibly used in this context, such as first quarter and second quarter. Interestingly, over the next time span covering 1940-1950, stock market, for instance, goes from being reasonably positively correlated (0.63) to being negatively correlated (−0.42), and other concepts, such as european countries and oil production, take precedence instead. In the next time window (1950-60), the highest rated concepts are negatively correlated with ‘last year’, this effect becoming even stronger in the very last time frame of 1960-70. Overall, we interpret this to mean that very different concepts come to be used with ‘last year’ than those that provided the basis for this set of correlated features. Certain events, such as the surprising Wall Street crash in 1929, could have caused temporal expressions to gain more prominence and created an atmosphere of immediacy that, at least in the news world, made the use of temporal expressions more likely. With WWII and the Cold War shortly following, this might have kept the temporal dimension palpable. When we examine the list of change points, including the ones more than 10 years after ‘last year’, it is noticeable that a few expressions’ points of change lie very close together, for instance stock market, preferred stock, financial position, first quarter and third quarter all change in either the year 1915 or 1916 and in 1945 or 1946. Figure 6 depicts this overlap in the increase after the first change point and the return to the initial mean frequency pattern after the second change point.

Figure 6: News corpus: relative frequency of items with change points around 1915-16 and 1945-46.

Figure 8: Fiction corpus: relative frequency of highest ‘last year’ correlated features in the fiction genre.

Another aspect that is noticeable in the results is that various temporal expressions appear in the list of features highly correlated with ‘last year’. This suggests that temporal expressions in general increased in usage over time with respect to this genre. Figure 7 shows a few of the expressions from table 1. All seem to increase
● stock market ● ● last year first quarter ● good evening preferred stock 0.00015 little girl Relative Frequency Relative Frequency 0.00010 third quarter good night financial position good time ● ● 0.00010 ● 0.00005 ● ● ● ● ● ● ● ● ● 0.00005 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ●● ● ● ● ●● ● ● ●● ● ● ● ● ●● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●●● ●● ●●● ●● ● ●● ● ● ● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ●●●● ●●● ● ●● ● ● ● ● ● ● ●●●● ●●● ●● ● ● ● ● 0.00000 ● ● ●● ●●●●● ● ● ●● ● ●●● ●● ● ● ● ● ● ● 0.00000 ● 1880 1890 1900 1910 1920 1930 1940 1950 1960 1970 1880 1890 1900 1910 1920 1930 1940 1950 1960 1970 Time Time Figure 7: News corpus: relative frequency of temporal ex- interval of ± 15 years around its own change point, with the highest pressions. correlation being around 0.4. Figure 8 shows them side-by-side with ‘last year’. This suggests that the temporal aspect has not grown ● last year ● as much in importance in this genre and is less closely linked to last week adjective-noun types as it seems to be the case in the news domain. 0.0003 current year ● ● ● In order to validate this in the news data, we consider actual ● language samples for co-occurrence of items highly correlated with Relative Frequency ● ● ● past year ● ● ● ● ● ● ● ● ● next year ● ● ● ●● ● ● ● ‘last year’. We randomly extract sentences containing ‘last year’ ● ● ● ● ● 0.0002 ● ● ● ● ● ● and observe what concepts co-occur in the same sentences. We ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● take ten samples each from before 1923 (1910-1920), immediately ● ● ● ●● ● ● ● ● ● ● ●● ●● after (1924-1934) and again at a later stage (1950-1960). ● ● ● ● 0.0001 ● ● ● ● ● ● ● Table 2 shows salient concepts occurring in the same sentence as ● ● ● ● ● ●● ● ● ‘last year’ for all three time periods. The number in bracket indicates ● ● ● ● ● ●● ● ●● ● ● ● ● in how many sentences of the ten selected ones the term occurred. 
0.0000 The terms occurring in the first time span are mostly related to 1880 1890 1900 1910 1920 1930 1940 1950 1960 1970 elections and governments with some more general political topics, Time such as company and wages entering into it as well. The second time span set around the change in ‘last year’ seems to contain in frequency over time. However, correlation analysis might be almost exclusively stock exchange related news items. The final a little volatile in that smaller spans of the entire period are not period, set after the end of WWII contains very mixed samples from representative of the overall correlation. For this reason, we seek to sports, to international politics, companies and space programs. validate our results further, which will be done in the next section. Although extracting a few random samples from a large set of texts cannot provide very fixed conclusions, these results seem 6.2 Validation of Results to support our earlier findings of a strong correlation between As the final part of this analysis, we seek to further validate our stock exchange related items and the temporal expression ‘last results. One part of this is to see whether this effect also exists in less year’ during a particular time period, where this seems to have changeable genre, such as fiction. Thus, we repeat the exact same dominated the news. In order to see to what extent this effect experiment, but using the fiction corpus as a basis rather than the generalizes to other temporal expressions, we need to analyze these news corpus. We first estimate possible change points on the basis separately. of the new corpus. Interestingly, the change point for ‘last year’ in the fiction genre happens earlier, around 1917. There seems to be 7 DISCUSSION a lot less variety in adjective-noun combinations as the partially We have reported an exploratory analysis to investigate the re- constant features over 1918-1928 only add up to 27. 
Of these 27, lationship between temporal expressions, such as ‘last year’ and only 9 are positively correlated to ‘last year’ based on a span of ± temporally less stable word expressions that appear and disap- 15 around their individual change point. pear over time. We hypothesized that these fluctuating words that However, only good evening, little girl, good night, good time and are more strongly connected to current events would somewhat very well are actually positively correlated with ‘last year’ over an influence the rise in frequency of more stable concepts, such as 7 Table 2: News corpus: salient words occurring with ‘last year’ in 10 randomly selected sentences for each of the time periods: 1910-1920, 1924-1934, 1950-1960. News corpus salient words occurring with ‘last year’ (primary) election(2), party(2), board of education(2), mexican bullets (1), company (3), 1910-1920 director(s)(2), railroad(1), wages(1), shareholders(1), submarine(1), national committee(1) adjustment bond(1), common stock(2), stock (dividend)(2), (cash) investment (2), congress (1), sales(1), share(1), 1924-1934 corporation(1), net profit(1), minor purchases(1), liquidation(1), dividend rate(1), president(1), preferred dividends(1) tournament(1), basketball coach(1), chicago medical society(1), tax bill(1), (space) administration(2), international agreement(1), 1950-1960 wage(1), arbitration(1), net income(1), auto companies(1), production schedules(1), national aeronautics(1), russians pioneer(1) temporal expressions. Our results suggest that there might indeed words or expressions that are stable in occurrence, might be rather be a connection between ‘last year’ and clusters of words linked to volatile with respect to their relative frequency distribution. As historical events, such as the stock marked crash. 
However, while temporal expressions have fewer semantic associations, they might stock market related words are only constant and very frequent depend more strongly on features that do. for a limited time frame, ‘last year’ and other temporal expressions The results we have obtained are tentative and in order to claim remain frequent. We believe that this could be due to temporal an increase of temporal expressions possibly related to certain aspect in news language having become more important after 1923, historical events, one needs to show this effect to hold for other having gathered momentum through events, such as the stock mar- temporal expressions as well as exclude any possible semantic shift. ket crash and then remained to stay. Our parallel analysis of fiction We also need validation from historians to interpret and relate our data at the same time seems to confirm this insofar as this effect results to historical and cultural changes in or around 1923. Partic- is not found with the same strength in fiction data. Based on our ular language usage and change therein can reflect shifts in society language sample analysis that appears to support our change point and general opinion, adding a more subtle basis for interpretation and correlation analysis, ‘last year’ is continued to be used in vari- of past events. ous different concepts, possibly more varied than before 1923. Our analysis using change points adds to a simpler relative frequency Acknowledgement detection approach by considering the uncertainties associated with We would like to thank our anonymous reviewers for their helpful our predictions. Although without having conducted a semantic suggestions on how to improve the earlier version of this paper. 
This change analysis, we cannot be entirely certain that this change is research is supported by Science Foundation Ireland (SFI) through not caused by a shift in semantics, however, the possible semantic the CNGL Programme (Grant 12/CE/I2267 and 13/RC/2106) in the space of temporal expressions could be seen as more limited than ADAPT Centre (www.adaptcentre.ie) that for regular common nouns or adjectives. In fact, these temporal expressions might semantically be closer to function types than to REFERENCES content types, in spite of belonging to the latter word class. In a [1] Joan Bybee and Sandra Thompson. 1997. Three frequency effects in syntax. In Annual Meeting of the Berkeley Linguistics Society, Vol. 23. sense, temporal adverbs are similar to prepositions, only anchored [2] Walter Daelemans. 2013. Explanation in computational stylometry. In Computa- in time rather than in space and consequently there might be less tional Linguistics and Intelligent Text Processing. Springer, 451–462. room for reinterpretation of their meaning. [3] Mark Davies. 2010. The Corpus of Historical American English: 400 million words, 1810-2009. http://corpus. byu. edu/coha/. 24 (2010), 2011. (last verified: Our results also need validation from historians, especially with 24.08.2015). respect to events in 1923 that could have caused temporal expres- [4] William L Hamilton, Jure Leskovec, and Dan Jurafsky. 2016. Cultural Shift or sions to become more frequent. The type of analysis we have done Linguistic Drift? Comparing Two Computational Measures of Semantic Change. In Empirical Methods in Natural Language Processing (EMNLP). here shows changes in words’ relative frequency patterns that could [5] Nicholas A James, Arun Kejariwal, and David S Matteson. 2016. Leveraging cloud reflect political or cultural changes. In this, we are at the mercy data to mitigate user experience from Breaking Bad. In Big Data (Big Data), 2016 IEEE International Conference on. IEEE, 3499–3508. 
of the sampling of our newspaper corpus that although balanced [6] Nicholas A James and David S Matteson. 2013. ecp: An R package for non- over different sources is not impervious to other external factors parametric multiple change point analysis of multivariate data. arXiv preprint that could influence the language samples. For instance, by the arXiv:1309.3295 (2013). [7] Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, and Steven Skiena. 2015. Sta- mid-1920s, the businessman William Randolph Hearst had acquired tistically Significant Detection of Linguistic Change. In Proceedings of the 24th 28 newspapers, that consequently have been subject to same edi- International Conference on World Wide Web (WWW ’15). International World torial decisions, distorting our perception of what language was Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 625–635. https://doi.org/10.1145/2736277.2741627 representative for that time. [8] Meik Michalke. 2014. koRpus: An R Package for Text Analysis. http://reaktanz.de/ ?c=hacking&s=koRpus (Version 0.05-4). [9] Alex Riba and Josep Ginebra. 2006. Diversity of vocabulary and homogeneity of 8 CONCLUSION AND FUTURE WORK literary style. Journal of Applied Statistics 33, 7 (2006), 729–741. [10] Helmut Schmid. 1994. Probabilistic part-of-speech tagging using decision trees. In essence, this work has been exploratory trying to connect groups In Proceedings of international conference on new methods in language processing, of words that might not occur close to each other in space making Vol. 12. Manchester, UK, 44–49. their relatedness less tangible. Although, additional work is needed to further support our findings, our results tentatively suggest that 8
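The feature-retention step applied to ‘last year’ (keep only features whose change point lies within 10 years of the main feature's change point, then keep only those whose Spearman correlation with the main feature is positive over 15 years on either side of the feature's own change point) can be illustrated with a minimal Python sketch. This is not the code used for the reported results, which relied on the Spearman coefficient available in R. The function names, the dictionary-based data layout, the minimum of three overlapping years and the toy series are all assumptions made for illustration, and change points are taken as already estimated and passed in as inputs.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant series counts as uncorrelated
    return cov / (var_x * var_y) ** 0.5


def retain_features(target, features, change_points, target_cp,
                    max_gap=10, half_window=15):
    """Return {feature name: correlation} for the features that are kept.

    target:        dict year -> relative frequency of the main feature
    features:      dict name -> (dict year -> relative frequency)
    change_points: dict name -> estimated change-point year (given as input)
    target_cp:     estimated change-point year of the main feature
    """
    kept = {}
    for name, series in features.items():
        cp = change_points[name]
        if abs(cp - target_cp) > max_gap:
            continue  # change point too remote in time to be related
        years = [y for y in range(cp - half_window, cp + half_window + 1)
                 if y in target and y in series]
        if len(years) < 3:
            continue  # assumption: too few overlapping years to correlate
        rho = spearman([target[y] for y in years],
                       [series[y] for y in years])
        if rho > 0:  # discard features with an opposing trend
            kept[name] = rho
    return kept


if __name__ == "__main__":
    # Invented toy series, not corpus data: a step change plus a slow drift.
    years = range(1890, 1960)
    target = {y: 1.0 + (1.0 if y >= 1923 else 0.0) + 0.001 * (y - 1890)
              for y in years}
    rising = {y: 0.5 + (1.0 if y >= 1925 else 0.0) + 0.002 * (y - 1890)
              for y in years}
    falling = {y: 3.0 - (1.0 if y >= 1921 else 0.0) - 0.002 * (y - 1890)
               for y in years}
    kept = retain_features(target,
                           {"rising": rising, "falling": falling},
                           {"rising": 1925, "falling": 1921},
                           target_cp=1923)
    print(sorted(kept))  # only the feature with a parallel trend is kept
```

Passing the change points in as data keeps the sketch independent of any particular change point detection method. Working on ranks rather than raw relative frequencies means that only the direction of the shared trend matters here, not its magnitude.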