<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Cross-lingual Trends Detection for Named Entities in News Texts with Dynamic Neural Embedding Models</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Andrey</forename><surname>Kutuzov</surname></persName>
						</author>
						<author role="corresp">
							<persName><forename type="first">Elizaveta</forename><surname>Kuzmenko</surname></persName>
							<email>eakuzmenko_2@edu.hse.ru</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">University of Oslo</orgName>
								<address>
									<addrLine>Postboks 1080 Blindern</addrLine>
									<postCode>0316</postCode>
									<settlement>Oslo</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution">National Research University Higher School of Economics</orgName>
								<address>
									<settlement>Moscow</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Cross-lingual Trends Detection for Named Entities in News Texts with Dynamic Neural Embedding Models</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">8FF0DA0F6CEB28380728F120B3437D53</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T18:32+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents an approach to detecting real-world events as they are manifested in news texts. We use vector space models, particularly neural embeddings (prediction-based distributional models). The models are trained on a large 'reference' corpus and then successively updated with new textual data from daily news. For given words or multi-word entities, calculating the difference between their vector representations in two or more models allows us to detect the association shifts these words undergo over time. The hypothesis is tested on country names, using news corpora for English and Russian. We show that this approach successfully extracts meaningful temporal trends for named entities regardless of the language.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>We propose an approach to track changes happening to real-world entities (in our case, countries) with the help of constantly updated distributional semantic models. We show how one can train such models on new textual data arriving daily and draw conclusions about events from the changes in word vectors induced by new contexts. In other words, the presented method detects the subtle semantic shifts which words undergo over time under the influence of real-world events.</p><p>Detecting semantic shifts can be of use in a variety of linguistic applications. First, this method can help with the problem of automatically monitoring events through a stream of texts <ref type="bibr" target="#b0">[AGK01]</ref>. Detected semantic shifts can potentially be used as additional features in algorithms aimed at extracting the course of events. Without unsupervised approaches, it is impossible to process all the continuously generated data; this is the primary motivating factor for our research. Second, the developed approach can be used to study language change and to compare temporal corpus slices. This area of language is traditionally studied by linguists, who put a lot of effort into describing semantic shifts with the help of dictionaries, corpora and sociolinguistic research. At the same time, it is impossible to cover the entire vocabulary of a language and describe every lexical shift manually. Distributional semantic models facilitate this task.</p><p>Approaches to event detection and to modeling language change have a lot in common. Early techniques employed various frequency metrics <ref type="bibr" target="#b5">[JS09]</ref> and shallow semantic modeling <ref type="bibr" target="#b9">[KNR15]</ref>, <ref type="bibr" target="#b3">[HBB10]</ref>. 
With the emergence of distributional semantic models, the detection of semantic shifts acquired new potential, as it was shown that word embeddings significantly improve the performance of such algorithms <ref type="bibr" target="#b6">[KARPS15]</ref>.</p><p>The rest of the paper is organized as follows. In Section 2 we introduce the basics of prediction-based vector models of semantics. Section 3 describes the principles of comparing such models when they are trained on pieces of text which follow each other in time. The specifics of our datasets are covered in Section 4, followed by the description of the experimental setting in Section 5. Section 6 evaluates the results, and in Section 7 we conclude.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Distributional Semantic Models</head><p>Vector space models (VSMs) are well established in the field of computational linguistics and have been studied for decades (see [TP + 10], <ref type="bibr" target="#b13">[Reh11]</ref>). Essentially, a model is a set of words and their corresponding vectors, which are produced from the typical contexts of each word. The most widespread type of context is the other words co-occurring with a given one, which means that the set of all possible contexts generally equals the vocabulary of the corpus in size. The dimensionality of the resulting count model can be reduced with well-known techniques like Principal Component Analysis (PCA) or Singular Value Decomposition (SVD). But this in turn effectively forbids online training (continuously updating the model with new data), because after each update one would have to perform computationally expensive dimensionality reduction over the whole co-occurrence matrix.</p><p>To overcome this, we employ a type of VSM called prediction-based models: particularly, the Continuous Bag-of-Words (CBOW) algorithm ([BDV03], [MSC + 13])<ref type="foot" target="#foot_0">1</ref>. Rather than counting co-occurrences directly, predictive models approximate them, and they show a promising set of properties. With them, one directly learns dense lexical vectors (embeddings). Vectors are initialized randomly and then, as we move through the training corpus with a sliding window of a pre-defined width, gradually converge to values maximizing the likelihood of correctly predicting lexical neighbors. As a rule, such models are trained with artificial neural networks; this is why they are sometimes called neural models.</p><p>For our task, it is important that predictive models can be updated with new co-occurrence data in a quite straightforward way. 
As already said, this is usually not the case with count models, which demand computationally expensive recalculations each time new text is added.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Introducing a Temporal Dimension to Vector Models</head><p>Detecting the semantic shifts which words undergo over time demands the ability to somehow compare a reference ('baseline') model and updated models representing later periods of time.</p><p>The idea of employing changes in distributional semantic models to track semantic shifts is not in itself new. [KCH + 14] proposed to detect language change with chronologically trained models. However, they used a rather simplified measure of 'distance' between word vectors at different time slices, namely raw cosine distance. We employ more sophisticated methods, as described further. <ref type="bibr" target="#b12">[POL10]</ref> developed an approach to First Story Detection in Twitter posts. Their research is similar to ours in that it deals with streaming data. The authors explore the space of documents and compare new tweets to the existing ones. However, their algorithm is developed specifically for short texts like tweets, which differ radically from the news pieces analyzed in the present paper.</p><p>Updating a neural model with new texts (in addition to the base corpus used for initial training) is technically straightforward. After that, we have two models M 1 and M n , where the former is the 'baseline' reference model and the latter is the updated one (or a sequence of n updated models, each corresponding to the next time period), possibly bearing new semantic shifts. This dynamic model in a way imitates the human brain learning new things, gradually 'updating' its state with new input data every day.</p><p>What are the possible ways to extract these changes? Suppose there is a set S of named entities (organizations, locations or persons we are interested in). 
Initially, in the model M 1 , each element of S can be thought of as possessing a number of topical 'associates' or 'nearest neighbors': words whose vectors are closest to this element's vector, ranked by their similarity. In the simplest case, the exact number of nearest neighbors to consider is defined arbitrarily (for example, the 10 nearest words). As we update the model with new data, the co-occurrence counts for the elements of S gradually grow (the model sees them in new contexts). It means that in each successive model M n the learned vectors for the elements of S can be different.</p><p>If the contexts for these words remain much the same throughout the training data, the list of associates (nearest neighbors) in M n will also remain intact. However, if a word acquires new typical contexts or loses some previous ones, its neural embedding changes: a semantic shift happens. Accordingly, we will see a new list of associates. For example, the vector representation for the word president may change so that its nearest neighbor is the vector for the name of the current president of a country, instead of the previous one.</p><p>In this way, lists of nearest neighbors can be compared across models trained on different corpora or across one and the same model before and after an incremental update (as in the presented research). Substantial changes or bursts in such lists for the named entities we are interested in may signal that these entities have undergone or are undergoing semantic shifts, which in turn reflects real-world events. We dub this approach 'dynamic neural embedding models'.</p><p>Sets of neighbors in different models can be compared in many ways. Approaches to this range from the simple Jaccard index <ref type="bibr" target="#b4">[Jac01]</ref> to complex graph-based algorithms. We test two methods:</p><p>1. Kendall's τ coefficient <ref type="bibr" target="#b8">[Ken48]</ref>, which measures the similarity of item rankings in two sets. 
Intuitively, it is important to pay attention not only to the raw appearance of certain words in the nearest-neighbor set, but also to their rankings within it.</p><p>2. The Relative Neighborhood Tree (RNT), introduced by <ref type="bibr" target="#b2">[CGS15]</ref>. It essentially produces a tree graph with the target word as its root, nearest neighbors as vertices and the similarities between them as weighted edges. We then select the immediate neighbors of the target word in this tree and rank them according to their cosine similarity to the target word. These rankings are then compared across models using the same Kendall's τ.</p><p>The reasoning behind the second method is that it theoretically allows a deeper analysis of the structure of nearest-neighbor sets. Obviously, the neighbors participate in similarity relations not only with the target word but also among themselves. These relations convey meaning as well, making it possible to find the most 'important' neighbors. Graph-based methods for analyzing relations between words in distributional models were also used in <ref type="bibr" target="#b10">[KWHdR15]</ref>; note, however, that the problem they deal with is the inverse of ours: they attempt to trace changes in surface words for a stable set of concepts, while we attempt to trace semantic shifts (changes in underlying concepts for a stable set of words). We hoped that this graph-supported 'pre-selection' would allow Kendall's τ to improve the performance of the model. However, these expectations were not met, and simple ranking turned out to be more efficient than the graph-based method; see Section 6.</p></div>
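The first comparison method can be illustrated with a small self-contained sketch (pure Python, not the authors' actual implementation; the word lists are invented, and treating neighbors missing from one list as tied at the bottom is our assumption about how full replacement yields low τ):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two ranked nearest-neighbor lists.

    Items missing from one list are treated as tied at the bottom of
    that list, so wholesale replacement of neighbors drives the value
    down (a 'burst'), while identical lists give 1.0.
    """
    union = sorted(set(rank_a).union(rank_b))
    pos_a = {w: rank_a.index(w) if w in rank_a else len(rank_a) for w in union}
    pos_b = {w: rank_b.index(w) if w in rank_b else len(rank_b) for w in union}
    concordant = discordant = pairs = 0
    for x, y in combinations(union, 2):
        pairs += 1
        s = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if s > 0:
            concordant += 1
        elif s != 0:
            discordant += 1
    if pairs == 0:
        return 0.0
    return (concordant - discordant) / pairs

stable = ["peru", "bolivia", "colombia", "argentina"]
shifted = ["quake", "earthquake", "peru", "bolivia"]
print(kendall_tau(stable, stable))    # 1.0: associates unchanged
print(kendall_tau(stable, shifted))   # -0.2: a burst in the neighbor set
```

A value near 1 thus means stable associates, while a low or negative value signals a semantic shift.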
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Data Description</head><p>We test our approach on lemmatized corpora of English and Russian news texts. The English corpus is the Signal Media Dataset<ref type="foot" target="#foot_1">2</ref>, which contains 265,512 blog articles and 734,488 news articles from September 2015. The size of the corpus (after lemmatizing and removing stop words) is 222,928,287 words.</p><p>We employ the Stanford POS tagger <ref type="bibr" target="#b16">[TKMS03]</ref> to extract lemmas and to assign each lemma a part-of-speech tag.</p><p>In order to test whether the extracted semantic shifts are consistent across languages, we use a corpus of news articles in Russian published in September 2015 (unfortunately, not available publicly due to copyright restrictions). It contains about 500,000 texts extracted from about 1,000 Russian-language news sites. The size of the corpus (after lemmatizing and removing stop words) is 59,167,835 words. We employ Mystem <ref type="bibr" target="#b14">[Seg03]</ref>, a state-of-the-art tagger for Russian, to produce lemmas and part-of-speech tags.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Experimental Setting</head><p>News texts from September 2015 alone do not make a good training set, because such a corpus is inevitably limited in language coverage, lacking relations to events that happened earlier. Therefore, we first train a 'reference' or 'baseline' model which aims to mimic some background knowledge and is then exposed to daily updates. For English, we used the British National Corpus<ref type="foot" target="#foot_2">3</ref> (about 50 million words) to train this reference model, while for Russian it was a corpus of news articles published in the months preceding September 2015, namely June, July and August (taken from the same source as the September articles). This corpus contains about 250 million words.</p><p>We acknowledge that it is not quite correct to employ different types of corpora for the 'reference' models in English and Russian. However, in a way, we compensate for the quality and balance of the BNC with the larger size of the Russian reference corpus. In the future we plan to eliminate this inconsistency by using an analogous set of English news published in the summer months or by employing Wikipedia dumps as reference corpora for both languages.</p><p>Both corpora were merged with same-language texts released in the first half of September 2015 (before the 14th of September), in order to seed the baseline models with some initial 'knowledge' of the events and entities belonging to this month. Then, Continuous Bag-of-Words models were trained on both corpora, using negative sampling with 10 samples, vector size 300, a symmetric window of size 5 and 5 iterations. Words with a frequency of less than 10 were ignored during training.</p><p>After that, we successively updated these models with texts released in the following September time periods: 14th-15th, 16th-17th, 18th-20th, 21st-22nd, 23rd-24th, 25th-27th, and 28th-30th. 
A granularity of 2 or 3 days was chosen in order to enlarge the amount of data fed to the models: for example, some one-day Russian corpora corresponding to weekends contained only several thousand words. For this reason, we additionally tried to include weekends in 3-day periods, to make the news stream more evenly distributed. As a result, the average time period size in tokens was 18,774,000 for the English data and 5,332,000 for the Russian data.</p><p>We once again emphasize that our baseline models were not re-trained from scratch when new texts were added. Instead, we continued training the same model, gradually updating word vectors with new contexts. All interim states were saved as separate models, and in the end we had 8 successive models for each language.</p><p>We extracted English and Russian country names from the Wikipedia list of all world countries<ref type="foot" target="#foot_3">4</ref> and manually checked and normalized it, bringing all name variants to one lexeme. Then we filtered out the entities with a frequency of less than 30 per million words in either of our two reference corpora (English and Russian), producing a set CS of 36 frequent country names<ref type="foot" target="#foot_4">5</ref>.</p><p>Finally, for each of the successive models, we found the nearest-neighbor sets for each entity in CS and compared them to the sets from the model state at the previous time period. Kendall's τ and the Relative Neighborhood Tree (RNT) were used to compute a similarity coefficient for each country within the given pair of models. This provided us with two lists of countries (one for each language) ranked by their similarity to the same country in the 'previous' model. 
Supposedly, countries where some major event happened during the last days should rank low in these lists, because their associations in news texts drifted towards the recent event or an opinion burst.</p><p>Let us illustrate how news texts and changes in the models reflect real-life events by comparing the 10 nearest associates for Chile in the English and Russian corpora. On the 16th of September 2015 there was an earthquake in Chile, and we can detect its 'echo' in the changes between our models for the 14th-15th and 16th-17th of September (see Table <ref type="table" target="#tab_0">1</ref>).</p><p>Before the 16th of September, the associates for Chile in both models were mostly the neighboring countries. However, after the earthquake things changed completely: there was a strong bias towards this topic in news and blogs, and this is reflected in the vectors for the word. 60% of the English and 20% of the Russian associates are now related to the event.</p><p>Kendall's τ coefficient between these two neighbor lists is as low as 0 (the neighbors are completely replaced) for English and 0.56 for Russian. The average Kendall's τ over CS is 0.56 in the English models for the two days in question, with a standard deviation of 0.12. Thus, in the case of English, the change to the neighbor set can be considered a significant burst, well above simple chance. In the case of Russian, Kendall's τ lies only 1 point below the average value of 0.57. Evidently, Russian mass media paid less attention to the earthquake (being more concerned with Michelle Bachelet, Chile's president), but the event is still reflected in the nearest-neighbor set.</p><p>The next section describes how we employed the cross-linguality of the data to evaluate the presented approach.</p></div>
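The ranking procedure above can be sketched in a few lines (pure Python; the neighbor sets are invented stand-ins for what two successive model states would return, and plain Jaccard overlap, mentioned in Section 3 as the simplest comparison, substitutes for the τ- and RNT-based measures):

```python
def jaccard(a, b):
    """Jaccard overlap between two neighbor sets (1.0 = unchanged)."""
    union = set(a).union(b)
    if not union:
        return 1.0
    return len(set(a).intersection(b)) / len(union)

# Toy nearest-neighbor sets for two successive model states M1 and M2;
# the words are invented for illustration.
m1 = {"chile": ["peru", "bolivia", "colombia"],
      "france": ["germany", "spain", "italy"]}
m2 = {"chile": ["quake", "earthquake", "tsunami"],
      "france": ["germany", "spain", "belgium"]}

# Rank countries by the similarity of their neighbor sets: entities
# whose associates changed most (lowest scores) surface first.
scores = {c: jaccard(m1[c], m2[c]) for c in m1}
burst_ranking = sorted(scores, key=scores.get)
print(burst_ranking)   # countries with major events come first
```

Running the same procedure over each pair of successive model states yields one 'burstiness' ranking per time-period shift and per language.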
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Cross-Lingual Evaluation of Event Detection</head><p>There is no 'gold standard' or ground truth which would allow us to evaluate the precision and recall of our event and association extraction, or to tune the hyperparameters of the algorithms. However, there is a way to indirectly estimate their performance in a kind of intrinsic evaluation. We hypothesize that the better an algorithm detects semantic shifts, the closer its results should be across model sequences trained on corpora in different languages. Obviously, national media focus on different topics, but this mostly concerns domestic news. As for world news, the worst-case scenario is that a news story is not covered in the national media of a particular country at all. However, such scenarios should be rare. In other cases, the perspective on a story can differ, but the 'burst' should remain the same<ref type="foot" target="#foot_5">6</ref>.</p><p>Thus, the English and Russian country lists ranked by 'burstiness' can be compared using Spearman's ρ <ref type="bibr" target="#b15">[Spe04]</ref> for each time period. As there are 7 shifts from one time period to another, we use the median of the ρ values for these 7 cases as a tentative measure of an algorithm's performance. Table <ref type="table" target="#tab_1">2</ref> gives an example of such country rankings for the changes between the 18th-20th and 21st-22nd of September. One can see that the top lists are highly similar, with 3 of the 5 countries appearing in both (the actual Spearman's ρ for the full lists of 36 countries between these periods is 0.5).</p><p>The overall results of applying this approach to the whole dataset using our two algorithms (with different sizes of nearest-neighbor sets) are presented in Table <ref type="table" target="#tab_2">3</ref>. 
We also applied it to a simple baseline method, where the nearest neighbors are the words which most frequently occurred within a window of 5 tokens to the right and to the left of the target entity in the given corpus. The RNT-based method failed to outperform even this baseline, which once again raises questions about whether vector models can be efficiently processed with graph representations. Kendall's τ, on the other hand, does outperform the baseline approach: the margin is as small as two points, but it is supported by higher significance (p &lt; 0.1).</p><p>Note that a qualitative analysis of the baseline results shows that they are mostly inappropriate for any practical task. For the time period described in Table <ref type="table" target="#tab_0">1</ref>, the baseline approach reveals almost no differences between neighbor sets: the average Kendall's τ is 0.92 for English and 0.99 for Russian. Thus, while in the case of English the earthquake event is at least detected (we observe the emergence of 4 new related neighbors), in the case of Russian the neighbor set remained strictly the same. It seems that the raw co-occurrence approach suffers from overestimating the influence of the reference corpora, which are much larger than the daily updates. Dynamic neural embedding models overcome this problem.</p><p>Interestingly, taking wider sets of neighbors into account results in better performance only for CBOW with Kendall's τ. For the baseline and for CBOW with RNT, increasing the size of the processed neighbor sets actually results in poorer performance. The reason for this behavior in RNT may be that the algorithm begins to 'roam' in the graph, attracting more far-away associates as immediate tree neighbors of the target word. In the baseline method, it simply introduces a lot of language-dependent noise, which semantically aware models filter out at the training stage.</p></div>
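The cross-lingual evaluation itself reduces to rank correlation between the two burstiness lists. A minimal sketch (pure Python; the toy rankings below are invented and, unlike real data, cover only five countries instead of the full 36-country lists):

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation between two rankings of the same
    items, each ordered from most to least 'bursty'."""
    n = len(rank_a)
    pos_b = {c: i for i, c in enumerate(rank_b)}
    # Classic formula: rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    # where d_i is the rank difference of item i in the two lists.
    d_sq = sum((i - pos_b[c]) ** 2 for i, c in enumerate(rank_a))
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))

# Hypothetical burstiness rankings for one time-period shift.
english = ["japan", "georgia", "china", "italy", "spain"]
russian = ["japan", "china", "georgia", "italy", "spain"]
print(spearman_rho(english, russian))   # 0.9
```

In the actual evaluation, this correlation is computed for each of the 7 period-to-period shifts, and the median of the 7 values serves as the algorithm's score.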
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusions</head><p>We presented a method of detecting semantic shifts for countries in news texts with the help of dynamic neural embedding models. We explored the difference between entities' vector representations in models from different temporal stages and discovered association shifts that happen to these words over time. This can be employed to trace trends and events in streaming news texts using a completely unsupervised approach.</p><p>We showed that distributional semantic models are rather efficient at detecting association shifts and are in most cases language-independent. In our test sets, there is a statistically significant correlation between the lists of 'semantically shifted' countries in the English and Russian sequences of models for the same time period.</p><p>However, there is still room for improvement. First of all, better ways to evaluate semantic shift extraction have to be developed (including the creation of ground truth datasets). Additionally, we plan to test other ways of comparing neighbor sets and to tune the algorithms' hyperparameters. It would also be useful to improve the quality of the corpora (e.g. eliminate more noise and stop words). 
Finally, we plan to experiment with using different algorithms or parameter sets for different languages: preliminary tests show promising results.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Change in Chile's neighbor set</figDesc><table><row><cell cols="2">14th-15th September</cell><cell cols="2">16th-17th September</cell></row><row><cell>English</cell><cell>Russian</cell><cell>English</cell><cell>Russian</cell></row><row><cell>peru</cell><cell>бачелет</cell><cell>quake</cell><cell>аргентина</cell></row><row><cell>bolivia</cell><cell cols="3">аргентина earthquake бачелет</cell></row><row><cell></cell><cell></cell><cell></cell><cell>(bachelet)</cell></row><row><cell>colombia</cell><cell>коста-рика</cell><cell>santiago</cell><cell>никарагуа</cell></row><row><cell>argentina</cell><cell>перчик</cell><cell>chilean</cell><cell>мексика</cell></row><row><cell>honduras</cell><cell>никарагуа</cell><cell>tremor</cell><cell>бельгия</cell></row><row><cell>brazil</cell><cell>швейцария</cell><cell>tsunami</cell><cell>исландия</cell></row><row><cell>ecuador</cell><cell>бельгия</cell><cell cols="2">aftershock тунис</cell></row><row><cell>nicaragua</cell><cell>исландия</cell><cell>chileans</cell><cell>магнитуда</cell></row><row><cell></cell><cell></cell><cell></cell><cell>(magnitude)</cell></row><row><cell>paraguay</cell><cell>аргентин</cell><cell>temblor</cell><cell>землетрясение</cell></row><row><cell></cell><cell></cell><cell></cell><cell>(earthquake)</cell></row><row><cell cols="2">enchiladas гватемала</cell><cell>kyushu</cell><cell>коста-рика</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2:</head><label>2</label><figDesc>5 countries with the most changed neighbor sets (of 36 total) between September 18-20 and 21-22</figDesc><table><row><cell>Rank</cell><cell>English</cell><cell>Russian (translated)</cell></row><row><cell>1</cell><cell>Italy</cell><cell>Japan</cell></row><row><cell>2</cell><cell>Georgia</cell><cell>Brazil</cell></row><row><cell>3</cell><cell>Malaysia</cell><cell>China</cell></row><row><cell>4</cell><cell>Japan</cell><cell>Spain</cell></row><row><cell>5</cell><cell>China</cell><cell>Georgia</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3:</head><label>3</label><figDesc>Cross-lingual evaluation</figDesc><table><row><cell>Algorithm</cell><cell>Neighbors' set size</cell><cell>Median Spearman's ρ</cell></row><row><cell>Raw co-occurrences baseline</cell><cell>5</cell><cell>0.26 (p = 0.12)</cell></row><row><cell></cell><cell>10</cell><cell>0.15</cell></row><row><cell></cell><cell>100</cell><cell>0.06</cell></row><row><cell>CBOW and Kendall's τ</cell><cell>5</cell><cell>0.25</cell></row><row><cell></cell><cell>10</cell><cell>0.25</cell></row><row><cell></cell><cell>100</cell><cell>0.28 (p = 0.09)</cell></row><row><cell>CBOW and Relative Neighborhood Tree</cell><cell>5</cell><cell>0.20</cell></row><row><cell></cell><cell>10</cell><cell>0.16</cell></row><row><cell></cell><cell>100</cell><cell>0.14</cell></row></table><note>Kendall's τ consistently renders better results without additional selection of 'important' associates by a relative neighborhood tree (additionally, it is much faster).</note></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">The well-known word2vec tool also implements Skip-gram, which is another predictive algorithm. However, it is more computationally expensive, and we leave its usage for future work.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://research.signalmedia.co/newsir16/signal-dataset.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://www.natcorp.ox.ac.uk/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://en.wikipedia.org/wiki/List_of_sovereign_states</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">Low-frequency country names bring in noise, because their vectors are susceptible to wild fluctuations when exposed to even a small amount of new contexts.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">Analyzing the degree to which the vision of events is different in national media is beyond the scope of the present research.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Temporal summaries of new topics</title>
		<author>
			<persName><forename type="first">James</forename><surname>Allan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rahul</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vikas</forename><surname>Khandelwal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;01</title>
				<meeting>the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;01<address><addrLine>New York, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="10" to="18" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A neural probabilistic language model</title>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rejean</forename><surname>Ducharme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pascal</forename><surname>Vincent</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="1137" to="1155" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Navigating the semantic horizon using relative neighborhood graphs</title>
		<author>
			<persName><forename type="first">Amaru</forename><surname>Cuba Gyllensten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Magnus</forename><surname>Sahlgren</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2015 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Lisbon, Portugal</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015-09">September 2015</date>
			<biblScope unit="page" from="2451" to="2460" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Online learning for latent dirichlet allocation</title>
		<author>
			<persName><forename type="first">Matthew</forename><surname>Hoffman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Francis</forename><forename type="middle">R</forename><surname>Bach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Neural Information Processing Systems 23</title>
				<meeting><address><addrLine>Vancouver, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="856" to="864" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Distribution de la Flore Alpine: dans le Bassin des dranses et dans quelques régions voisines</title>
		<author>
			<persName><forename type="first">Paul</forename><surname>Jaccard</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1901">1901</date>
			<pubPlace>Rouge</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Event detection in blogs using temporal random indexing</title>
		<author>
			<persName><forename type="first">David</forename><surname>Jurgens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Keith</forename><surname>Stevens</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Events in Emerging Text Types</title>
				<meeting>the Workshop on Events in Emerging Text Types<address><addrLine>Borovets, Bulgaria</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="9" to="16" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Statistically significant detection of linguistic change</title>
		<author>
			<persName><forename type="first">Vivek</forename><surname>Kulkarni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rami</forename><surname>Al-Rfou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bryan</forename><surname>Perozzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Steven</forename><surname>Skiena</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th International Conference on World Wide Web</title>
				<meeting>the 24th International Conference on World Wide Web<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="625" to="635" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Temporal analysis of language through neural language models</title>
		<author>
			<persName><forename type="first">Yoon</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yi-I</forename><surname>Chiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kentaro</forename><surname>Hanaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Darshan</forename><surname>Hegde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Slav</forename><surname>Petrov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science</title>
				<meeting>the ACL 2014 Workshop on Language Technologies and Computational Social Science<address><addrLine>Baltimore, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="61" to="65" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Rank correlation methods</title>
		<author>
			<persName><forename type="first">Maurice</forename><forename type="middle">George</forename><surname>Kendall</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1948">1948</date>
			<publisher>Griffin</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Summarization of changes in dynamic text collections using Latent Dirichlet Allocation model</title>
		<author>
			<persName><forename type="first">Manika</forename><surname>Kar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sérgio</forename><surname>Nunes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cristina</forename><surname>Ribeiro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">51</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="809" to="833" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Ad hoc monitoring of vocabulary shifts over time</title>
		<author>
			<persName><forename type="first">Tom</forename><surname>Kenter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Melvin</forename><surname>Wevers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pim</forename><surname>Huijnen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maarten</forename><surname>De Rijke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM &apos;15</title>
				<meeting>the 24th ACM International on Conference on Information and Knowledge Management, CIKM &apos;15<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1191" to="1200" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Distributed representations of words and phrases and their compositionality</title>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Greg</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeff</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">26</biblScope>
			<biblScope unit="page" from="3111" to="3119" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Streaming first story detection with application to twitter</title>
		<author>
			<persName><forename type="first">Saša</forename><surname>Petrović</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Miles</forename><surname>Osborne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Victor</forename><surname>Lavrenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics</title>
				<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="181" to="189" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Scalability of semantic analysis in natural language processing</title>
		<author>
			<persName><forename type="first">Radim</forename><surname>Rehurek</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
		<respStmt>
			<orgName>Masaryk University</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD thesis</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A fast morphological algorithm with unknown word guessing induced by a dictionary for a web search engine</title>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Segalovich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">MLMTA</title>
				<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="273" to="280" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">The proof and measurement of association between two things</title>
		<author>
			<persName><forename type="first">Charles</forename><surname>Spearman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The American journal of psychology</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="72" to="101" />
			<date type="published" when="1904">1904</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Feature-rich part-of-speech tagging with a cyclic dependency network</title>
		<author>
			<persName><forename type="first">Kristina</forename><surname>Toutanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dan</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoram</forename><surname>Singer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2003 NAACL-HLT Conference</title>
				<meeting>the 2003 NAACL-HLT Conference</meeting>
		<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="173" to="180" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">From frequency to meaning: Vector space models of semantics</title>
		<author>
			<persName><forename type="first">Peter</forename><forename type="middle">D</forename><surname>Turney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Patrick</forename><surname>Pantel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Artificial Intelligence Research</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="141" to="188" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
