=Paper=
{{Paper
|id=Vol-2079/paper11
|storemode=property
|title=On Temporally Sensitive Word Embeddings for News Information Retrieval
|pdfUrl=https://ceur-ws.org/Vol-2079/paper11.pdf
|volume=Vol-2079
|authors=Taewon Yoon,Sung-Hyon Myaeng,Hyun-Wook Woo,Seung-Wook Lee,Sang-Bum Kim
|dblpUrl=https://dblp.org/rec/conf/ecir/YoonMWLK18
}}
==On Temporally Sensitive Word Embeddings for News Information Retrieval==
Tae-Won Yoon (School of Computing, KAIST, Daejeon, South Korea, dbsus13@kaist.ac.kr)
Sung-Hyon Myaeng (School of Computing, KAIST, Daejeon, South Korea, myaeng@kaist.ac.kr)
Hyun-Wook Woo (Naver Corp., Seongnam-si, South Korea, hw.woo@navercorp.com)
Seung-Wook Lee (Naver Corp., Seongnam-si, South Korea, swook.lee@navercorp.com)
Sang-Bum Kim (Naver Corp., Seongnam-si, South Korea, sangbum.kim@navercorp.com)

Abstract

Word embedding is one of the hot issues in recent natural language processing (NLP) and information retrieval (IR) research because it has the potential to represent text at a semantic level. Current word embedding methods take advantage of term proximity relationships in a large corpus to generate a vector representation of a word in a semantic space. We argue that the semantic relationships among terms should change as time goes by, especially for news IR. With unusual and unprecedented events reported in news articles, for example, the word co-occurrence statistics in the time period covering the events would change non-trivially, affecting the semantic relationships of some words in the embedding space and hence news IR. With the hypothesis that news IR would benefit from changing word embeddings over time, this paper reports our initial investigation along this line. We constructed a news retrieval collection based on mobile search and conducted a retrieval experiment to compare embeddings constructed from two sets of news articles covering two disjoint time spans. The collection comprises the 500 most frequent queries and their clicked news articles in July 2017, provided by Naver Corp. The experimental result shows that word embeddings need to be built in a temporally sensitive way for news IR.

Copyright (c) 2018 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. In: D. Albakour, D. Corney, J. Gonzalo, M. Martinez, B. Poblete, A. Vlachos (eds.): Proceedings of the NewsIR'18 Workshop at ECIR, Grenoble, France, 26 March 2018, published at http://ceur-ws.org

1 Introduction

Representing words and texts as vectors has drawn much attention in the natural language processing (NLP) and information retrieval (IR) areas. Various embedding methods for words, sentences, and paragraphs have emerged to represent them in a low-dimensional vector space so that their semantic relationships can be computed [MSC+13, PSM14]. Mikolov et al. [MSC+13] proposed two efficient word-level embedding models, Skip-gram and CBOW, both using an objective function that predicts the relationships of words within a sentence. A different approach, based on matrix factorization over a word-word co-occurrence matrix, was proposed by Pennington et al. [PSM14].

One of the most important issues in building an embedding model is choosing an appropriate corpus for training. There have been several studies on the effect of the type and domain of the training corpus. Lai et al. [LLHZ16] tested five different embedding models with three corpora from different domains (a Wikipedia dump, the NYT corpus, and the IMDB corpus) on eight different tasks. They conclude that the influence of the domain is dominant in most tasks, proving the importance of choosing the right domain. Diaz et al. [DMC16] also showed the importance of using a corpus from the same domain in a query expansion task by comparing different embedding spaces, one trained globally and the other trained on a local, task-specific corpus. They used Skip-gram and Glove as embedding models and five different local corpora for retrieval and embedding training. They found that a locally trained embedding model works much better than a globally trained one in the query expansion task.

Word embeddings may not reflect the dynamic nature of word meanings if a static collection is used for training. It is natural that new words coined with technological advances or emerging cultures can change the word embedding space. Especially in a news corpus that describes new events and contemporary issues, changes in word statistics would be more pronounced, and the word embedding space should change accordingly. With extensive coverage of an unusual real-life event in news articles, such as the Las Vegas shooting in 2017, the semantic distance between terms like "Las Vegas" and "gun control", for example, would become much closer, at least for the time being. We argue that capturing this type of word meaning dynamics should improve news IR and recommendation tasks.

While the aforementioned research showed the importance of considering the domain of the corpus, there has not been much work investigating the importance of the publication time of the corpus for retrieval tasks. As time goes by, the meaning of a word and its relationships to other words change, too. Kulkarni et al. [KARPS15] show that the meanings and usage of words change over time. They analyze the change of word meanings and the relationships between words across time frames. However, they focus on a computational approach to detecting statistically significant linguistic shifts and did not apply the results to retrieval tasks.

We examined the importance of the time periods of the news corpora used for word embedding training by conducting a similarity-based news retrieval experiment based on three different corpora (Korean Wikipedia articles, and news articles from March and July 2017) and two commonly used word embedding models. A news retrieval collection was developed by extracting the 500 most frequently asked queries in July 2017 and their clicked news articles from the click-through data. For evaluation, we used a news retrieval task based on inverse-document-frequency-weighted word centroid similarities (CentIDF), proposed by Brokos et al. [BMA16]. For each query in the retrieval experiment, we ranked the news documents by the cosine similarity between the query embedding and each document embedding and compared the result against the gold standard constructed from the click-through data.
2 Models and Dataset

2.1 Embedding Models

We employed the two most well-known word embedding models: word2vec (skip-gram version) proposed by Mikolov et al. [MSC+13] and Glove by Pennington et al. [PSM14].

Word2vec. This model has two different versions, CBOW and Skip-gram, both of which use the context words of a target word to compute its semantics. CBOW uses the context words as the input and attempts to predict the target word from them. Skip-gram, on the other hand, calculates the probability of the context words occurring given the target word. For optimization, negative sampling or the hierarchical softmax function can be used. Negative sampling is an optimization method that updates not all the words but randomly sampled ones. Hierarchical softmax is a method that keeps the mutual-appearance information of all words in a binary tree to reduce the calculation cost. In our work, we used Skip-gram with negative sampling. (We also tested the CBOW model, but the result is omitted because it shows a similar tendency.)

Glove. This model is based on factorizing a word-word co-occurrence matrix; it converts the word-word co-occurrence information into vectors. After training, the dot product of two word vectors is proportional to the logarithm of the co-occurrence probability of the two words. According to Pennington et al. [PSM14], the Glove model shows superior results in word analogy tasks and is better at preserving semantic word relationships than syntactic ones.
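To make these modeling choices concrete, the following is a minimal sketch (not the authors' code) of how the variants described above map onto gensim's Word2Vec flags. The toy corpus and all token names are invented for illustration, and gensim 4.x parameter names are assumed (older versions use `size` instead of `vector_size`).

```python
from gensim.models import Word2Vec

# Hypothetical toy corpus; real training uses noun sequences from news articles.
toy_corpus = [
    ["las_vegas", "shooting", "gun_control", "debate"],
    ["las_vegas", "casino", "tourism", "hotel"],
    ["gun_control", "law", "debate", "senate"],
]

# Skip-gram with negative sampling (sg=1, hs=0, negative>0),
# the configuration used in the paper.
sg_model = Word2Vec(toy_corpus, sg=1, hs=0, negative=5,
                    vector_size=100, window=5, min_count=1)

# CBOW with hierarchical softmax, for contrast (sg=0, hs=1, negative=0).
cbow_model = Word2Vec(toy_corpus, sg=0, hs=1, negative=0,
                      vector_size=100, window=5, min_count=1)

print(sg_model.wv.similarity("las_vegas", "gun_control"))
```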
2.2 Dataset

Click-through data. In order to evaluate the performance of multiple sets of word embeddings on the retrieval task, we employed a news corpus with news click-through data provided by Naver Corp. (https://www.navercorp.com/en/index.nhn), the biggest portal service provider in South Korea, serving around 42 million users. The news click-through data covers all the mobile search clicks that took place between July 1 and July 9, 2017. The number of records, or clicks, is 53,472,390. The details of the test collection constructed from the click-through data are given in section 3.2.1 below.

July news corpus. This corpus was generated from the news click-through data and used for training. All the clicked news articles were collected regardless of the number of clicks. When the embeddings were constructed, only the nouns extracted from the news text were used. This corpus shares both its domain and its collection time with the retrieval evaluation collection. It consists of 6,011,811 unique news articles with 1,232,910 tokens. (All the datasets used in this paper are in Korean; they are used after extracting nouns with the morphological analyzer provided by Naver Corp. The example terms given in this paper are English translations.)

March news corpus. We collected news articles clicked in March, four months earlier than the period of the evaluation corpus, so that we can examine how the time difference affects the word embedding result in the news domain. Like the July corpus, only the nouns extracted by the morphological analyzer were used. This corpus has the same domain as the retrieval evaluation collection but a different time period. It consists of 10,398,040 unique news articles with 1,381,901 tokens.

Wiki corpus. In order to reconfirm the importance of the training data domain, especially for news IR, we also built a collection of general articles from Korean Wikipedia and Namu-wiki, the most widely used online encyclopedic wiki collections in Korea. Like the news corpora, only the nouns were extracted and used for the word embeddings. A Wikipedia dump (389,584 articles) and a Namu-wiki dump (533,406 articles) were downloaded in December 2017 and March 2017, respectively. Given that the test corpus was based on the queries in July, searching the Wikipedia documents generated at a later time, up to December, gives the effect of searching future data (see Figure 1). While this may seem irrational for news search, it should not affect the experimental result, in that the Wikipedia articles are not very sensitive to time and the number of future articles is relatively small. Namu-wiki played a more dominant role than Wikipedia in that it contains more articles with longer text per article; the Namu-wiki corpus is four times bigger than the Wikipedia corpus. The resulting corpus contains 922,990 articles with 2,167,577 tokens in total.

[Figure 1: The time periods of the corpora used for the experiment. Even though one third of the Wikipedia documents were created after the test set, the future documents amount to only about one tenth of the entire wiki corpus, because Wikipedia makes up only about 30% of the whole collection and only the part written in the months after July lies in the future.]

Table 1: The dataset used for comparisons. All the data were collected in 2017.

| Name        | Domain | Collection Time | # Articles | # Tokens  |
|-------------|--------|-----------------|------------|-----------|
| Wiki corpus | Wiki   | March, December | 922,990    | 2,167,577 |
| March news  | News   | March           | 10,398,040 | 1,381,901 |
| July news   | News   | July            | 6,011,811  | 1,232,910 |
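The morphological analyzer used for noun extraction is proprietary to Naver Corp., so the sketch below uses the open-source KoNLPy Okt tagger purely as a stand-in; the input file name and one-article-per-line format are hypothetical.

```python
from konlpy.tag import Okt

tagger = Okt()

def article_to_nouns(text):
    # Keep only the noun tokens, as done for every corpus in the paper.
    return tagger.nouns(text)

# Hypothetical input format: one news article per line.
with open("clicked_articles_july.txt", encoding="utf-8") as f:
    corpus = [article_to_nouns(line) for line in f]
```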
3 Experiment

The main goal of the experiment is to gain insight into the need to use word embeddings computed from different time periods for news IR, which usually seeks contemporary information, by comparing the word embedding results from the three different types of corpora on a simple news retrieval task. As such, we do not attempt to compare these embedding-based retrieval results against word-based or embedding-based state-of-the-art IR methods. We keep the retrieval process as simple as possible so that we can observe the effect of the different embedding methods on retrieval without interference from other factors devised for retrieval effectiveness.

3.1 Training and Parameter Settings

For training the word embeddings, we used the Python gensim library (https://radimrehurek.com/gensim/) for word2vec and the author-provided code (https://github.com/stanfordnlp/GloVe) for Glove. The parameters for the Skip-gram model are: 300 for the vector dimension, 5 words for the context window size, and 0.0001 for the learning rate; all words that appear fewer than 3 times were ignored. For Glove, we trained with 300 for the vector dimension, 15 for the context window size, and 15 for the maximum number of iterations; all words that appear fewer than 5 times were dropped.
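As a concrete illustration, here is a sketch of a Skip-gram training run with the hyperparameters reported above. The negative-sample count is not reported in the paper and is assumed here; gensim 4.x parameter names are used, and the output path is hypothetical. The Glove training with the Stanford reference code is not shown.

```python
from gensim.models import Word2Vec

# `corpus` is the list of noun-token lists from the preprocessing sketch.
model = Word2Vec(
    corpus,
    sg=1,             # Skip-gram
    negative=5,       # negative sampling; sample count assumed, not reported
    vector_size=300,  # vector dimension reported in the paper
    window=5,         # context window size reported in the paper
    alpha=0.0001,     # learning rate as reported in the paper
    min_count=3,      # drop words appearing fewer than 3 times
)
model.wv.save("w2v_july.kv")  # hypothetical output path
```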
3.2 Evaluation via News Retrieval

3.2.1 Evaluation Set

Based on past research claiming that click-through data can be an alternative way to evaluate retrieval performance [J+03, LFZ+07], we selected the 500 most frequently occurring queries from the news click-through data introduced in section 2.2. Each query was searched at least 6,000 times, 36,521 times on average, and up to about one million times. By taking the union of the clicked news articles, the resulting test collection consists of 500 queries and 17,530 documents that were clicked at least twice by the users who entered the queries. After excluding the news articles that were clicked just once, a query has 33.5 relevant documents on average, with a maximum of 439.

3.2.2 Experimental Setup and Evaluation Metrics

To generate a vector for a query or a news article, we used the TF-IDF weighted word centroid calculation method (CentIDF) proposed by Brokos et al. [BMA16]. (CentIDF is known to be better than the arithmetic mean; the unweighted method was also tried, but without any gain.) A document vector \vec{t} is computed as follows:

\[
\vec{t} = \frac{\sum_{j=1}^{|V|} TF(w_j, t) \cdot IDF(w_j) \cdot \vec{w}_j}{\sum_{j=1}^{|V|} TF(w_j, t) \cdot IDF(w_j)}
\]

where |V| is the vocabulary size of the document t, w_j is the word at the j-th position in t, and \vec{w}_j is its word embedding.

After generating document and query vectors, the news articles are ranked by cosine similarity to each query vector, and the ranked list is used as the search result for the query. For comparisons among the different embedding results, we use three commonly used evaluation metrics based on binary relevance decisions: precision at 10, mean average precision (MAP), and NDCG at 10.
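A minimal numpy sketch of the CentIDF ranking just described, under assumed data structures (token lists for queries and documents, plus word-to-vector and word-to-IDF mappings); this is an illustration, not the authors' implementation.

```python
import numpy as np

def centidf_vector(tokens, wv, idf):
    """TF-IDF weighted centroid of the word vectors of `tokens`.

    wv:  mapping word -> embedding vector (e.g., gensim KeyedVectors)
    idf: mapping word -> inverse document frequency
    """
    words = [w for w in tokens if w in wv and w in idf]
    if not words:
        return None
    tf = {w: words.count(w) for w in set(words)}
    numer = sum(tf[w] * idf[w] * np.asarray(wv[w]) for w in tf)
    denom = sum(tf[w] * idf[w] for w in tf)
    return numer / denom

def rank_documents(query_tokens, docs, wv, idf):
    """Return document indices sorted by cosine similarity to the query centroid."""
    q = centidf_vector(query_tokens, wv, idf)
    scores = []
    for i, doc in enumerate(docs):
        d = centidf_vector(doc, wv, idf)
        if q is None or d is None:
            scores.append((0.0, i))
        else:
            cos = float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))
            scores.append((cos, i))
    return [i for _, i in sorted(scores, key=lambda s: s[0], reverse=True)]
```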
3.2.3 Analysis of Retrieval Performance

The overall comparisons among the three corpora are summarized in Table 2 for the two embedding models. For the Skip-gram model, the MAP of the model trained on the July corpus is 5.5% better than that of the model trained on the March corpus, even though the time difference is only four months. The improvement is as high as 12% when compared to the model trained on the general corpus (the wiki corpus), i.e., on a different document type and domain. For the Glove model, the MAP of the model trained on the July corpus is about 5.5% better than those of the models trained on the general corpus and on the March corpus. This strongly suggests that it is critical to build embeddings with a corpus from a similar time period for news retrieval.

The Skip-gram model is more sensitive to the domain than the Glove model. This is because the Glove model is better at extracting semantic relationships among words than syntactic ones, so the stylistic differences between the Wiki corpus and the March news corpus (which has no temporal advantage) matter less. For the Skip-gram model, on the contrary, the writing style of the Namu-wiki corpus, which is sometimes informal with miscellaneous information and Internet slang, makes the Wiki corpus result worse than the March corpus. This suggests that it is also critical to build embeddings with a corpus of a similar domain and writing style when the Skip-gram model is used.

An important finding is that regardless of the metric used, the July corpus gave the best results. While this is somewhat expected at an abstract level, it provides an important insight into the use of embeddings for IR. Using embeddings as opposed to words would increase recall, perhaps at the expense of lower precision, because of flexible matches. However, the experimental result shows increased precision when a more contemporary corpus is used for embedding construction. This suggests that the embeddings constructed from the same time period better reflect the semantics of the words used by the users. Given that the embeddings capture the context of a target word, two words appearing in close proximity in a corpus would share similar semantics. This has the effect of retrieving news articles that may not contain the exact query word (hence higher recall) and of reinforcing their relevance with matched related words from the right context (hence higher precision).

Table 2: Evaluating embedding models on the news retrieval task. Both CentIDF and the arithmetic mean are used for sentence embedding. The best result for each metric (bold-faced in the original paper) comes from Skip-gram trained on the July corpus with CentIDF.

CentIDF
| Model                 | Precision@10 | NDCG@10 | MAP    |
|-----------------------|--------------|---------|--------|
| Glove (wikipedia)     | 0.7114       | 0.7654  | 0.6192 |
| Glove (March)         | 0.7046       | 0.7600  | 0.6188 |
| Glove (July)          | 0.7300       | 0.7776  | 0.6533 |
| Skip-gram (wikipedia) | 0.6915       | 0.7509  | 0.5939 |
| Skip-gram (March)     | 0.7203       | 0.7719  | 0.6317 |
| Skip-gram (July)      | 0.7399       | 0.7841  | 0.6666 |

Arithmetic Mean
| Model                 | Precision@10 | NDCG@10 | MAP    |
|-----------------------|--------------|---------|--------|
| Glove (wikipedia)     | 0.6015       | 0.6518  | 0.5138 |
| Glove (March)         | 0.6023       | 0.6529  | 0.5263 |
| Glove (July)          | 0.6612       | 0.7018  | 0.5948 |
| Skip-gram (wikipedia) | 0.5658       | 0.5193  | 0.4763 |
| Skip-gram (March)     | 0.6706       | 0.7147  | 0.5866 |
| Skip-gram (July)      | 0.7090       | 0.7491  | 0.6404 |
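For reference, the per-query computations behind the three metrics in Table 2 can be sketched as follows, assuming binary relevance as in the paper; `ranked` is the ranked list of document ids from the retrieval step and `relevant` is the set of clicked documents for the query. This is a generic illustration, not the authors' evaluation code; means over the 500 queries give the table entries.

```python
import math

def precision_at_k(ranked, relevant, k=10):
    # Fraction of the top-k retrieved documents that are relevant.
    return sum(1 for d in ranked[:k] if d in relevant) / k

def average_precision(ranked, relevant):
    # Mean of precision values at each rank where a relevant doc appears.
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked, relevant, k=10):
    # Binary-gain DCG at k, normalized by the ideal DCG.
    dcg = sum(1.0 / math.log2(i + 1)
              for i, d in enumerate(ranked[:k], start=1) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 1)
                for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```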
3.2.4 Qualitative Analysis

In order to better understand the effect of the different corpora on the embeddings, and potentially on retrieval, we picked two time-sensitive queries corresponding to two separate sensational incidents in Korea between July 1 and July 9, and, for each of the three corpora, computed the cosine similarity between the embedding of the query and those of other words to rank them. The first query was related to a claim made by several parents that McDonald's hamburgers caused "hamburger disease" (hemolytic uremic syndrome) (http://koreaherald.com/view.php?ud=20170705000868); the other was the kidnapping and murder of an eight-year-old elementary school girl by teenagers (http://koreaherald.com/view.php?ud=20170330000938). Table 3 shows the top ten closest words under each corpus for the two queries.

Table 3: Top ten similar terms obtained from the three corpora for the two sample queries "Hamburger disease (hemolytic uremic syndrome)" and "Incheon kid murder". The expected intent-aware words are marked with '*'.

query: "Hamburger disease (hemolytic uremic syndrome)"
| Rank | Wiki corpus         | March corpus             | July corpus       |
|------|---------------------|--------------------------|-------------------|
| 1    | Swing-top           | 215.8g                   | Hematotoxic*      |
| 2    | Substitute (food)   | Burger*                  | Hemolytic*        |
| 3    | Cancer              | Synchytrium endobioticum | Uremic*           |
| 4    | Soy-sauce bottle    | Burger King              | Basedow's disease |
| 5    | Celiac sprue        | Maclab                   | Chagas disease    |
| 6    | Basedow's disease   | Beef                     | Maclab            |
| 7    | Taste               | Mayagbingsso             | 215.8g            |
| 8    | Bread               | Fast (food)*             | Haemolyticity*    |
| 9    | Parkinson's disease | BigKing                  | McDonald*         |
| 10   | DOMDOM (burger)     | Kim Kyo Bun              | Uremicity*        |

query: "Incheon kid murder"
| Rank | Wiki corpus   | March corpus           | July corpus      |
|------|---------------|------------------------|------------------|
| 1    | Jung Duk Soon | Bupyeong               | Murderer*        |
| 2    | Park Nari     | Kidnap (while sleeping)| Elementary girl* |
| 3    | Lee Duek Hwa  | Before murder*         | Final Verdict*   |
| 4    | Woo Jung Sun  | Doodle (river)         | Killer*          |
| 5    | Yang Jiseung  | Taheutajeu             | Don-Am dong      |
| 6    | Wentu Antu    | After murder*          | Park Chun Pung   |
| 7    | Gak Jae Eun   | Palda (mountain)       | Incite Criminal* |
| 8    | Oh Jong Guen  | Siha (lake)            | John Odgren      |
| 9    | Song Yung Cil | Elementary girl*       | Live-in lover    |
| 10   | Lee Wan Hue   | Re-phase               | Kidnap*          |

For the "Hamburger disease" query, the result of Skip-gram trained on the wiki corpus consists of words that are generally related to each of the query words. Some are related to food (e.g., "Swing-top", "Substitute (food)", "Soy-sauce bottle", "Taste", "Bread"), while others are related to a disease (e.g., "Cancer", "Basedow's disease", "Celiac sprue", "Parkinson's disease"). But none of them are directly relevant to the intent of the query, unlike terms such as "Hemolytic", "Uremic", and "McDonald". The result does not even contain words about "Burger" itself, only words about the general notions of "Food" and "Disease". It is obvious that the embeddings constructed from the Wiki corpus would bring in noise for news retrieval.

The result under the March corpus is completely different in the sense that words about "Hamburger" were picked up, so the embedding space is much more focused on contemporary issues in general. Since the "Hamburger disease" event had not yet occurred in March, however, none of the words are relevant to the query. It is very clear that the model trained on the July corpus gave the best result, including the six intent-aware words marked with an asterisk.

For the "Incheon kid murder" query, the Skip-gram model trained on the wiki corpus gives a result consisting of perpetrators and victims of murders in Korea, especially in Incheon, which would be good search terms if the intent were to retrieve general information rather than news about this specific event; this is because the corpus contains articles about individual murder cases. The March corpus, on the other hand, gave completely different words related to descriptions of various murder cases, such as "Kidnap (while sleeping)", "Before murder", and "After murder", contributing to the better retrieval result in the experiment. The model trained on the July corpus shows the most meaningful result, containing six intent-aware words that would help retrieve relevant news articles.

While anecdotal, the examples in Table 3 constitute a strong indication that it is critical in news IR to use a corpus that coincides in time with time-sensitive queries. The embedding space would be entirely different from that of the same news corpus covering a different time period, giving very different similarity relationships among words. As another example, we tested "presidential impeachment" as a query, which was a very sensational incident in March. We observe that the Skip-gram result trained on the wiki corpus contains words unrelated to the query, such as words about presidential impeachments that took place in other countries (e.g., "Dilma Vana Rousseff", the former president of Brazil). The result under the March corpus is slightly better than the result under the July corpus, since that incident took place in that specific time period.
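The per-corpus neighbor lists in Table 3 can be reproduced, in outline, by querying each trained embedding space for its nearest terms. The sketch below uses hypothetical model file names, and "hamburger_disease" is only an English placeholder for the actual Korean query term.

```python
from gensim.models import KeyedVectors

# Hypothetical file names for the three embedding spaces.
for name in ("w2v_wiki.kv", "w2v_march.kv", "w2v_july.kv"):
    wv = KeyedVectors.load(name)
    print(name)
    for term, score in wv.most_similar("hamburger_disease", topn=10):
        print(f"  {term}\t{score:.3f}")
```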
4 Conclusion and Future Work

Given that timeliness is a rather unique aspect of news IR, word embeddings should be constructed in such a way that they reflect the evolving word-to-word relationships caused by emerging events and issues. Beginning with this hypothesis, we set out to build embeddings based on news corpora from different time periods, as well as on an encyclopedic corpus as a baseline for comparison, expecting that word embeddings constructed from a temporally close corpus would help retrieve more relevant news articles than those based on temporally disparate documents.

We conducted an experiment with a newly constructed news IR collection and a simple retrieval process using cosine similarity over word-embedding matches, along with a qualitative analysis of the pseudo-expansion of query terms. The result clearly shows that it is worth constructing and using a corpus of temporally close news articles for news IR, especially when word embeddings are used. The qualitative analysis of the two sample queries strongly suggests that the semantic relationships among words change appropriately with different corpora, so that useful terms can be automatically generated for query expansion when the temporal and domain aspects of the corpora match those of the queries.

The initial result reported in this paper needs to be extended in a number of different ways. Just to name a few, we first need to be able to suggest the appropriate time periods at which a new embedding space must be created for news IR. Another immediate question is how we can avoid constructing new embeddings from scratch when we already have the embeddings for a series of past time spans. We are currently in the process of utilizing the past click-through data to capture the dynamic meaning changes across time periods.

Acknowledgment

This research was supported by the Naver Corp. and by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science & ICT (2017M3C4A7065963). Any opinions, findings, and conclusions expressed in this material do not necessarily reflect those of the sponsors.

References

[BMA16] Georgios-Ioannis Brokos, Prodromos Malakasiotis, and Ion Androutsopoulos. Using centroids of word embeddings and word mover's distance for biomedical document retrieval in question answering. In Proceedings of the 15th Workshop on Biomedical Natural Language Processing (BioNLP@ACL 2016), Berlin, Germany, pages 114-118, 2016.

[DMC16] Fernando Diaz, Bhaskar Mitra, and Nick Craswell. Query expansion with locally-trained word embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), Volume 1: Long Papers, Berlin, Germany, 2016.

[J+03] Thorsten Joachims et al. Evaluating retrieval performance using clickthrough data. 2003.

[KARPS15] Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, and Steven Skiena. Statistically significant detection of linguistic change. In Proceedings of the 24th International Conference on World Wide Web (WWW), pages 625-635, 2015.

[LFZ+07] Yiqun Liu, Yupeng Fu, Min Zhang, Shaoping Ma, and Liyun Ru. Automatic search engine performance evaluation with click-through data analysis. In Proceedings of the 16th International Conference on World Wide Web, pages 1133-1134. ACM, 2007.

[LLHZ16] Siwei Lai, Kang Liu, Shizhu He, and Jun Zhao. How to generate a good word embedding. IEEE Intelligent Systems, 31(6):5-14, 2016.

[MSC+13] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111-3119, 2013.

[PSM14] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. GloVe: Global vectors for word representation. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543, 2014.