=Paper=
{{Paper
|id=Vol-2831/paper3
|storemode=property
|title=Diffusion-based Temporal Word Embeddings
|pdfUrl=https://ceur-ws.org/Vol-2831/paper3.pdf
|volume=Vol-2831
|authors=Ahnaf Farhan,Roberto Camacho Barranco,M. Shahriar Hossain,Monika Akbar
|dblpUrl=https://dblp.org/rec/conf/aaai/FarhanBHA21
}}
==Diffusion-based Temporal Word Embeddings==
Ahnaf Farhan, Roberto Camacho Barranco, M. Shahriar Hossain, Monika Akbar
Department of Computer Science, University of Texas at El Paso, El Paso, TX 79968
afarhan@miners.utep.edu, {rcamachobarranco, mhossain, makbar}@utep.edu

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Semantics in natural language processing is largely dependent on contextual relationships between words and entities in documents. The context of a word may evolve. For example, the word "apple" currently has two contexts – a fruit and a technology company. The changes in the context of entities in biomedical publications can help us understand the evolution of a disease and relevant scientific interventions. In this work, we present a new diffusion-based temporal word embedding model that can capture short- and long-term changes in the semantics of biomedical entities. Our model captures how the context of each entity shifts over time. Existing dynamic word embeddings capture semantic evolution at a discrete/granular level, aiming to study how a language developed over a long period. Our approach provides smooth embeddings suitable for studying short-term as well as long-term changes. To evaluate the proposed model, we track the semantic evolution of entities in abstracts of biomedical publications. Our experiments demonstrate the superiority of the proposed model when compared to its state-of-the-art alternatives.

1 Introduction

Word embeddings are low-dimensional vector space models obtained by training a neural network using contextual information from a large text corpus. There are several variants of word embeddings with different features, such as word2vec (Mikolov et al. 2013b,a) and GloVe (Pennington, Socher, and Manning 2014). However, research on word embeddings that incorporate temporal shifts in the contextual meanings of words is still in its infancy. This paper focuses on generating word embeddings that account for and take advantage of the temporal nature of timestamped scientific documents (e.g., abstracts of biomedical publications). Our goal is to obtain a low-dimensional temporal vector space representation that allows us to study the semantic and contextual evolution of words/entities. Using the word embeddings generated by our framework, we demonstrate the task of tracking the semantic evolution of entities in a corpus of biomedical abstracts.

To generate word embeddings, our framework trains a model using a diffusion mechanism for evolving concepts within a scientific text corpus (Camacho et al. 2018; Angulo et al. 1980). A concept generally does not spike on a day and disappear immediately. Rather, concepts evolve with context. Existing temporal low-dimensional language representations fail to integrate the concept of temporal diffusion into language models effectively. Moreover, these existing models (Bamler and Mandt 2017; Marina Del Rey 2018; Rudolph and Blei 2018) cannot simultaneously capture both the short-term and long-term drifts in the meaning of words. As a result, sharply trending concepts, such as COVID-19 (coronavirus disease 2019), cannot be modeled in the embedding space when long-term drifts are considered. On the other hand, long-range effects – such as the change in the meaning of the word cloud – are not captured when these algorithms take only short-term drifts into account.

Our approach uses the model for temporal high-dimensional tf-idf representations introduced by Camacho et al. (2018) to construct a training set. Construction of the training set is a one-time cost. The temporal high-dimensional tf-idf representation (Camacho et al. 2018) is able to capture sudden short-term changes in the corpus. Additionally, it incorporates diffusion into the modeling to some extent by incorporating the time dimension smoothly. The framework presented by Camacho et al. (2018) generates smooth tf-idf vectors for each word of a corpus at every timestamp, making it suitable for generating training data for our proposed model. One of the challenges of (Camacho et al. 2018) is that each word vector has a length equal to the number of documents in the corpus, which is not practical for analyzing a corpus containing thousands of scientific documents. Our goal is to construct a contextual low-dimensional temporal embedding space that mimics this high-dimensional representation without losing the essential temporal diffusion information encoded in the vectors. We introduce a neural-network-based framework that generates temporal word embeddings while optimizing for multiple key objectives. The temporal tf-idf representation from (Camacho et al. 2018) is used to obtain a baseline expected cosine distance (1.0 − cosine similarity) between pairs of word vectors at each timestamp. The expected cosine distance is used in the output layer of our proposed neural network. New low-dimensional embedding vectors – driven by a rigorous objective function to smoothly bring contextual entities close to each other – are generated in the hidden layer. The generated low-dimensional vectors are contextual and allow the discovery of latent (transitive) relationships that cannot be observed in the temporal tf-idf representation. For example, if words A and B are close to C, we expect words A and B to be close to each other. We further explain the objective function and the neural network in Section 4.

The experimental results in Section 5 show that the proposed method performs significantly better than the state-of-the-art dynamic embedding models (Rudolph and Blei 2017; Carlo, Bianchi, and Palmonari 2019) in capturing both short-term and long-term changes in word semantics. Results show that our approach improves the continuity between the vectors across different timestamps. As a result, the embeddings for different timestamps combine into a homogeneous space, unlike the state-of-the-art models.

2 Related Work

Meanings of words in a language change over time depending on their use (Aitchison 2013; Yule 2017). Temporal syntactic and semantic shifts are called diachronic changes (Hamilton, Leskovec, and Jurafsky 2016). Several probabilistic approaches tackle the problem of modeling the temporal evolution of a vocabulary by converting a set of timestamped documents into a latent variable model (Radinsky, Davidovich, and Markovitch 2012; Yogatama et al. 2014; Tang, Qu, and Chen 2013; Naim, Boedihardjo, and Hossain 2017). Other approaches model diachronic changes using part-of-speech features (Mihalcea and Nastase 2012) or using graphs where the edges between nodes (that represent words) are stronger based on context information (Mitra et al. 2015). However, tracking semantic evolution is not possible using these techniques because they do not generate language models.

The state-of-the-art technique for language modeling is word2vec, introduced by Mikolov et al. (2013a,b). This method generates a static language model where every word is represented as a vector (also called an embedding) by training a neural network to mimic the contextual patterns observed in a text corpus. There are several variants of this method, including probabilistic approaches (Barkan 2017) as well as matrix-factorization-based techniques such as GloVe (Pennington, Socher, and Manning 2014). A major challenge with static representations is that they do not incorporate any temporal information that can be used for tracking semantic evolution. Our work focuses on incorporating the temporal dimension of text data into text embedding models so that the evolution of a vector space over time can be studied.

A proposed solution to tracking semantic evolution is to obtain a static representation for each timestamp in a corpus and then artificially couple these embeddings over time using regression or similar methods (Hamilton, Leskovec, and Jurafsky 2016; Rosin, Adar, and Radinsky 2017; Carlo, Bianchi, and Palmonari 2019). However, this approach has several drawbacks. First, it requires a significant number of occurrences for all words at all times, which is usually not the case since words can gain popularity or appear at different times. Second, the artificial coupling of embeddings across timestamps can introduce artifacts into the model that may lead to wrong conclusions. A potential solution to the sparsity problem is introduced by Camacho et al. (2018), which leverages diffusion theory (Angulo et al. 1980) to generate a robust temporal representation. The technique uses a temporal tf-idf representation in which the model changes size with the number of documents and, as a result, is not extensible.

The drawbacks of using static word embedding models to generate temporal representations have led to the development of new techniques that train the embeddings for different timestamps jointly. These models use filters or regularization terms to connect the embeddings over time. Yao et al. (Marina Del Rey 2018) propose to generate a co-occurrence-based matrix and factorize it to generate temporal embeddings. The embeddings over timestamps are aligned using a regularization term. Rudolph et al. (Rudolph and Blei 2018) apply Kalman filtering to exponential family embeddings to generate temporal representations. Bamler et al. (Bamler and Mandt 2017) use similar filtering but apply it to embeddings using a probabilistic variant of word2vec. According to Bamler et al. (2017), using a probabilistic method makes the model less sensitive to noise. All these methods focus primarily on capturing long-term semantic shifts, while our goal is to capture both long- and short-term shifts.

3 Problem Description

In this paper, we focus on timestamped text corpora, such as collections of scientific publications that have publication dates. Let D = {d_1, d_2, ..., d_|D|} be a corpus of |D| documents and W = {w_1, w_2, ..., w_|W|} be the set of |W| noun phrases and entities extracted from the text corpus D. We consider each of the noun phrases and entities a word. Each document d contains words from the vocabulary (W_d ⊂ W) in the same order as they appear in the original document d. Every document d ∈ D is labeled with a timestamp t_d ∈ T, where T is the ordered set of timestamps.

The goal of this paper is to obtain a temporal word embedding model U from corpus D. Thus, for every timestamp t ∈ T, we seek to obtain a vector representation u_it for every word w_i ∈ W. The word embeddings U are represented as a three-dimensional matrix of size |W| × |T| × |u|, where |u| is a user-given parameter that indicates the size of a vector for a particular word at a particular time. We use the shorthand U_i to describe the two-dimensional matrix of size |T| × |u| that represents word w_i ∈ W over time.
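To make the notation above concrete, the following minimal sketch shows how the temporal embedding tensor U and the shorthands u_it and U_i would be indexed. It is our illustration in NumPy; the sizes, variable names, and random initialization are ours and not part of the paper.

```python
import numpy as np

# Hypothetical sizes: |W| words, |T| timestamps, |u| embedding dimensions.
num_words, num_timestamps, emb_dim = 3000, 21, 64

# U is the |W| x |T| x |u| temporal embedding tensor of Section 3,
# initialized uniformly in [0, 1] as later described in Section 4.6.
U = np.random.uniform(0.0, 1.0, size=(num_words, num_timestamps, emb_dim))

i, t = 42, 9      # an arbitrary word index and timestamp index
u_it = U[i, t]    # vector u_it for word w_i at timestamp t
U_i = U[i]        # the |T| x |u| matrix U_i: the trajectory of w_i over time
```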
4 Methodology

Each of the subsections below describes a major component of our objective function for generating diffusion-based temporal word embeddings.

4.1 Training data for our model

We use the temporal tf-idf model (Camacho et al. 2018) to obtain high-dimensional time-reflective text representations of size |W| × |T| × |D| for training purposes. The vectors are formed using the temporal tf-idf weights of a word for every document in every timestamp. The temporal tf-idf weight of a word is computed using Eq. (1).

\hat{w}(w, d, t_d, t, \varsigma) = \frac{1}{\sqrt{2\pi\varsigma^2}} e^{-\frac{(t_d - t)^2}{2\varsigma^2}} \cdot \frac{(1 + \log f_{w,d}) \log\frac{|D|}{\lambda_w}}{\sqrt{\sum_{w' \in W_d} \left( (1 + \log f_{w',d}) \log\frac{|D|}{\lambda_{w'}} \right)^2}}    (1)

where \hat{w} is the weighted tf-idf value at timestamp t for the word w ∈ W in document d ∈ D, which was published at timestamp t_d. The term f_{w,d} represents the term frequency of word w in document d, λ_w is the number of documents that contain word w, and W_d is the set of words that appear in document d. The standard deviation of the Gaussian distribution function is represented by ς and is set by the user.

Next, we compute the cosine distance (1.0 − cosine similarity) between every pair of words and store these as a distance matrix ∆, where each element can be addressed as δ_ijt ∈ ∆. This distance matrix ∆ becomes the training data for the expected distance between a particular pair of words (w_i, w_j) ∈ W at time t ∈ T. We use the notation δ_ij to represent a vector of size |T| with the temporal tf-idf-based cosine distances between (w_i, w_j) ∈ W for all the timestamps. The cosine distances are later used in the output layer of our proposed neural network.
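The sketch below illustrates how a single temporal tf-idf weight in the spirit of Eq. (1) and the cosine-distance targets δ_ijt could be computed. It reflects our reading of the reconstructed equation (a Gaussian time kernel multiplied by a length-normalized tf-idf weight); the function names, arguments, and the normalization in the denominator are assumptions, not the authors' code.

```python
import numpy as np

def temporal_tfidf_weight(f_wd, lambda_w, doc_terms, t_d, t, n_docs, varsigma):
    """Sketch of a temporal tf-idf weight as we read Eq. (1).

    f_wd      : frequency of word w in document d
    lambda_w  : number of documents containing w
    doc_terms : list of (frequency, document count) pairs for every word in d,
                used for the length normalization in the denominator
    t_d, t    : publication timestamp of d and the timestamp being scored
    varsigma  : user-set standard deviation of the temporal Gaussian
    """
    gaussian = np.exp(-((t_d - t) ** 2) / (2 * varsigma ** 2)) / np.sqrt(2 * np.pi * varsigma ** 2)
    tfidf = (1 + np.log(f_wd)) * np.log(n_docs / lambda_w)
    norm = np.sqrt(sum(((1 + np.log(f)) * np.log(n_docs / lam)) ** 2 for f, lam in doc_terms))
    return gaussian * tfidf / norm

def cosine_distance(a, b):
    """1.0 - cosine similarity: the training target delta_ijt for a pair of word vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Example with made-up numbers: a word appearing 3 times in a document
# published at t_d = 5, scored for timestamp t = 7.
w_hat = temporal_tfidf_weight(3, 120, [(3, 120), (1, 800), (2, 45)],
                              t_d=5, t=7, n_docs=10000, varsigma=1.0)
```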
4.2 Optimizing for similarity

One of our objectives is to obtain a low-dimensional word embedding model U such that computing the cosine distance between the word vectors results in a distance matrix that closely resembles ∆. Equation (2) formulates this objective as ϑ. In this case, we optimize the vectors in U to minimize the difference between the cosine distance of each pair of word vectors for every timestamp and the cosine distance from the temporal tf-idf model in ∆ (Eq. (1)). Minimizing this difference ensures that our model captures the same similarity as the temporal tf-idf model while providing low-dimensional contextual vectors.

In this paper, the term dist(A, B) refers to the cosine distance between vector A and vector B. The cosine distance between two word vectors is bounded between [0, 1]. A cosine distance of 0 between two word vectors means that both words share the same context, while a cosine distance of 1 means that the vectors are completely orthogonal and thus do not share contextual similarities. The variable α is introduced as a scaling factor to avoid numerical stability issues with values close to zero. The simplest form of our objective function is as follows.

\vartheta_1(U) = \sum_{i=1}^{|W|} \sum_{j=1}^{|W|} \sum_{t=1}^{|T|} \left( \alpha \cdot \mathrm{dist}(u_{it}, u_{jt}) - \alpha \cdot \delta_{ijt} \right)^2    (2)

4.3 Weighing relevance: Giving more importance to the neighborhood of each word

In our work, we focus on the task of studying the semantic evolution of a word based on changes to its context. Thus, it is more important that our word embedding model correctly captures the relevant neighborhood of a word. Our experiments demonstrated that each word has a small number of relevant neighbors. That is, each word shares context with a small number of words. To take this into account in the objective function, we introduce a penalty when the temporal tf-idf-based cosine distance δ_ijt is small, ensuring that our word embedding model captures the relevant context accurately.

\vartheta_2(U) = \sum_{i=1}^{|W|} \sum_{j=1}^{|W|} \sum_{t=1}^{|T|} \left( \alpha \cdot \mathrm{dist}(u_{it}, u_{jt}) - \alpha \cdot \delta_{ijt} \right)^2 \cdot e^{-\beta \delta_{ijt}}    (3)

where β is a scaling parameter to increase or decrease the importance given to the samples with a smaller distance. Notice that e^{-\beta \delta_{ijt}} in Eq. (3) imposes a higher penalty on examples with smaller baseline distances. The penalty is smaller when the distance from the temporal tf-idf model is large. Equation (3) supports the phenomenon that, for a specific word, most of the words in the vocabulary are at a relatively large distance. The large distances need not be a part of the penalty because the objective function is only concerned with neighbors that appear in the vicinity in the temporal tf-idf model.

4.4 Temporal diffusion filter

Based on diffusion theory (Angulo et al. 1980), we assume that the meaning of a word, and consequently its vector representation, diffuses (or drifts) over time. Thus, the word embeddings should evolve smoothly over time. To introduce this concept into our objective function, we model the effect of every word vector in all timestamps to some degree.

We use a Gaussian filter (Eq. (4)) to diffuse the contribution of each vector smoothly before and after the timestamp of the current sample. The filter uses a sliding window, going from the first to the last timestamp. σ is a user-settable parameter representing the standard deviation of the Gaussian distribution. A large value of σ means that the diffusion of word vectors is slow over time. A small standard deviation allows capturing short-term changes in meaning.

\gamma(t, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(t_i - t)^2}{2\sigma^2}} \quad \text{with } t_i = 1, \ldots, |T|    (4)

Equation (5) presents the updated objective ϑ_3, which includes the temporal diffusion of the word embeddings.

\vartheta_3(U) = \sum_{i=1}^{|W|} \sum_{j=1}^{|W|} \sum_{t=1}^{|T|} \left( \alpha \cdot \mathrm{dist}(\gamma(t, \sigma) U_i, \gamma(t, \sigma) U_j) - \alpha \cdot \delta_{ijt} \right)^2 \cdot e^{-\beta \delta_{ijt}}    (5)
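The following sketch shows one way to read Eqs. (3)–(5): the Gaussian filter of Eq. (4) produces a weight per timestamp, γ(t, σ)U_i is taken to be the weighted combination of a word's vectors over time, and each (i, j, t) sample contributes a squared error scaled by e^{−βδ_ijt}. This is our illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_filter(num_timestamps, t, sigma):
    """Eq. (4): Gaussian weights gamma(t, sigma) over timestamps t_i = 1..|T|."""
    ti = np.arange(1, num_timestamps + 1)
    return np.exp(-((ti - t) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def diffused_vector(U_i, t, sigma):
    """Gaussian-weighted combination of a word's vectors over time, i.e., our
    reading of gamma(t, sigma) U_i in Eq. (5); U_i has shape |T| x |u|."""
    return gaussian_filter(U_i.shape[0], t, sigma) @ U_i

def theta3_term(U_i, U_j, delta_ijt, t, sigma, alpha=1.0, beta=1.0):
    """One (i, j, t) term of the diffusion-filtered objective in Eq. (5)."""
    a, b = diffused_vector(U_i, t, sigma), diffused_vector(U_j, t, sigma)
    cos_dist = 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Squared error against the temporal tf-idf distance, down-weighted for
    # pairs that are far apart in the baseline (the e^{-beta delta} factor of Eq. (3)).
    return (alpha * cos_dist - alpha * delta_ijt) ** 2 * np.exp(-beta * delta_ijt)
```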
4.5 Smoothness penalty: Creating a homogeneous temporal embedding space

The second important goal that our word embedding model should achieve is to be spatially smooth over time. Continuous or smooth temporal embeddings are those where the distance (e.g., Manhattan or Euclidean) between two vectors of the same word for consecutive timestamps is small. Equation (6) captures the expected behavior by penalizing significant spatial changes.

\varepsilon_1(U) = \sum_{i=1}^{|W|} \sum_{t=1}^{|T|-1} \| u_{i,t+1} - u_{i,t} \|_2    (6)

The main issue with this expression is that, by forcing consecutive vectors to be very close together, we might lose important information when the vectors drift apart in the original data. Thus, we introduce weights ω_ϑ and ω_ε to control the effect of each objective. The final objective function takes the form of Eq. (7).

F_a(U) = \vartheta_3(U)^{\omega_\vartheta} \, \varepsilon_1(U)^{\omega_\varepsilon}    (7)

An alternative form would be

F_b(U) = \omega_\vartheta \log \vartheta_3(U) + \omega_\varepsilon \log \varepsilon_1(U)    (8)

or

F_c(U) = \omega_\vartheta \vartheta_3(U) + \omega_\varepsilon \varepsilon_1(U)    (9)

4.6 Implementation

We implemented a neural-network-based model using TensorFlow to generate our low-dimensional temporal word embeddings. An overall view of the architecture of our neural network is shown in Fig. 1. The goal of the neural network is to minimize Eq. (7). The embeddings for all words in all timestamps are generated in the hidden layer. We initialize the weights in the hidden layer in the range [0, 1]. The data used for training the model contains three inputs (one-hot encodings of a pair of words for which the cosine distance is known, and the timestamp) and one target value (the cosine distance). The inputs are the indices for two random words w_it and w_jt at timestamp t. The target value is the expected cosine distance between w_it and w_jt, obtained using the temporal tf-idf representations of Eq. (1).

Figure 1: The proposed neural network architecture for temporal embedding generation in the hidden layer.
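As a rough illustration of the training loop described in Section 4.6, the sketch below updates a (word pair, timestamp) sample toward its expected cosine distance using TensorFlow. It is a minimal sketch under our assumptions: the optimizer, learning rate, and tensor sizes are ours, and the diffusion filter and smoothness penalty of Eqs. (5)–(7) are omitted for brevity.

```python
import tensorflow as tf

num_words, num_timestamps, emb_dim = 3000, 21, 64
alpha = 1.0

# Hidden-layer weights: one |u|-dimensional vector per (word, timestamp),
# initialized in [0, 1] as stated in Section 4.6.
U = tf.Variable(tf.random.uniform((num_words, num_timestamps, emb_dim), 0.0, 1.0))
optimizer = tf.keras.optimizers.Adam(1e-3)  # optimizer and learning rate are assumptions

def train_step(i, j, t, delta_ijt):
    """One update for a (w_i, w_j, t) sample whose target distance is delta_ijt."""
    with tf.GradientTape() as tape:
        u_it = tf.nn.l2_normalize(U[i, t])
        u_jt = tf.nn.l2_normalize(U[j, t])
        cos_dist = 1.0 - tf.reduce_sum(u_it * u_jt)
        loss = tf.square(alpha * cos_dist - alpha * delta_ijt)
    grads = tape.gradient(loss, [U])
    optimizer.apply_gradients(zip(grads, [U]))
    return loss

# e.g., train_step(12, 857, 5, 0.35)
```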
5 Experimental Results

We performed experiments using three different datasets: a synthetic dataset, the PubMed Pandemic dataset, and the PubMed COVID dataset.

We generated the synthetic dataset consisting of 10,000 words and ten timestamps. For this dataset, we already know the 10 nearest neighbors of each word in every timestamp. Neighborhoods of larger sizes will contain random words starting at the 11th nearest neighbor.

The PubMed Pandemic dataset contains 328,908 abstracts of pandemic- and epidemic-related biomedical publications. The abstracts were published between the years 2000 and 2020. We selected the 3,000 most frequent biomedical entities for this dataset.

The PubMed COVID dataset contains 41,571 abstracts of biomedical papers related to COVID-19, published in 2020. The corpus was collected from the Kaggle COVID-19 Open Research Dataset Challenge (Wang et al. 2020). We selected the 2,000 most frequent biomedical entities for this dataset.

We extracted the biomedical entities from the PubMed abstracts using scispaCy's biomedical named entity recognition (Neumann et al. 2019). In this paper, we use the phrases temporal word embedding and temporal embedding to describe the core concepts, while in practice we performed temporal biomedical entity embedding.

We evaluate our temporal word embedding method by comparing its performance with that of a regular tf-idf model, the temporal tf-idf model (Camacho et al. 2018), dynamic Bernoulli embeddings (Rudolph and Blei 2018), and temporal word embeddings with a compass (TWEC) (Carlo, Bianchi, and Palmonari 2019). In all our experiments, we used an embedding size of 64.

We seek to answer the following questions.

1. What is the effect of introducing different penalty terms in our objective function? (Section 5.1)
2. How well do the models perform in terms of capturing the neighborhood of entities over time, compared to the temporal tf-idf? (Section 5.2)
3. How well do the models perform in terms of capturing changes in the neighborhood over time in the respective embedding spaces? (Section 5.3)
4. How well does our algorithm track the quick evolution of a specific entity, such as COVID, compared to other methods? (Section 5.4)
5. How well does our algorithm capture the semantic evolution of a general term, such as pandemic, compared to other methods? (Section 5.5)

5.1 Effect of penalty terms

In this experiment, we study the effect of the different versions of our objective function on the quality of the temporal word embedding model, focusing on the task of tracking semantic evolution. The versions under this study correspond to ϑ_1 (2), ϑ_2 (3), ϑ_3 (5), F_a (7), F_b (8), and F_c (9). We quantify the quality of the resulting vectors with two different metrics: similarity and continuity.

The similarity is measured as the number of intersections between the word neighborhoods obtained using the temporal tf-idf model and each of the different versions of our objective function. The goal of the similarity evaluation is to quantify how well our model mimics the temporal tf-idf model. It must be noted that we did not expect to have a perfect match in the neighborhoods of words since the temporal tf-idf representation does not take into account latent contextual relationships between words.

The continuity is measured using the average, maximum, and minimum mean squared errors (MSE) across consecutive timestamps for the word vectors obtained using the different versions of our objective function.

Fig. 2 shows the results for the similarity evaluation with the synthetic dataset described at the beginning of Section 5. The objective function labeled F_a in the figure performs significantly better than the other formulations. If we discard F_b and F_c, it is possible to see how the similarity improves with the progression in which we developed our objective function. Furthermore, taking into account that only the top-10 nearest neighbors are known and set as accurate in the synthetic data and the rest of the neighbors are random, having an average of 8 intersections means that our model can correctly capture the semantic evolution of the synthetic dataset.

Figure 2: Average number of intersections per timestamp for different neighborhood sizes (k) between the neighborhoods obtained with the baseline method and those obtained using the different versions of our objective function (embedding size = 64).

Evaluating continuity is required to ensure that there is a smooth transition between timestamps for the vectors of the same word. A high average or minimum MSE value indicates that there is significant movement of the word vectors over time in the embedding space. However, a small maximum MSE value would mean that the word embeddings are not following the trends observed in the temporal tf-idf model-based training. Thus, the best model is one that has high similarity with the temporal tf-idf model while maintaining a low MSE value.

Fig. 3 shows the results for the continuity evaluation. In this case, F_c has a continuity of 0.0, which, in conjunction with the similarity results, indicates that this objective function produces static, unusable vectors. The second smallest average MSE value is obtained with F_a, which also showed the best performance in terms of similarity. Thus, the final objective function is F_a (Eq. (7)), and we confirm that the smoothness penalty (Eq. (6)) has a positive effect on both the similarity and continuity results.

Figure 3: Average mean squared error (MSE) for different versions of our objective function. The average MSE is computed from the squared difference between vectors for the same word for every pair of consecutive timestamps (embedding size = 64).
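The two metrics of Section 5.1 can be sketched as follows: neighborhood intersection counts against the temporal tf-idf baseline (similarity) and the mean squared difference between consecutive vectors of the same word (continuity). The helper names and the cosine-based nearest-neighbor search are our assumptions for illustration.

```python
import numpy as np

def knn_indices(vectors, k=10):
    """Indices of the k nearest neighbors (by cosine similarity) of every word
    at a single timestamp; `vectors` is a |W| x |u| matrix."""
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)          # a word is not its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]

def neighborhood_intersections(model_vecs, baseline_vecs, k=10):
    """Similarity metric of Section 5.1: average overlap between the k-nearest
    neighborhoods of a model and of the temporal tf-idf baseline."""
    a, b = knn_indices(model_vecs, k), knn_indices(baseline_vecs, k)
    return float(np.mean([len(set(x) & set(y)) for x, y in zip(a, b)]))

def continuity_mse(U):
    """Continuity metric of Section 5.1: mean squared difference between vectors
    of the same word at consecutive timestamps, over the |W| x |T| x |u| tensor U."""
    return float(np.mean((U[:, 1:, :] - U[:, :-1, :]) ** 2))
```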
5.2 Capability to capture content neighborhood

A major purpose of any temporal word modeling is to capture content similarity over time. We compare three models – TWEC, Bernoulli embeddings, and our temporal word embedding – with temporal tf-idf (Camacho et al. 2018) in Fig. 4, using the PubMed (pandemic) dataset. We use temporal tf-idf (Camacho et al. 2018) for this comparison because it models content smoothly over time. Each line in the figure represents the average set-based Jaccard similarity between the 10 nearest neighbors of 1000 randomly selected entities using temporal tf-idf and the 10 nearest neighbors of the same entities using one of the three models. Fig. 4 demonstrates that our embedding model and TWEC have closer similarity to temporal tf-idf than Bernoulli embeddings. Additionally, our model has greater similarity with the neighborhood of temporal tf-idf in the earlier timestamps, compared to both TWEC and Bernoulli embeddings. Our model smoothly spreads word influence using diffusion over the years. As a result, our embedding model performs significantly better than the other methods even when the vocabulary is smaller in the earlier timestamps.

Figure 4: Jaccard similarity in the neighborhoods between temporal tf-idf (Camacho et al. 2018) and each of the three models – TWEC, Bernoulli embeddings, and our temporal word embedding model – averaged over 1000 randomly chosen entities from the 328,908 PubMed (pandemic) abstracts (embedding size = 64).
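The set-based Jaccard scores used in Fig. 4 and in Section 5.3 below could be computed as sketched here; the inputs are per-entity neighbor lists (e.g., from the knn_indices helper above), and the function names are illustrative.

```python
import numpy as np

def jaccard(a, b):
    """Set-based Jaccard similarity between two neighbor sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def avg_neighborhood_jaccard(neighbors_model, neighbors_tfidf):
    """Fig. 4 style score: average Jaccard similarity between the 10-nearest-neighbor
    sets of the same entities under a model and under temporal tf-idf."""
    return float(np.mean([jaccard(x, y) for x, y in zip(neighbors_model, neighbors_tfidf)]))

def avg_yearly_dissimilarity(neighbors_this_year, neighbors_prev_year):
    """Section 5.3 change score: 1.0 - Jaccard similarity between a word's
    neighborhoods in consecutive years, averaged over words."""
    return float(np.mean([1.0 - jaccard(x, y)
                          for x, y in zip(neighbors_this_year, neighbors_prev_year)]))
```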
5.3 Capability to detect changes in neighborhood

An objective of a temporal embedding technique is to capture changes in the neighborhood of each word over time. The ability to capture changes allows us to study the evolution of concepts. This subsection provides an experiment to investigate how much change occurs from one year to another in the neighborhood using different models. We quantify change in terms of set-based Jaccard dissimilarity (1.0 − Jaccard similarity) between the neighborhood of a word in the current year and the neighborhood of the same word in the previous year. The average Jaccard dissimilarity over many words in a certain year for a model gives an overall idea of how much the model can detect changes in the neighborhood. Fig. 5 shows the average Jaccard dissimilarity (change) at each year for five different models – our temporal embedding model, Bernoulli embeddings, TWEC, vanilla tf-idf computed independently at each year, and temporal tf-idf – using 1000 randomly selected entities from the PubMed (pandemic) dataset. The plot shows that our temporal embedding model detects more changes in terms of average Jaccard dissimilarity compared to the other models.

Figure 5: Comparison of average Jaccard dissimilarity (change) between the 10-neighbors of the current year and the previous year.

The Bernoulli embeddings capture the least amount of change. Based on further investigation (not covered in this paper), we noticed that Bernoulli embeddings rarely capture any changes. These embeddings capture only a few long-term changes, whereas our temporal embedding model captures both long-term and short-term changes. TWEC captures more changes than Bernoulli embeddings and temporal tf-idf, but fewer changes than the vanilla tf-idf. Our temporal word embedding performs even better than the vanilla tf-idf. Contextual changes are best captured using our temporal embedding because the objective function of our model spreads the effect of each word smoothly from the current year to other years. As a result, our model captures changes, in terms of average Jaccard dissimilarity, better than the regular tf-idf and temporal tf-idf models.

Our model is clearly superior in terms of the ability to capture changes. In subsection 5.4, we explain how this superiority in detecting changes in the neighborhood helps in analyzing evolving concepts, such as COVID-19.
5.4 Analyzing the neighborhood of COVID-19

In this experiment, we analyze the changes in the neighborhood of the word COVID in the PubMed (COVID) dataset. Fig. 6 presents how the similarities between the entity COVID and some of its nearest neighbors – China, epidemic, pandemic, and patients – change over time using (a) the TWEC model, (b) Bernoulli embeddings, (c) temporal tf-idf, and (d) our temporal embedding model. The data contains two-week ranges from January to July of 2020. As we already know, COVID-19 is, as of August 2020, considered a pandemic – which is a global outbreak rather than a local epidemic (Steffens 2020). In Fig. 6, we observe that (c) the temporal tf-idf and (d) our temporal embedding can detect the rising trend of pandemic and the falling trend of the word epidemic. This observation matches our knowledge regarding COVID-19. TWEC (Fig. 6a) is able to track this to some degree, but with zigzag patterns in the trends. Bernoulli embeddings (Fig. 6b) give higher similarity for pandemic than for epidemic with the word COVID, which is correct in July, but the timeline does not demonstrate any rising or falling trends for the words pandemic and epidemic.

Our temporal embedding (Fig. 6d) demonstrates that the word China had high similarity with COVID in the beginning. The similarity started to fall by the end of March. According to our model, starting at the end of March the word epidemic started to exhibit less similarity with COVID and the word pandemic started to show higher similarity. The temporal tf-idf model (Fig. 6c) demonstrates a similar trend. These trends match our common knowledge regarding the COVID-19 pandemic. Also, TWEC (Fig. 6a) has an overall downward trend for the word China, but with zigzag movements over the timeline. Bernoulli embeddings (Fig. 6b) do not demonstrate any change and capture a static similarity for the entire timeline. We noticed that the underlying vectors in Bernoulli embeddings change, but the neighbors of a word do not change much.

We know that the number of COVID-infected patients increased over the months of 2020. Our temporal embedding model (as well as the temporal tf-idf) captures the rising similarity of the word patients in the context of COVID quite smoothly (Fig. 6d). TWEC also has an upward trend, which is less smooth. However, the Bernoulli embeddings do not demonstrate any change in the similarity between the words patients and COVID.

This experiment demonstrates that our temporal embedding model captures the short-term changes in content (as shown by temporal tf-idf) while also capturing a context that we can track smoothly to study the evolution of a concept, such as COVID. In contrast, Bernoulli embeddings construct a context that is intractable in terms of similarity. TWEC provides noisy patterns that are difficult to interpret.

Figure 6: Evolution of the word COVID in PubMed COVID-19-related abstracts published in 2020 using four different models – (a) TWEC (temporal word embedding with a compass), (b) Bernoulli embeddings, (c) the temporal tf-idf model (vector size = 32,000), and (d) our model (Eq. (7), vector size = 64). Cosine similarity is used to compute the similarity between the vectors of the word COVID and any other word.
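A trajectory like those in Fig. 6 can be read directly off the embedding tensor, as in the sketch below: for each timestamp, compute the cosine similarity between the target entity and a handful of neighbors. The vocabulary lookup, entity names, and function signature are hypothetical and only illustrate the analysis, not the authors' tooling.

```python
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_trajectory(U, vocab_index, target="covid",
                          neighbors=("china", "epidemic", "pandemic", "patients")):
    """Cosine similarity between a target entity and selected neighbors at every
    timestamp, i.e., the kind of curves plotted in Fig. 6."""
    ti = vocab_index[target]
    num_timestamps = U.shape[1]
    return {w: [cosine_sim(U[ti, t], U[vocab_index[w], t]) for t in range(num_timestamps)]
            for w in neighbors}
```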
5.5 Analysis of the word Pandemic

With the rise of the COVID-19 pandemic, it has become essential to study how biomedical scientists have dealt with pandemics in past years. Such an analysis requires a model that can capture long-term changes. In this experiment, we attempt to track the closest term to the word pandemic in each year of the PubMed (pandemic) dataset, which spans biomedical abstracts from 2000 to 2020.

Each line of Fig. 7 plots the similarity of the top nearest neighbor of the word pandemic in each year. The five lines represent similarities using five different models – Bernoulli embeddings, our temporal embedding model, temporal tf-idf, vanilla tf-idf, and TWEC. Notice that our temporal embedding model demonstrates peak similarities in 2009/2010 and in 2020, when H1N1 influenza (swine flu) and COVID-19, respectively, became prominent. This signal from our temporal embedding model reflects the fact that the worst pandemics in the last 20 years are the H1N1 influenza in 2009 (Sullivan et al. 2010) and COVID-19 in 2020 (Cucinotta and Vanelli 2020). Note that other words, like concerns in 2004 and public in 2015, are detected as the top nearest neighbors; these are not highly similar to the word pandemic, which indicates that no entities appeared too close to the word pandemic in those years.

TWEC captures influenza and H1N1 in the middle of the timeline but fails to capture COVID-related keywords in 2020 as the closest entity to pandemic. In Fig. 7, the Bernoulli model can pick up coronavirus as the nearest neighbor of pandemic, but it was not able to pick up influenza in its trend. Moreover, coronavirus appears in all the years as the top nearest neighbor of pandemic, which is not correct because the coronavirus spread started in 2019 and became a pandemic in 2020 (Cucinotta and Vanelli 2020). Temporal tf-idf and vanilla tf-idf were able to pick up coronavirus/COVID. Temporal tf-idf and vanilla tf-idf were also able to pick up influenza subtype H1N1 (swine flu), but the respective similarities were not high.

Figure 7: Cosine similarity of the top nearest neighbor of pandemic in each year using all methods, on the PubMed (pandemic) abstracts. Nearest neighbors are written with selected peaks. Our temporal embedding method provides better context for pandemic. Embedding size = 64.

Based on the experiment presented in this subsection, our temporal embedding model has the ability to separate highly contextual words (such as H1N1 and COVID) of a concept (such as pandemic) via similarity peaks. Our model helps in determining prominent neighbors of a concept in the past. Our vectors are able to construct a peak for a prominent nearest neighbor because our method models diffusion. That is, a concept that appears today affects the past and the future to some extent, regardless of whether the concept directly appears in the contents or not.

6 Conclusions

This paper introduces a new technique to generate low-dimensional temporal word embeddings for timestamped scientific documents. We compare our model with existing temporal word embeddings. Our method generates a representation that (1) can track changes observed within a short period, (2) provides a smooth evolution of the word vectors over a continuous temporal vector space, (3) uses the concept of diffusion to capture trends better than the existing models, and (4) is low-dimensional. Unlike previous models, our proposed model creates a homogeneous space over every timestamp of the embeddings. As a result, the generated vectors over timestamps can be used for prediction using conventional signal-prediction algorithms. Extrapolation of the embedding vectors to forecast the future neighborhood of a scientific concept is a future direction of this work.

References

Aitchison, J. 2013. Language change: progress or decay? Cambridge University Press.

Angulo, J.; Pederneiras, C.; Ebner, W.; Kimura, E.; and Megale, P. 1980. Concepts of diffusion theory and a graphic approach to the description of the epidemic flow of contagious disease. Public Health Rep 95(5): 478–485.

Bamler, R.; and Mandt, S. 2017. Dynamic Word Embeddings. In Proceedings of the 34th ICML, volume 70, 380–389. PMLR.

Barkan, O. 2017. Bayesian Neural Word Embedding. In AAAI, 3135–3143. San Francisco, California, USA: AAAI Press.

Camacho, R.; Dos Santos, R. F.; Hossain, M. S.; and Akbar, M. 2018. Tracking the Evolution of Words with Time-reflective Text Representations. In 2018 IEEE International Conference on Big Data (Big Data), 2088–2097. Seattle, WA, USA: IEEE.

Carlo, V. D.; Bianchi, F.; and Palmonari, M. 2019. Training Temporal Word Embeddings with a Compass. In AAAI.

Cucinotta, D.; and Vanelli, M. 2020. WHO Declares COVID-19 a Pandemic. Acta Bio-Medica: Atenei Parmensis, 157–160.

Hamilton, W. L.; Leskovec, J.; and Jurafsky, D. 2016. Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change. In Proc. of ACL, volume 1, 1489–1501. Berlin, Germany.

Marina Del Rey, CA, U. 2018. Dynamic Word Embeddings for Evolving Semantic Discovery. In Proc. of ACM WSDM, 673–681.

Mihalcea, R.; and Nastase, V. 2012. Word Epoch Disambiguation: Finding How Words Change over Time. In Proc. of ACL, 259–263.

Mikolov, T.; Chen, K.; Corrado, G. S.; and Dean, J. 2013a. Efficient Estimation of Word Representations in Vector Space. CoRR abs/1301.3781.

Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013b. Distributed Representations of Words and Phrases and their Compositionality. ArXiv abs/1310.4546.

Mitra, S.; Mitra, R.; Maity, S.; Riedl, M.; Biemann, C.; Goyal, P.; and Mukherjee, A. 2015. An automatic approach to identify word sense changes in text media across timescales. Natural Language Engineering 21: 773–798.

Naim, S. M.; Boedihardjo, A. P.; and Hossain, M. S. 2017. A scalable model for tracking topical evolution in large document collections. In IEEE BigData, 726–735.

Neumann, M.; King, D.; Beltagy, I.; and Ammar, W. 2019. ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing. In BioNLP@ACL.

Pennington, J.; Socher, R.; and Manning, C. 2014. GloVe: Global Vectors for Word Representation. In Proc. of Conf. on EMNLP, 1532–1543.

Radinsky, K.; Davidovich, S.; and Markovitch, S. 2012. Learning Causality for News Events Prediction. In Proc. of Int. Conf. on WWW, 909–918.

Rosin, G. D.; Adar, E.; and Radinsky, K. 2017. Learning Word Relatedness over Time. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 1168–1178.

Rudolph, M.; and Blei, D. 2018. Dynamic Embeddings for Language Evolution. In Proceedings of the 2018 World Wide Web Conference, 1003–1011.

Rudolph, M. R.; and Blei, D. M. 2017. Dynamic Bernoulli Embeddings for Language Evolution. ArXiv abs/1703.08052.

Steffens, I. 2020. A hundred days into the coronavirus disease (COVID-19) pandemic. Euro Surveill. 25(14).

Sullivan, S. J.; Jacobson, R. M.; Dowdle, W. R.; and Poland, G. A. 2010. 2009 H1N1 influenza. Mayo Clin. Proc. 85(1): 64–76.

Tang, X.; Qu, W.; and Chen, X. 2013. Semantic Change Computation: A Successive Approach, 68–81.

Wang, L. L.; Lo, K.; Chandrasekhar, Y.; Reas, R.; et al. 2020. CORD-19: The COVID-19 Open Research Dataset.

Yogatama, D.; Wang, C.; Routledge, B. R.; Smith, N. A.; and Xing, E. 2014. Dynamic Language Models for Streaming Text. Transactions of the Association for Computational Linguistics, 181–192.

Yule, G. 2017. The study of language. Cambridge University Press.