<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Sequential Modeling in Vector Space</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Benyou Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emanuele Di Buccio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Massimo Melucci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padova</institution>
          ,
          <addr-line>Padova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Statistical Sciences, University of Padova</institution>
          ,
          <addr-line>Padova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>In Information Retrieval and Natural Language Processing, the representation of discrete objects, e.g., words, usually relies on embeddings in a vector space; this representation typically ignores sequential information. One instance of such sequential information is temporal evolution: for example, when the discrete objects are words, their meaning may smoothly change over time. For this reason, previous works proposed dynamic word embeddings to explicitly model this sequential information in word representations. This paper introduces a representation that relies on sinusoidal functions to capture the sequential order of discrete objects in vector space.</p>
      </abstract>
      <kwd-group>
        <kwd>sequential modeling</kwd>
        <kwd>vector space</kwd>
        <kwd>dynamic word embedding</kwd>
        <kwd>sinusoidal functions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In Information Retrieval and Natural Language Processing, discrete objects such as words are
usually represented as embeddings in a vector space [1]. However, such an embedding cannot deal
with the spatially or temporally sequential information of objects. One spatial scenario is encoding
word order in bag-of-words neural networks like the Transformer [2, 3]. Regarding the temporal
scenario, word meaning may change over time [4]. For instance, the word gay shifted from the
meaning cheerful in the 1900s to the meaning frolicsome in the 1950s and finally to the meaning
homosexual since the 1990s [5].</p>
      <p>In this work, we will focus on the temporally sequential aspect: temporal evolution. This
work adopts sinusoidal functions to encode the sequential evolution of word meaning in vector
space. The advantages over existing methods might be: 1) it is more efficient, since the proposed
method does not need to maintain a copy of the word representation for each timestamp, as
required by previous works [6]; 2) it can be more effective at modeling semantic evolution since,
thanks to their continuity, such functions can deal with long-term but gradual meaning changes.
In Section 3 we will show how the proposed method could approximate any word meaning
evolution.</p>
      <fig id="fig1">
        <label>Figure 1</label>
        <caption>
          <p>Binary coding: the orders of 16 numbers (0–15) are encoded as the four-digit binary
numbers 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101,
1110, 1111. Observe that the last digit (in red) is a periodical sequence [0, 1, ...] with a period
of 2, the second-to-last digit (in blue) is a periodical sequence [0, 0, 1, 1, ...] with a period
of 4, and so on. The example is from
https://kazemnejad.com/blog/transformer_architecture_positional_encoding.</p>
        </caption>
      </fig>
    </sec>
    <sec id="sec-2">
      <title>2. Problem Definition</title>
      <p>An object-agnostic order (e.g., position or time) embedding [7, 2] is defined as:</p>
      <disp-formula id="eq1">
        <label>(1)</label>
        <tex-math>g : \mathbb{R} \to \mathbb{R}^{d}</tex-math>
      </disp-formula>
      <p>One may consider binary coding for order embedding. However, binary coding is not
differentiable and thus unfriendly to neural networks. To this end, one may consider designing
a continuous coding with the same periodical property. Fig. 1 shows an alternative sinusoidal
encoding [2, 8] with periods of 2, 4, 8, 16. Such continuity will facilitate back-propagation if
such an embedding is used in neural networks.</p>
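      <p>A minimal NumPy sketch of such a continuous sinusoidal order encoding, assuming sin/cos
pairs whose periods are 2, 4, 8, 16; the function name and values are illustrative, not a trained
encoding:</p>
      <preformat>
# A continuous analogue of the binary coding in Fig. 1: an order value t is
# encoded as sin/cos pairs with periods 2, 4, 8, 16 (illustrative choice).
import numpy as np

def order_encoding(t, periods=(2, 4, 8, 16)):
    """Object-agnostic order embedding g: R -> R^d with d = 2 * len(periods)."""
    omegas = 2 * np.pi / np.asarray(periods, dtype=float)  # angular frequencies
    return np.concatenate([np.sin(omegas * t), np.cos(omegas * t)])

# Unlike binary coding, the encoding is differentiable in t, so gradients
# can flow through it when it is used inside a neural network.
print(order_encoding(3.0).round(3))
      </preformat>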
      <p>Object-aware dynamic evolution. Sequential encoding becomes more challenging when
the sequential evolution is not shared among objects; for example, an individual word may
change meaning over time, but other words may not share the same trend in meaning change.
Therefore, such dynamic evolution processes are object-aware. Formally, the evolution of an
object with index i can be formalized as a mapping from the object index (in ℕ) and time
(t ∈ ℝ) to a d-dimensional vector:</p>
      <disp-formula id="eq2">
        <label>(2)</label>
        <tex-math>f : \mathbb{N} \times \mathbb{R} \to \mathbb{R}^{d}</tex-math>
      </disp-formula>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology: Dynamic Object Embedding</title>
      <p>To smoothly model object-aware dynamic evolution, we represent each object as a continuous
function: a specific object embedding at time t is represented as the values of the function when
the variable equals t. More formally, our approach aims to learn a mapping f that maps each
object to a function over time/order:</p>
      <disp-formula id="eq3">
        <label>(3)</label>
        <tex-math>f : \mathbb{N} \to (g : \mathbb{R} \to \mathbb{R}^{d})</tex-math>
      </disp-formula>
      <p>where f maps an object, e.g., w with index i, to a function g over a variable t ∈ ℝ.
Note that the output of g is a d-dimensional vector, g(t) ∈ ℝ^d. Let us denote f(w) as g_w.
An object w at time t is then represented as a d-dimensional vector:</p>
      <disp-formula id="eq4">
        <label>(4)</label>
        <tex-math>U_{w,t} = f(w)(t) = g_{w}(t)</tex-math>
      </disp-formula>
      <p>Examples of g are linear functions g(t) = b + kt with parameters b, k ∈ ℝ^d, or
sinusoidal functions g(t) = b + r sin(ωt + φ) with parameters b, r, ω, φ ∈ ℝ^d.</p>
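      <p>A small sketch of these two example families for g, with illustrative (untrained)
parameter values:</p>
      <preformat>
# Two example families for g (Eq. 3): a linear function g(t) = b + k t and a
# sinusoidal function g(t) = b + r sin(omega t + phi); values are illustrative.
import numpy as np

d = 4                                    # embedding dimensionality
b = np.zeros(d)                          # bias b in R^d
k = np.full(d, 0.1)                      # linear slope k in R^d
r = np.ones(d)                           # amplitude r in R^d
omega = np.array([0.5, 0.5, 1.0, 1.0])   # frequencies omega in R^d
phi = np.zeros(d)                        # phases phi in R^d

def g_linear(t):
    return b + k * t

def g_sinusoidal(t):
    return b + r * np.sin(omega * t + phi)

print(g_linear(2.0), g_sinusoidal(2.0))
      </preformat>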
      <p>A typical way to obtain word vectors is factoring positive point-wise mutual information
(PPMI) matrices [9]. Note that in a temporal scenario, PPMI matrices also change over time.
Assume that the PPMI between a word pair (w_i, w_j) at time t is PPMI_{i,j}(t); our goal is to
approximate PPMI_{i,j}(t) by a dot product between the dynamic word embedding of w_i, denoted
f(w_i)(t) ∈ ℝ^d, and a static compass [10] of w_j, denoted h(w_j) = v_j ∈ ℝ^d:</p>
      <disp-formula id="eq5">
        <label>(5)</label>
        <tex-math>\mathrm{PPMI}_{i,j}(t) \approx f(w_i)(t) \cdot h(w_j)</tex-math>
      </disp-formula>
      <p>Sinusoidal Parameterization. We formalize f(w_i)(t) as sinusoidal functions, i.e., a
mixture of cosine and sine functions plus a bias term:</p>
      <disp-formula id="eq6">
        <label>(6)</label>
        <tex-math>f(w_i)(t) = [\, b_{i,1} + r_{i,1}\sin(\omega_1 t);\; b_{i,2} + r_{i,2}\cos(\omega_1 t);\; \ldots;\; b_{i,d-1} + r_{i,d-1}\sin(\omega_{d/2} t);\; b_{i,d} + r_{i,d}\cos(\omega_{d/2} t)\,]</tex-math>
      </disp-formula>
      <p>Eq. 5 will then result in:</p>
      <disp-formula id="eq7">
        <label>(7)</label>
        <tex-math>f(w_i)(t) \cdot h(w_j) = \underbrace{\sum_{k=1}^{d} b_{i,k} v_{j,k}}_{\Delta} + \sum_{k=1}^{d/2} \big[ \underbrace{r_{i,2k-1} v_{j,2k-1}}_{\alpha_{i,j,k}} \sin(\omega_k t) + \underbrace{r_{i,2k} v_{j,2k}}_{\beta_{i,j,k}} \cos(\omega_k t) \big]</tex-math>
      </disp-formula>
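      <p>A numeric sanity check of Eq. 7 with randomly chosen, illustrative parameters (not
trained ones): the dot product of the sinusoidal embedding of Eq. 6 with a static compass vector
equals the constant Δ plus a weighted sum of sinusoids:</p>
      <preformat>
# Sanity check of Eq. 7: f(w_i)(t) . h(w_j) = Delta + weighted sinusoids.
import numpy as np

rng = np.random.default_rng(0)
d = 6
b = rng.normal(size=d)                      # biases b_{i,1..d}
r = rng.normal(size=d)                      # amplitudes r_{i,1..d}
omega = rng.uniform(0.1, 2.0, size=d // 2)  # frequencies omega_{1..d/2}
v = rng.normal(size=d)                      # static compass h(w_j) = v_j

def f(t):
    """Eq. 6: interleaved sin/cos components sharing frequency omega_k."""
    out = np.empty(d)
    out[0::2] = b[0::2] + r[0::2] * np.sin(omega * t)
    out[1::2] = b[1::2] + r[1::2] * np.cos(omega * t)
    return out

t = 1.7
lhs = f(t) @ v                              # left-hand side of Eq. 7
delta = b @ v                               # constant term Delta
alpha = r[0::2] * v[0::2]                   # sin coefficients alpha_{i,j,k}
beta = r[1::2] * v[1::2]                    # cos coefficients beta_{i,j,k}
rhs = delta + np.sum(alpha * np.sin(omega * t) + beta * np.cos(omega * t))
assert np.isclose(lhs, rhs)
      </preformat>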
      <p>Therefore, PPMI_{i,j}(t) is a weighted sum of sinusoidal functions plus a constant
term Δ, i.e., PPMI_{i,j}(t) = Δ + ∑_{k=1}^{d/2} [α_{i,j,k} sin(ω_k t) + β_{i,j,k} cos(ω_k t)],
where {α_{i,j,k}}_{k=1}^{d/2} and {β_{i,j,k}}_{k=1}^{d/2} are the coefficients and
{ω_k}_{k=1}^{d/2} are the corresponding frequencies. [11] states that linear combinations
of sine and cosine functions could approximate all continuous functions in C(I_n). Thus, Eq. 7
could approximate any PPMI_{i,j}(t) ∈ C(I_n), and therefore capture any word meaning evolution.
Static object vectors, e.g., [12], can be considered a special case of constant functions, g = b,
or a specific case of sinusoidal functions when r_i = 0 or ω_i is small enough. The additional
parameters ω_i and r_i are expected to capture the dynamic aspect of word meaning evolution.
Intuitively, long periods reflect long-range evolution, although in practice such sinusoidal
functions would not necessarily be periodical with an extremely long period in a limited
timespan [13].</p>
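      <p>To illustrate the approximation claim, the following sketch fits the coefficients Δ,
α_k, β_k of Eq. 7 to a synthetic, non-periodic trajectory by ordinary least squares over a set
of fixed frequencies; the target and all values are illustrative, not real PPMI data:</p>
      <preformat>
# Fitting Eq. 7's coefficients to an arbitrary smooth trajectory by least
# squares, as an illustration of the approximation claim (synthetic target).
import numpy as np

t = np.linspace(0.0, 10.0, 200)
target = 1.0 / (1.0 + np.exp(-(t - 5.0)))   # smooth, non-periodic drift
omega = 2 * np.pi * np.arange(1, 9) / 20.0  # fixed frequencies omega_1..omega_8

# Design matrix [1, sin(w_1 t), cos(w_1 t), ..., sin(w_8 t), cos(w_8 t)]
X = np.column_stack([np.ones_like(t)] +
                    [fn(w * t) for w in omega for fn in (np.sin, np.cos)])
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
print("max abs error:", np.abs(X @ coef - target).max())
      </preformat>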
    </sec>
    <sec id="sec-4">
      <title>4. Ongoing and Future Work</title>
      <p>This paper proposes a sinusoidal parameterization to capture the sequential aspects of
objects embedded in vector space. We focused on modeling change in word meaning over time; the
considered parameterization is promising since, in principle, it could approximate any word
meaning evolution. We are currently focusing on the evaluation of the proposed approach to
investigate both its effectiveness and efficiency. Experiments will consider diverse tasks, e.g.,
temporal analogy [6] and semantic change detection [14]. Future work will consider other
discrete objects, e.g., user profiles. Moreover, further theoretical and empirical investigation
is needed to deal with the optimization issues that arise when sinusoidal activation functions
are used, i.e., infinitely many local minima [13].</p>
      <p>Acknowledgments. This work is supported by the Quantum Access and Retrieval Theory
(QUARTZ) project, which has received funding from the European Union's Horizon 2020 research
and innovation programme under the Marie Skłodowska-Curie grant agreement No. 721321.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Salton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>A vector space model for automatic indexing</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>18</volume>
          (
          <year>1975</year>
          )
          <fpage>613</fpage>
          -
          <lpage>620</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>NIPS</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lioma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Simonsen</surname>
          </string-name>
          ,
          <article-title>Encoding word order in complex embeddings</article-title>
          ,
          <source>in: ICLR</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. Di</given-names>
            <surname>Buccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Melucci</surname>
          </string-name>
          ,
          <article-title>Representing words in vector space and beyond</article-title>
          ,
          <source>in: Quantum-Like Models for Information Retrieval and Decision-Making</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W. L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <article-title>Diachronic word embeddings reveal statistical laws of semantic change</article-title>
          ,
          <source>in: ACL</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1489</fpage>
          -
          <lpage>1501</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <article-title>Dynamic word embeddings for evolving semantic discovery</article-title>
          ,
          <source>in: WSDM</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>673</fpage>
          -
          <lpage>681</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Kazemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Eghbali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ramanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sahota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thakur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Smyth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Poupart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brubaker</surname>
          </string-name>
          ,
          <article-title>Time2vec: Learning a vector representation of time</article-title>
          ,
          <source>arXiv preprint arXiv:1907.05321</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lioma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Simonsen</surname>
          </string-name>
          ,
          <article-title>On position embeddings in BERT</article-title>
          ,
          <source>in: ICLR</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <article-title>Neural word embedding as implicit matrix factorization</article-title>
          ,
          <source>NIPS</source>
          <volume>27</volume>
          (
          <year>2014</year>
          )
          <fpage>2177</fpage>
          -
          <lpage>2185</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Di Carlo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <article-title>Training temporal word embeddings with a compass</article-title>
          ,
          <source>in: AAAI</source>
          , volume
          <volume>33</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>6326</fpage>
          -
          <lpage>6334</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Cybenko</surname>
          </string-name>
          ,
          <article-title>Approximation by superpositions of a sigmoidal function</article-title>
          ,
          <source>Mathematics of Control, Signals and Systems</source>
          <volume>2</volume>
          (
          <year>1989</year>
          )
          <fpage>303</fpage>
          -
          <lpage>314</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Efficient estimation of word representations in vector space</article-title>
          ,
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Parascandolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Huttunen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Virtanen</surname>
          </string-name>
          ,
          <article-title>Taming the waves: sine as activation function in deep neural networks</article-title>
          ,
          <source>OpenReview preprint</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Shoemark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. F.</given-names>
            <surname>Liza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>McGillivray</surname>
          </string-name>
          ,
          <article-title>Room to Glo: A systematic comparison of semantic change detection approaches with word embeddings</article-title>
          ,
          <source>in: EMNLP-IJCNLP</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>