<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Technologies</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Modelling Time and Location in Topic Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christian Pölitz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Otto Hahn Str.</institution>
          <addr-line>12, 44227 Dortmund</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TU Dortmund University, Artificial Intelligence Group</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <abstract>
        <p>Many text collections, such as newspaper archives or social media blogs, contain texts that often refer to specific dates and/or locations. This information can be valuable for investigating topics in certain regions and time spans. We use topic models that integrate time and geographical information extracted from the texts to find such topics. In this extended abstract, we motivate our approach and briefly describe the method. Experimental evaluations and a detailed description are in preparation for a full paper.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Topic models (see for instance (Blei et al., 2003)) have been used extensively to summarize text collections into semantic clusters. Such text collections can contain, for instance, newspaper articles, blog entries, tweets or any other written social media content. The documents in these collections often contain information about locations and dates. This information is valuable for extracting topics for certain regions or time spans. In order to integrate temporal and positional information, we need the corresponding time and location information for our text corpus. Previous approaches assumed that we either directly have information about time and position for each document in the corpus, or that a named entity recognition tool finds geographic locations. We propose a hybrid approach that extends standard Latent Dirichlet Allocation (LDA (Blei et al., 2003)) topic models. We assume that the documents can have multiple dates and positions. In order to integrate multiple pieces of information for single documents, we propose to extract parts (or chunks) of the documents that contain information about only one location and one date. For example, a document might contain the sentence: "The weather was nice in Berlin last Sunday, but the next day at home in Cologne it was cloudy." In order to reflect all temporal and positional information, we model each chunk with its own date and location.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>There are several previous approaches that integrate temporal and positional information into topic models. In (Yin et al., 2011), Yin et al. discuss methods to find and compare topics in documents that have associated GPS coordinates. Speriosu et al. propose in (Speriosu et al., 2010) to use topic models with non-overlapping regions as latent topics. In this way, they model each document as a distribution over these regions. Further approaches use topic models with geographic information on social media data to extract activity patterns of users. Hasan and Ukkusuri, for instance, use in (Hasan &amp; Ukkusuri, 2014) topic models that integrate sequences of activities rather than documents. In (Hong et al., 2012), Hong et al. introduce a sparse generative topic model, and in (Kurashima et al., 2013), Kurashima et al. propose a geographic topic model that uses Twitter tweets to extract user activities in terms of movement and interests.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>While standard topic models group only words and documents into semantically related topics, we are further interested in the distribution of the topics over time and geographic position. In order to extract the distribution of word senses over time and positions, we use topic models that consider temporal information about the documents as well as locations in the form of numerical vectors that represent the geographic position. This means that each document has at least one time stamp and one geographic position. The time stamps are assumed to be Beta-distributed and the positions Normally distributed. The two distributions are simply integrated into an LDA topic model under the assumption that, given the latent topics, the words, the time stamps and the geographic positions are independent. We combine the topics-over-time method by Wang and McCallum (Wang &amp; McCallum, 2006) with supervised LDA, introduced by Blei and McAuliffe (Blei &amp; McAuliffe, 2007).</p>
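      <p>As a toy illustration of how the two extra distributions plug into an LDA-style generative scheme, the following sketch generates the words, one time stamp and one location for a single chunk. It is our own simplified code: the hyperparameter values, the per-topic shape parameters psi and the topic-mean stand-in for the location are all made-up assumptions, not the authors' implementation.</p>
      <preformat>
```python
import random

random.seed(0)
ALPHA, BETA = 0.5, 0.1   # Dirichlet hyperparameters (assumed values)
T, V = 3, 10             # number of topics and vocabulary size (assumed)

def sample_dirichlet(conc, dim):
    """Draw from a symmetric Dirichlet via normalized Gamma draws."""
    g = [random.gammavariate(conc, 1.0) for _ in range(dim)]
    s = sum(g)
    return [x / s for x in g]

def sample_discrete(probs):
    """Draw an index according to the given probability vector."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if acc > r:
            return i
    return len(probs) - 1

# 1.(a) Per-topic word distributions theta_t ~ Dir(beta)
theta = [sample_dirichlet(BETA, V) for _ in range(T)]
# Assumed Beta shape parameters psi_t, one pair per topic
psi = [(2.0, 5.0), (5.0, 2.0), (3.0, 3.0)]

def generate_chunk(n_words):
    """2. One chunk of one document: words plus a single date and location."""
    phi = sample_dirichlet(ALPHA, T)             # (a) phi_d ~ Dir(alpha)
    words, topics = [], []
    for _ in range(n_words):                     # i. for each word in c(d)
        z = sample_discrete(phi)                 # A. z_i ~ Mult(phi_d)
        words.append(sample_discrete(theta[z]))  # B. w_i ~ Mult(theta_{z_i})
        topics.append(z)
    a, b = psi[topics[0]]
    t_c = random.betavariate(a, b)               # C. t_c ~ Beta(psi_z)
    # D. l_c ~ N(eta'z, rho^2); the topic index stands in for eta'z here
    l_c = random.gauss(float(topics[0]), 1.0)
    return words, topics, t_c, l_c

words, topics, t_c, l_c = generate_chunk(20)
```
      </preformat>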
      <p>The generative process of the words, dates and locations is:</p>
      <sec id="sec-3-1">
        <title>1. For each topic t:</title>
        <p>(a) Draw θt ∼ Dir(β)</p>
      </sec>
      <sec id="sec-3-2">
        <title>2. For each document d:</title>
        <p>(a) Draw φd ∼ Dir(α)
(b) For each chunk c(d):
i. For each word i in c(d):</p>
        <sec id="sec-3-2-1">
          <title>A. Draw zi ∼ Mult(φd)</title>
        </sec>
        <sec id="sec-3-2-2">
          <title>B. Draw wi ∼ Mult(θzi)</title>
        </sec>
        <sec id="sec-3-2-3">
          <title>C. Draw tc ∼ Beta(ψzi)</title>
          <p>D. Draw lc ∼ N(η′z̄c(d), ρ²)</p>
          <p>Assuming a number of topics, we draw for each of them a Multinomial distribution over the words in this topic from a Dirichlet distribution Dir(β) with hyperparameter β. For each document, we draw a Multinomial distribution over the topics in this document from a Dirichlet distribution Dir(α) with hyperparameter α. For each word in the document, we draw a topic with respect to the topic distribution in the document, and a word based on the word distribution of the drawn topic. Additionally, we draw a time stamp ti ∼ Beta(ψzi) with ψzi = (a, b) the shape parameters of the Beta distribution, and the location li ∼ N(η′z̄c(d), ρ²) with z̄c(d) the empirical topic frequencies for document d.</p>
          <p>The shape parameters ψ are estimated by the method of moments. For each topic z, we estimate the mean m̂ and the sample variance s² of all time stamps from the documents that have been assigned this topic. We then set a = m̂ · (m̂(1 − m̂)/s² − 1) and b = (1 − m̂) · (m̂(1 − m̂)/s² − 1) for each topic.</p>
          <p>Finally, for the Normal distribution, η is estimated via EM methods that maximize the following objective during the estimation of the topic model: L(η) = −1/(2ρ²) · Σd (yd − η′z̄d)² − 1/(2σ²) · Σk ηk².</p>
          <p>Integrating the time stamp as a Beta-distributed random variable and the geographic location as a Normally distributed random variable, we get for the probability of a topic zi, given a word w in a chunk c(d) with time stamp t and location l and all other topic assignments:</p>
          <p>p(zi | w, t, l, z1, · · · , zi−1, zi+1, · · · , zT) ∝ (Nw,zi − 1 + β)/(Nzi − 1 + W · β) · (Nd,zi + α) · ((1 − tc(d))^(a−1) · tc(d)^(b−1))/Beta(a, b) · exp(−(lc(d) − η′z̄c(d))²/(2ρ²)) (1)</p>
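          <p>The method-of-moments step for the Beta shape parameters can be written out directly. The function below is a sketch of that computation under the formulas given above; the function and variable names are our own choices, not taken from the paper's implementation.</p>
          <preformat>
```python
# Method-of-moments estimates (a, b) for the Beta shape parameters of one
# topic, from the time stamps of the documents currently assigned to it.
def beta_method_of_moments(timestamps):
    n = len(timestamps)
    mean = sum(timestamps) / n
    # sample variance s^2
    var = sum((t - mean) ** 2 for t in timestamps) / (n - 1)
    common = mean * (1.0 - mean) / var - 1.0
    a = mean * common          # a = m * (m(1-m)/s^2 - 1)
    b = (1.0 - mean) * common  # b = (1-m) * (m(1-m)/s^2 - 1)
    return a, b

# Illustrative (made-up) normalized time stamps for one topic
a, b = beta_method_of_moments([0.2, 0.25, 0.3, 0.22, 0.28])
```
          </preformat>
          <p>By construction, the resulting Beta distribution's mean a/(a + b) matches the sample mean of the time stamps.</p>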
          <p>Using this, we can estimate the topic model via Gibbs sampling.</p>
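          <p>The per-topic sampling weights of Equation (1) are cheap to evaluate. The sketch below computes the unnormalized weight for one candidate topic; all counts and parameter values are illustrative stand-ins of our own, and the location is assumed one-dimensional for simplicity.</p>
          <preformat>
```python
import math

def gibbs_weight(N_wz, N_z, N_dz, W, alpha, beta, t_c, a, b, l_c, mu, rho):
    """Unnormalized Gibbs weight for assigning topic z to word w in chunk c."""
    # word term: (N_{w,z} - 1 + beta) / (N_z - 1 + W * beta)
    word_term = (N_wz - 1 + beta) / (N_z - 1 + W * beta)
    # document term: N_{d,z} + alpha
    doc_term = N_dz + alpha
    # Beta density of the chunk's time stamp under topic z's shape parameters
    beta_norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    time_term = (1.0 - t_c) ** (a - 1) * t_c ** (b - 1) / beta_norm
    # Normal density kernel of the chunk's location around the topic mean
    loc_term = math.exp(-((l_c - mu) ** 2) / (2.0 * rho ** 2))
    return word_term * doc_term * time_term * loc_term

w = gibbs_weight(N_wz=5, N_z=40, N_dz=3, W=100, alpha=0.1, beta=0.01,
                 t_c=0.3, a=2.0, b=5.0, l_c=0.5, mu=0.4, rho=1.0)
```
          </preformat>
          <p>A chunk whose location lies far from a topic's mean receives a proportionally smaller weight for that topic, which is what ties the geographic information into the sampler.</p>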
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>Blei, David M. and McAuliffe, Jon D. Supervised topic models. In Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, pp. 121–128, 2007.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>Blei, David M., Ng, Andrew Y., and Jordan, Michael I. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, March 2003. ISSN 1532-4435.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>Hasan, Samiul and Ukkusuri, Satish V. Urban activity pattern classification using topic models from online geo-location data. Transportation Research Part C: Emerging Technologies, 2014. ISSN 0968-090X. doi: 10.1016/j.trc.2014.04.003.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>Hong, Liangjie et al. Discovering geographical topics in the twitter stream. In Proceedings of the 21st International Conference on World Wide Web, WWW '12, pp. 769–778, New York, NY, USA, 2012. ACM. doi: 10.1145/2187836.2187940.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>Kurashima, Takeshi et al. Geo topic model: Joint modeling of user's activity area and interests for location recommendation. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, WSDM '13, pp. 375–384, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-1869-3. doi: 10.1145/2433396.2433444.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>Speriosu, Michael et al. Connecting Language and Geography with Region-Topic Models. 2010.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>Wang, Xuerui and McCallum, Andrew. Topics over time: A non-Markov continuous-time model of topical trends. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006. ACM. doi: 10.1145/1150402.1150450.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>Yin, Zhijun et al. Geographical topic discovery and comparison. In Proceedings of the 20th International Conference on World Wide Web, WWW '11, pp. 247–256, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0632-4. doi: 10.1145/1963405.1963443.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>