=Paper= {{Paper |id=Vol-1696/paper5 |storemode=property |title=Place as Topics: Analysis of Spatial and Temporal Evolution of Topics from Social Networks Data |pdfUrl=https://ceur-ws.org/Vol-1696/paper5.pdf |volume=Vol-1696 |authors=Giovanni Siragusa |dblpUrl=https://dblp.org/rec/conf/lrec/Siragusa16 }} ==Place as Topics: Analysis of Spatial and Temporal Evolution of Topics from Social Networks Data== https://ceur-ws.org/Vol-1696/paper5.pdf
                   Place as topics: analysis of spatial and temporal evolution
                               of topics from social networks data
                                                       Giovanni Siragusa
                                      Department of Computer Science - University of Turin
                                                    Via Pessinetto, 12, Italy
                                                giovanni.siragusa@edu.unito.it

                                                               Abstract
Geography, in a commonsense way, is about place. Place is a term used to describe the meaning that humans give to a location.
Characterising a location as a place requires a huge amount of time to collect and analyse data. Furthermore, a place definition
associated with a location can rapidly become obsolete. Nowadays, social networks and social media have become very popular. People on
social networks act like social sensors, reporting information about society, politics, economics, etc. Thus, many researchers have
focused on the analysis of posts, combining them with algorithms or extracting their meaning, keywords or users’ interests.
In this paper, I describe my research project, a visual framework that aims to simplify the process of place definition using topics
generated by applying Blei et al.’s Latent Dirichlet Allocation (Blei et al., 2003) to geo-referenced social network data. My
main assumption is that topics make it possible to capture the sense of place shared by social sensors. The framework will spare users
the large amount of time and data otherwise required to understand and define places.

Keywords: NLU, LDA, topics, places, social networks


1. Introduction

In Geography, place is an important concept: it is used to describe the meaning that humans give to a
location when it is used and lived. Cresswell, in his articles (Cresswell, 2009; Cresswell, 2011),
describes a place as a melting pot of four elements: location, locale, sense of place and practice.
Location is a physical point in space with a specific set of coordinates, e.g., latitude and longitude.
It refers to the “where” of place. A location can be a city, a city district, a street, a building or
even a ship. Locale refers to the way a place “looks”: the material setting for social relations, such
as streets, shops, buildings and so forth. Sense of place is a more nebulous notion. It includes the
feelings, emotions and meanings that a place evokes in people. Sense of place can be individual and
based on biography (e.g., the place where I spent my childhood) or it can be shared. Practice
represents what people do in a place. It can contain historical practice (e.g., a battlefield), mundane
practice (e.g., going to work) or a mixture of both. Sense of place heavily influences practice, and
practice is led by the sense of place.

Geographers who intend to define places need to perform a sequence of steps. First, they must define
the location to study; then they must collect a set of observations regarding the place: feelings,
emotions, meanings, practice and so forth. Finally, they must analyse all the data to define the sense
of place. Unfortunately, this process requires a huge amount of time, and the definition produced can
become obsolete due to the dynamic nature of the place itself.

In the last decades, social network services have become very popular not only among the general
public, but also among scientific communities and practitioners. People on social networks act like
social sensors, reporting information about society, politics, economics, etc. Thus, many researchers
have focused on the analysis of posts, combining them with algorithms or extracting their meaning,
keywords or users’ interests. The work proposed in (Rizzo et al., 2016) uses posts to define thematic
maps of cities through a sub-spatial cluster algorithm called GeoSubClu. In (Sakaki et al., 2010), the
authors consider Twitter as a social sensor for detecting large events such as earthquakes or typhoons.
Cataldi et al., in their work (Cataldi et al., 2013), proposed an approach to provide users with the
most emerging topics expressed by the community. Cataldi et al. see users as real-time news sensors.
The work proposed in (Allisio et al., 2013) uses tweets in conjunction with Sentiment Analysis to
capture how happy the citizens of Italian cities are. In their work, they also proposed a graphic
framework that allows the user to apply Sentiment Analysis and to infer why people are happy (or
unhappy) in a city. Furthermore, Allisio et al.’s work can be viewed as an analysis of the feeling
content of places.

In this paper, I describe my research project, a visual framework that aims to simplify the process of
place definition using topics generated by applying Blei et al.’s Latent Dirichlet Allocation (Blei et
al., 2003) to geo-referenced social network data. My main assumption is that topics make it possible to
capture the sense of place shared by social sensors. Moreover, word co-occurrences in topics can be
used to infer further information regarding places. In detail, the framework has a threefold impact:

  1. it will make it possible to capture the sense of place of a designated location and how it is
     geographically distributed over the place;

  2. it will make it possible to infer why a place has a specific meaning, how it is shared and how it
     evolves over time;

  3. it will allow practitioners (e.g., sociologists or psychologists) to apply the LDA model and to
     generate plots with a single click.
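
To make the main assumption more concrete, recall that LDA treats a document as a mixture of topics and a topic as a distribution over words. The Python sketch below is only illustrative: the topic names, the word probabilities and the mixture weights are invented for the example and are not results of this project, and the function name `place_word_distribution` is hypothetical. It computes the word distribution of a place as the weighted sum of its topics.

```python
# Illustrative only: topics, words and probabilities are made-up assumptions.


def place_word_distribution(topic_mixture, topics):
    """Combine topic-word distributions using the topic mixture of a place.

    topic_mixture: dict mapping topic name -> weight (weights sum to 1)
    topics: dict mapping topic name -> dict of word -> probability
    Returns p(word | place) = sum over topics of weight * p(word | topic).
    """
    distribution = {}
    for name, weight in topic_mixture.items():
        for word, prob in topics[name].items():
            distribution[word] = distribution.get(word, 0.0) + weight * prob
    return distribution


# Two hypothetical topics that could be observed in a park.
TOPICS = {
    "sport": {"run": 0.5, "ball": 0.3, "music": 0.0, "crowd": 0.2},
    "concert": {"run": 0.0, "ball": 0.0, "music": 0.6, "crowd": 0.4},
}

# Hypothetical topic mixture for the documents posted from the park.
PARK_MIXTURE = {"sport": 0.75, "concert": 0.25}

park_words = place_word_distribution(PARK_MIXTURE, TOPICS)

# Print words from the most to the least probable.
for word, prob in sorted(park_words.items(), key=lambda item: -item[1]):
    print(f"{word}: {prob:.3f}")
```

Under these assumed numbers, the words that dominate the place are those of its heaviest topic, which is the intuition behind reading topics as a shared sense of place.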
2. Related Works

As previously mentioned in Section 1., social network services such as Facebook (www.facebook.com) and
Twitter (www.twitter.com) have become very popular. Users use social networks to share their thoughts,
pictures or videos. On social networks, a user indicates that he/she wants to get notified about
(“follow”) or become a friend of another user.

Nowadays, new platform services have emerged. These services have gone beyond information, enabling
people to have a direct link with their neighbours and discover local businesses or associations.
Examples of such platforms are MyNeighbourhood (www.my-n.eu) and Polly&Bob (www.pollyandbob.com).
Furthermore, platforms have started to use map-based services to draw attention to problems that need
to be fixed in cities. FixMyStreet (www.fixmystreet.com) allows people to report, discuss or view local
problems. Ushahidi (www.ushahidi.com), instead, allows users to report or get notified about what is
happening, where and when.

In recent years there has been an increasing trend towards geo-referencing information. Facebook added
the possibility to geotag posts, while platforms that analyse users’ locations, geotags and hashtags
have arisen. For example, Trendsmap (www.trendsmap.com) aims to show the latest trends from Twitter on
a map. Unlike these platforms and common social networks, First Life (www.firstlife.org) (Antonini et
al., 2015) is a social network oriented to the person as a citizen, where information is rooted not in
the personal life of users, but in their collective way of living a place. First Life combines
different geo-referenced sources of information (posts, blogs, open data, etc.) and POIs (Points of
Interest), and shows them on an interactive map. Furthermore, data can be associated with a temporal
dimension, which is used to filter or to order the data.

In the context of social networks, Blei et al.’s Latent Dirichlet Allocation (LDA) (Blei et al., 2003)
has been successfully applied. LDA is a generative model that treats a document as a finite mixture of
topics, where a topic is a distribution over words. In detail, each topic captures word co-occurrences
inside documents. In (Pennacchiotti and Gurumurthy, 2011), the authors used LDA to automatically
discover users’ interests. Users can be represented as a mixture of topics, the parameter θ, and these
mixtures can be used to suggest friends or people to follow through the computation of dissimilarity
functions (e.g., Kullback-Leibler divergence) or cosine similarity. In (Cha and Cho, 2012), Cha and Cho
used LDA to analyse the relationship graph of popular social networks. The authors’ goal was to cluster
a set of nodes using topics and to label each edge with a topic group number, obtaining a model that
has a twofold impact: it can be used to suggest users and to infer why a new user chose to initially
follow certain users. Zhang et al., in their work (Zhang et al., 2007), proposed a model called SSN-LDA
(Simple Social Network LDA) to discover communities from social networks. In their model, communities
are represented by latent variables. Eisenstein et al. suppose, in their work (Eisenstein et al.,
2010), that pure topics’ word co-occurrences are corrupted by geographical information. The model
assigns words to a topic according to a geographic region, which is modelled by a latent variable
labelled r.

Recent works have focused on tracking the evolution of topics over time. The framework proposed in (Cui
et al., 2011) makes it possible to capture both the distribution of topics over time and critical
events, such as the birth, split, merge or death of topics. Furthermore, the model captures and
represents word co-occurrences and co-occurrence frequency using threads. First the model defines main
words by computing a set of weights, then it represents co-occurrences through the wave bundle of the
thread. The amplitude of the wave represents the number of co-occurrences between the main word and the
other words inside a topic: high amplitudes represent a high co-occurrence frequency. In (Wang and
McCallum, 2006), Wang and McCallum proposed a modified LDA model, called Topics Over Time, where topic
discovery is influenced both by word co-occurrences and temporal information. In their work, the
authors model time as a continuous distribution, defined by a Beta distribution over a parameter Ψ,
associated with each topic, which is responsible for generating both patterns and topic distributions.
Lau et al. (Lau et al., 2012) proposed a novel method to track emerging events in microblogs (e.g.,
Twitter). Their method defines a window of time slices, where each time slice contains several
documents, and updates the parameters α and β for each old word and document. Novel words and documents
are initialised using two parameters, α0 and β0, that are defined a priori.

Another related work, not linked to the LDA model, is (Di Caro et al., 2011). Di Caro et al. proposed a
framework called TMine which defines a navigable tag-flake. A tag-flake can be thought of as a topic
because it contains a set of related words.

3. Research Questions and Objectives

In this section I describe the research questions and research objectives that will lead to the
construction of the framework. My objective is to apply the LDA model to geo-referenced social network
data to capture the sense of place¹. My assumption is that topics represent how people live a place
(Cresswell, 2009; Cresswell, 2011): activities, emotional attachment to place and so forth. For
example, parks can have a sport topic during the afternoon and a concert topic during the night. Thus,
I am interested in the spatial and temporal location of topics. In detail, I will answer three research
questions (labelled RQ):

¹ Location and Locale are implicitly defined in the selection of the geographical area to study.

RQ1 Where is a topic spatially located over time? In RQ1 I am interested in understanding when and
    where a topic emerges and whether it can spread to neighbouring areas due to social influence: the
    change in behaviour that one person causes in another. To answer RQ1, I will study how a topic
    evolves both temporally and spatially. In detail, I will develop a module that associates topics
    with a location and tracks the spatial and temporal evolution of topics using dissimilarity
    functions (e.g., Kullback-Leibler divergence) or cosine similarity. I will study how to track a
    topic over
    time because it can change its structure from one time slice to another. Furthermore, I will try
    and compare different LDA models, such as Topics Over Time described in (Wang and McCallum, 2006),
    to find the best model (or models) to extract the sense of place.

RQ2 Which topics are present in the same space over time? In RQ2 I am interested in discovering how
    people live a place and in inferring how their way of living a place can change over time. To
    answer RQ2 I will use the framework developed in RQ1 to analyse the correlation between a place and
    its topics in order to validate my assumption. Furthermore, the analysis of the topics in a place,
    in conjunction with the analysis of their spatial and temporal location, will make it possible to
    infer further information regarding places.

RQ3 Where are users with the same interests geographically located? In my project I aim to represent
    how people live a place using topics, but topics also depend on people’s interests (both subjective
    and emotional). Thus, I assume that users with the same specific interests would refer to similar
    places. First I will study how to cluster users according to specific topics and how to find groups
    of users that have the same specific interests and use a specific language. Then, I will cluster
    users and analyse where the members of a cluster are located. Clusters and their shape can be used
    to improve topic representation inside places: we can use the clusters to associate topics with
    specific areas of the place, finding which topic is dominant, how topics overlap and how they are
    distributed. The distribution of clusters inside the place can bring more clues about its meaning.

In Section 2. I described two works that use topics to implement a user recommendation system: the work
described in (Pennacchiotti and Gurumurthy, 2011) and the work described in (Cha and Cho, 2012).
Pennacchiotti and Gurumurthy define a model that captures users’ interests through topics and compares
topic distributions to suggest users; Cha and Cho, instead, define a model that captures social
interaction between users through a latent variable which defines a community. Communities can be used
to suggest users to follow to a new user. In my research project, topics capture users’ interests.
Thus, I can suggest to a user places that have most of the topics in common (a necessary and sufficient
condition is that the suggested place cannot be the place where the user dwells).

4. Data

In this section I describe the data I will use to validate my main assumption, namely that topics
capture the sense of place expressed by users on social networks. To validate my assumption, I will
define two datasets: a dataset of tweets taken from Twitter using the Twitter API, which allows a
latitude, a longitude and a radius to be specified in the query, and a dataset of posts taken from
First Life (Antonini et al., 2015).

Twitter is a social medium where users post their thoughts and get in touch with (follow) other users.
It is widely used by researchers, practitioners (e.g., sociologists or psychologists), data journalists
and computational linguists due to the huge amount of real human data that it contains. Thus, Twitter
was and still is used to extract information (see (Allisio et al., 2013)) or to test developed
applications, such as the applications described in (Pennacchiotti and Gurumurthy, 2011; Cha and Cho,
2012; Eisenstein et al., 2010; Lau et al., 2012). Unfortunately, not all the information produced by
users is useful. For example, popular users (users followed by a large number of users), such as
artists, actors and so forth, can produce noisy information. In my research project the first step will
be to separate popular users from unpopular ones and analyse the topics produced by each group in order
to find which ones express the sense of place. Once the user group (popular or unpopular) has been
defined, the second step will be to analyse the tweets associated with the topics that produced (in the
first step) the sense of place. This second step will make it possible to filter the noise in the data,
obtaining high-quality topics. For example, tweets that express the sense of place can contain a high
frequency of certain words or can be highly re-tweeted. Thus I will analyse word frequency, number of
re-tweets and so forth. These filters and their plots will be integrated in the framework (see Section
5. for details).

Unlike Twitter, First Life is a social network focused on the space where a user lives. For this
reason, First Life is the perfect candidate to validate my main assumption. However, First Life has two
drawbacks: it is not as popular as Twitter and it contains only Italian users. For this social network
I will apply the same analysis described above for Twitter.

5. Framework Architecture

Figure 1: The figure shows the framework architecture.

In this section, I present the framework architecture, which is composed of four layers as shown in
Figure 1: a blue
layer which pre-processes input documents; a violet layer which filters the data; a green layer that
applies the LDA model to the cleaned data; and a red layer that visualises topics. I will use JSON for
input documents, allowing users to use their own datasets. Moreover, the JSON input format will follow
a grammar in order to standardise the input.

The first layer (blue), called document pre-processor, deals with the cleaning of documents. First, it
will parse the text to extract Part-Of-Speech (POS) tags; then it will tokenise documents and filter
out stopwords and all words having POS tags other than ADJ (adjective), VERB, NOUN and X (foreign
word). The output of this layer is passed as input to the second layer (violet), called data filter,
which is composed of a set of filters that implement the operations described in Section 4. For
example, I can filter out all tweets that have a number of re-tweets lower than a threshold and then
filter popular users, or vice versa. Users can freely combine filters in sequence and study how topics
change according to the applied filters. The third layer (green), called LDA model, will apply the LDA
model to the filtered data. This layer only needs the number of topics. To simplify the choice of the
number of topics, I will implement a perplexity method (associated with a perplexity plot) that will
require a minimum number of topics, a maximum number of topics and a step. Finally, the LDA output will
be passed as input to the last layer (red), called visual framework. This layer will implement all the
features described in Section 3. Moreover, the visual framework layer will implement a set of plots
that will allow users to infer further information.

6. Conclusion

In this paper I presented my research project: a visual framework that makes it possible to extract the
sense of place from social network data using topics generated by Latent Dirichlet Allocation. The main
advantage of my framework is not only the extraction of the sense of place, but also the possibility of
inferring why the place has a specific meaning. Furthermore, topics can be used to implement a
geographic recommendation system, suggesting places to users.

7. Bibliographical References

Allisio, L., Mussa, V., Bosco, C., Patti, V., and Ruffo, G. (2013). Felicittà: Visualizing and
  estimating happiness in italian cities from geotagged tweets. In ESSEM@AI*IA, pages 95–106. Citeseer.
Antonini, A., Boella, G., Buccoliero, S., Calafiore, A., Di Caro, L., Giorgino, V., Ruggeri, A.,
  Salaroglio, C., Sanasi, L., and Schifanella, C. (2015). First life, from the global village to local
  communities. In 1st IASC Thematic Conference on Urban Commons.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent dirichlet allocation. the Journal of machine
  Learning research, 3:993–1022.
Cataldi, M., Caro, L. D., and Schifanella, C. (2013). Personalized emerging topic detection based on a
  term aging model. ACM Transactions on Intelligent Systems and Technology (TIST), 5(1):7.
Cha, Y. and Cho, J. (2012). Social-network analysis using topic models. In Proceedings of the 35th
  international ACM SIGIR conference on Research and development in information retrieval, pages
  565–574. ACM.
Cresswell, T. (2009). Place. In International Encyclopedia of Human Geography, volume 8, pages 169–177.
  Elsevier.
Cresswell, T. (2011). Place–part i. The Wiley-Blackwell companion to human geography, pages 235–244.
Cui, W., Liu, S., Tan, L., Shi, C., Song, Y., Gao, Z. J., Qu, H., and Tong, X. (2011). Textflow:
  Towards better understanding of evolving topics in text. Visualization and Computer Graphics, IEEE
  Transactions on, 17(12):2412–2421.
Di Caro, L., Candan, K. S., and Sapino, M. L. (2011). Navigating within news collections using
  tag-flakes. Journal of Visual Languages & Computing, 22(2):120–139.
Eisenstein, J., O’Connor, B., Smith, N. A., and Xing, E. P. (2010). A latent variable model for
  geographic lexical variation. In Proceedings of the 2010 Conference on Empirical Methods in Natural
  Language Processing, pages 1277–1287. Association for Computational Linguistics.
Lau, J. H., Collier, N., and Baldwin, T. (2012). On-line trend analysis with topic models: #twitter
  trends detection topic model online. In COLING, pages 1519–1534.
Pennacchiotti, M. and Gurumurthy, S. (2011). Investigating topic models for social media user
  recommendation. In Proceedings of the 20th international conference companion on World wide web,
  pages 101–102. ACM.
Rizzo, G., Meo, R., Pensa, R. G., Falcone, G., and Troncy, R. (2016). Shaping city neighborhoods
  leveraging crowd sensors. Information Systems.
Sakaki, T., Okazaki, M., and Matsuo, Y. (2010). Earthquake shakes twitter users: real-time event
  detection by social sensors. In Proceedings of the 19th international conference on World wide web,
  pages 851–860. ACM.
Wang, X. and McCallum, A. (2006). Topics over time: a non-markov continuous-time model of topical
  trends. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and
  data mining, pages 424–433. ACM.
Zhang, H., Qiu, B., Giles, C. L., Foley, H. C., and Yen, J. (2007). An lda-based community structure
  discovery approach for large-scale social networks. ISI, 200.
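
As a closing illustration of the comparison step mentioned for RQ1 in Section 3, the Python sketch below shows how two topics, taken as word distributions from consecutive time slices, could be compared with the Kullback-Leibler divergence or cosine similarity. It is a minimal sketch under stated assumptions, not the implementation of the framework: the vocabulary, the probabilities, the smoothing constant `EPSILON` and the helper `closest_topic` are all hypothetical.

```python
import math

# Assumed smoothing constant, used to avoid log(0) and division by zero.
EPSILON = 1e-12


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) of two word distributions.

    p and q are lists of probabilities over the same vocabulary.
    Terms with p_i == 0 contribute nothing to the sum.
    """
    return sum(
        pi * math.log((pi + EPSILON) / (qi + EPSILON))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def cosine_similarity(p, q):
    """Cosine of the angle between two word-probability vectors."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return dot / (norm_p * norm_q + EPSILON)


def closest_topic(topic, candidates):
    """Index of the candidate topic with the highest cosine similarity."""
    scores = [cosine_similarity(topic, candidate) for candidate in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Hypothetical topics over the vocabulary ["run", "ball", "music", "crowd"].
sport_t0 = [0.5, 0.3, 0.0, 0.2]   # topic found in time slice t
slice_t1 = [                      # topics found in time slice t + 1
    [0.0, 0.0, 0.6, 0.4],         # concert-like topic
    [0.45, 0.35, 0.0, 0.2],       # sport-like topic, slightly changed
]

match = closest_topic(sport_t0, slice_t1)
print("closest topic in the next slice:", match)
print("KL divergence to that topic:", kl_divergence(sport_t0, slice_t1[match]))
```

Cosine similarity is used for the matching here only because it is symmetric and bounded; the Kullback-Leibler divergence is asymmetric, so the order of its arguments would need to be fixed by convention if it were used for the same purpose.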