=Paper=
{{Paper
|id=Vol-1696/paper5
|storemode=property
|title=Place as Topics: Analysis of Spatial and Temporal Evolution
of Topics from Social Networks Data
|pdfUrl=https://ceur-ws.org/Vol-1696/paper5.pdf
|volume=Vol-1696
|authors=Giovanni Siragusa
|dblpUrl=https://dblp.org/rec/conf/lrec/Siragusa16
}}
==Place as Topics: Analysis of Spatial and Temporal Evolution of Topics from Social Networks Data==
Giovanni Siragusa
Department of Computer Science - University of Turin
Via Pessinetto, 12, Italy
giovanni.siragusa@edu.unito.it
Abstract
Geography, in a commonsense way, is about place. Place is a term used to describe the meaning that humans give to a location.
Characterising a location as a place requires a huge amount of time to collect and analyse data. Furthermore, a place definition
associated with a location can rapidly become obsolete. Nowadays, social networks and social media have become very popular. People on
social networks act like social sensors, reporting information about society, politics, economics, etc. Thus, many researchers have
focused on the analysis of posts, combining them with algorithms or extracting their meaning, keywords or users’ interests.
In this paper, I describe my research project, a visual framework that aims to simplify the process of place definition using topics
generated by applying Blei et al.’s Latent Dirichlet Allocation (Blei et al., 2003) to geo-referenced social network data. My
main assumption is that topics capture the sense of place shared by social sensors. The framework will spare users from being
overwhelmed by the large amount of time and data required to understand and define places.
Keywords: NLU, LDA, topics, places, social networks
1. Introduction

In Geography, place is an important concept: it describes the meaning that humans give to a location when it is used and lived in. Cresswell (Cresswell, 2009; Cresswell, 2011) describes a place as a melting pot of four elements: location, locale, sense of place and practice. Location is a physical point in space with a specific set of coordinates, e.g., latitude and longitude. It refers to the “where” of place. A location can be a city, a city district, a street, a building or even a ship. Locale refers to the way a place “looks”: the material setting for social relations, such as streets, shops, buildings and so forth. Sense of place is a more nebulous notion. It includes the feelings, emotions and meanings that a place evokes in people. Sense of place can be individual and based on biography (e.g., the place where I spent my childhood) or it can be shared. Practice represents what people do in a place. It can include historical practice (e.g., a battlefield), mundane practice (e.g., going to work) or a mixture of both. Sense of place heavily influences practice, and practice is led by the sense of place.

Geographers who intend to define places need to perform a sequence of steps. First, they must define the location to study; then they must collect a set of observations regarding the place: feelings, emotions, meanings, practice and so forth. Finally, they must analyse all the data to define the sense of place. Unfortunately, this process requires a huge amount of time, and the resulting definition can become obsolete due to the dynamic nature of the place itself.

In recent decades, social network services have become very popular not only with the general public, but also with scientific communities and practitioners. People on social networks act like social sensors, reporting information about society, politics, economics, etc. Thus, many researchers have focused on the analysis of posts, combining them with algorithms or extracting their meaning, keywords or users’ interests. The work proposed in (Rizzo et al., 2016) uses posts to define thematic maps of cities through a sub-spatial cluster algorithm called GeoSubClu. In (Sakaki et al., 2010), the authors consider Twitter as a social sensor for detecting large events such as earthquakes or typhoons. Cataldi et al. (Cataldi et al., 2013) proposed an approach that provides users with the most emerging topics expressed by the community; they see users as real-time news sensors. The work proposed in (Allisio et al., 2013) uses tweets in conjunction with Sentiment Analysis to capture how happy the citizens of Italian cities are. The authors also proposed a graphic framework that allows the user to apply Sentiment Analysis and to infer why people are happy (or unhappy) in a city. Furthermore, Allisio et al.’s work can be viewed as an analysis of the feeling content of places.

In this paper, I describe my research project, a visual framework that aims to simplify the process of place definition using topics generated by applying Blei et al.’s Latent Dirichlet Allocation (Blei et al., 2003) to geo-referenced social network data. My main assumption is that topics capture the sense of place shared by social sensors. Moreover, word co-occurrences in topics can be used to infer further information regarding places. In detail, the framework has a threefold impact:

1. it will capture the sense of place of a designated location and how it is geographically distributed over the place;
2. it will make it possible to infer why a place has a specific meaning, how that meaning is shared and how it evolves over time;
3. it will allow practitioners (e.g., sociologists or psychologists) to apply the LDA model and to generate plots with a single click.
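To make the main assumption more concrete, the following is a minimal sketch of the kind of topic extraction the framework relies on. It is not the framework's implementation: it is a small collapsed Gibbs sampler for LDA written in plain Python, run on a hypothetical toy corpus in which each document collects the words posted from one location. The function name `fit_lda`, the hyperparameter defaults and the sample data are all illustrative assumptions.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (theta, phi, vocab), where
    theta[d][k] is the topic mixture of document d and phi[k][w] is the
    word distribution of topic k, indexed like vocab.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]            # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens of word w in topic k
    topic_total = [0] * num_topics                          # tokens in topic k
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n + beta) / (topic_total[k] + vocab_size * beta) for n in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab


# Hypothetical data: one "document" per location.
posts_by_location = {
    "park": "run jog football run concert music jog".split(),
    "stadium": "football match goal football run".split(),
    "theatre": "concert music stage music concert".split(),
}
theta, phi, vocab = fit_lda(list(posts_by_location.values()), num_topics=2)
for place, mixture in zip(posts_by_location, theta):
    print(place, [round(p, 2) for p in mixture])
```

Each row of `theta` is the topic mixture of one location, which is the quantity the framework would interpret as a sense of place. A real deployment would more likely use an established LDA library than a hand-written sampler.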
2. Related Works

As mentioned in Section 1, social network services such as Facebook (www.facebook.com) and Twitter (www.twitter.com) have become very popular. Users use social networks to share their thoughts, pictures or videos. On social networks, a user indicates that he/she wants to be notified about another user (“follow”) or becomes a friend of another user.

More recently, new platform services have emerged. These services have gone beyond information, enabling people to have a direct link with their neighbours and to discover local businesses or associations. Examples of such platforms are MyNeighbourhood (www.my-n.eu) and Polly&Bob (www.pollyandbob.com). Furthermore, platforms have started to use map-based services to draw attention to problems that need to be fixed in cities. FixMyStreet (www.fixmystreet.com) allows people to report, discuss or view local problems. Ushahidi (www.ushahidi.com), instead, allows users to report or be notified about what is happening, where and when.

In recent years there has been an increasing trend towards geo-referencing information. Facebook added the possibility to geotag posts, while platforms that analyse users’ locations, geotags and hashtags have arisen. For example, Trendsmap (www.trendsmap.com) aims to show the latest trends from Twitter on a map. Unlike these platforms and common social networks, First Life (www.firstlife.org) (Antonini et al., 2015) is a social network oriented to the person as a citizen, where information is rooted not in the personal life of users, but in their collective way of living a place. First Life combines different geo-referenced sources of information (posts, blogs, open data, etc.) and POIs (Points of Interest), and shows them on an interactive map. Furthermore, data can be associated with a temporal dimension, which is used to filter or to order the data.

In the context of social networks, Blei et al.’s Latent Dirichlet Allocation (LDA) (Blei et al., 2003) has been applied successfully. LDA is a generative model that treats a document as a finite mixture of topics, where a topic is a distribution over words. In detail, each topic captures word co-occurrences inside documents. In (Pennacchiotti and Gurumurthy, 2011), the authors used LDA to automatically discover users’ interests. Users can be represented as a mixture of topics, the parameter θ, and these mixtures can be used to suggest friends or people to follow by computing dissimilarity functions (e.g., Kullback-Leibler divergence) or cosine similarity. In (Cha and Cho, 2012), Cha and Cho used LDA to analyse the relationship graph of popular social networks. The authors’ goal was to cluster a set of nodes using topics and to label each edge with a topic group number, obtaining a model that has a twofold impact: it can be used to suggest users and to infer why a new user initially chose to follow certain users. Zhang et al. (Zhang et al., 2007) proposed a model called SSN-LDA (Simple Social Network LDA) to discover communities in social networks. In their model, communities are represented by latent variables. Eisenstein et al. (Eisenstein et al., 2010) suppose that the word co-occurrences of pure topics are corrupted by geographical information. Their model assigns words to a topic according to a geographic region, which is modelled by a latent variable r.

Recent works have focused on tracking the evolution of topics over time. The framework proposed in (Cui et al., 2011) captures both the distribution of topics over time and critical events, such as the birth, split, merge or death of topics. Furthermore, the model captures and represents word co-occurrences and their frequency using threads. First the model selects the main words by computing a set of weights, then it represents co-occurrences through the wave bundle of the thread. The amplitude of the wave represents the number of co-occurrences between the main word and the other words inside a topic: high amplitudes represent a high co-occurrence frequency. In (Wang and McCallum, 2006), Wang and McCallum proposed a modified LDA model, called Topics Over Time, where topic discovery is influenced both by word co-occurrences and by temporal information. The authors model time as a continuous distribution: each topic is associated with a Beta distribution, with parameter Ψ, which is responsible for generating both the temporal patterns and the topic distribution. Lau et al. (Lau et al., 2012) proposed a novel method to track emerging events in microblogs (e.g., Twitter). Their method defines a window of time slices, where each time slice contains several documents, and updates the parameters α and β for each old word and document. Novel words and documents are initialised using two parameters, α₀ and β₀, that are defined a priori.

Another related work, not linked to the LDA model, is (Di Caro et al., 2011). Di Caro et al. proposed a framework called TMine which defines a navigable tag-flake. A tag-flake can be thought of as a topic because it contains a set of related words.

3. Research Questions and Objectives

In this section I describe the research questions and research objectives that will lead to the construction of the framework. My objective is to apply the LDA model to geo-referenced social network data to capture the sense of place (Location and Locale are implicitly defined by the selection of the geographical area to study). My assumption is that topics represent how people live a place (Cresswell, 2009; Cresswell, 2011): activities, emotional attachment to the place and so forth. For example, a park can have a sport topic during the afternoon and a concert topic during the night. Thus, I am interested in the spatial and temporal location of topics. In detail, I will address three research questions (labelled RQ):

RQ1 Where is a topic spatially located over time? In RQ1 I am interested in understanding when and where a topic emerges and whether it can spread to neighbouring areas due to social influence, i.e., the change in behaviour that one person causes in another. To answer RQ1, I will study how a topic evolves both temporally and spatially. In detail, I will develop a module that associates topics with a location and tracks the spatial and temporal evolution of topics using dissimilarity functions (e.g., Kullback-Leibler divergence) or cosine similarity. I will study how to track a topic over
time, because its structure can change from one time slice to another. Furthermore, I will try and compare different LDA models, such as Topics Over Time described in (Wang and McCallum, 2006), to find the best model (or models) for extracting the sense of place.

RQ2 Which topics are present in the same space over time? In RQ2 I am interested in discovering how people live a place and in inferring how their way of living a place can change over time. To answer RQ2, I will use the framework developed in RQ1 to analyse the correlation between a place and its topics in order to validate my assumption. Furthermore, analysing the topics of a place together with their spatial and temporal location will make it possible to infer further information regarding places.

RQ3 Where are users with the same interests geographically located? In my project I aim to represent how people live a place using topics, but topics also depend on people’s interests (both subjective and emotional). Thus, I assume that users with the same specific interests would refer to similar places. First I will study how to cluster users according to specific topics and how to find groups of users that have the same specific interests and use a specific language. Then, I will cluster users and analyse where the members of a cluster are located. Clusters and their shape can be used to improve topic representation inside places: the clusters can be used to associate topics with specific areas of the place, finding which topic is dominant, how topics overlap and how they are distributed. The distribution of clusters inside the place can bring more clues about its meaning.

In Section 2 I described two works that use topics to implement a user recommendation system: (Pennacchiotti and Gurumurthy, 2011) and (Cha and Cho, 2012). Pennacchiotti and Gurumurthy define a model that captures users’ interests through topics and compares topic distributions to suggest users; Cha and Cho, instead, define a model that captures social interaction between users through a latent variable which defines a community. Communities can be used to suggest users to follow to a new user. In my research project, topics capture users’ interests. Thus, I can suggest to a user the places that have most topics in common with his/her interests, provided that the suggested place is not the place where the user dwells.

4. Data

In this section I describe the data I will use to validate my main assumption, namely that topics capture the sense of place expressed by users on social networks. To validate this assumption, I will define two datasets: a dataset of tweets taken from Twitter using the Twitter API, which allows a latitude, a longitude and a radius to be specified in the query, and a dataset of posts taken from First Life (Antonini et al., 2015).

Twitter is a social medium where users post their thoughts and get in touch with (follow) other users. It is widely used by researchers, practitioners (e.g., sociologists or psychologists), data journalists and computational linguists due to the huge amount of real human data that it contains. Thus, Twitter was and still is used to extract information (see (Allisio et al., 2013)) or to test developed applications, such as those described in (Pennacchiotti and Gurumurthy, 2011; Cha and Cho, 2012; Eisenstein et al., 2010; Lau et al., 2012). Unfortunately, not all the information produced by users is useful. For example, popular users (users followed by a large number of users), such as artists, actors and so forth, can produce noisy information. In my research project the first step will be to separate popular users from unpopular ones and to analyse the topics produced by each group in order to find which ones express the sense of place. Once the user group (popular or unpopular) has been chosen, the second step will be to analyse the tweets associated with the topics that expressed the sense of place in the first step. This second step will filter the noise in the data, yielding high-quality topics. For example, tweets that express the sense of place may contain certain words with high frequency or may be highly re-tweeted. Thus I will analyse word frequency, number of re-tweets and so forth. These filters and their plots will be integrated in the framework (see Section 5 for details).

Unlike Twitter, First Life is a social network focused on the space where a user lives. For this reason, First Life is the perfect candidate to validate my main assumption. However, First Life has two drawbacks: it is not as popular as Twitter and it contains only Italian users. To this social network I will apply the same analysis described above for Twitter.

5. Framework Architecture

Figure 1: The framework architecture.

In this section, I present the framework architecture, which is composed of four layers, as shown in Figure 1: a blue
layer which pre-processes the input documents; a violet layer which filters the data; a green layer that applies the LDA model to the cleaned data; and a red layer that visualises topics. I will use JSON for input documents, allowing users to use their own datasets. Moreover, the JSON input format will follow a grammar in order to standardise the input.

The first layer (blue), called document pre-processor, deals with the cleaning of documents. First, it will parse the text to extract Part-Of-Speech (POS) tags; then it will tokenize the documents and remove stopwords and all words whose POS tag is not ADJ (adjective), VERB, NOUN or X (foreign word). The output of this layer is passed as input to the second layer (violet), called data filter, which is composed of a set of filters that implement the operations described in Section 4. For example, I can discard all tweets whose number of re-tweets is lower than a threshold and then filter out popular users, or vice versa. Users can freely combine filters in sequence and study how topics change according to the applied filters. The third layer (green), called LDA model, will apply the LDA model to the filtered data. This layer only needs the number of topics. To simplify the choice of the number of topics, I will implement a perplexity method (associated with a perplexity plot) that will require a minimum number of topics, a maximum number of topics and a step. Finally, the LDA output will be passed as input to the last layer (red), called visual framework. This layer will implement all the features described in Section 3. Moreover, the visual framework layer will implement a set of plots that will allow users to infer further information.

6. Conclusion

In this paper I presented my research project: a visual framework that extracts the sense of place from social network data using topics generated by Latent Dirichlet Allocation. The main advantage of my framework is not only the extraction of the sense of place, but also the possibility to infer why the place has a specific meaning. Furthermore, topics can be used to implement a geographic recommendation system, suggesting places to users.

7. Bibliographical References

Allisio, L., Mussa, V., Bosco, C., Patti, V., and Ruffo, G. (2013). Felicittà: Visualizing and estimating happiness in Italian cities from geotagged tweets. In ESSEM@AI*IA, pages 95–106. Citeseer.

Antonini, A., Boella, G., Buccoliero, S., Calafiore, A., Di Caro, L., Giorgino, V., Ruggeri, A., Salaroglio, C., Sanasi, L., and Schifanella, C. (2015). First Life, from the global village to local communities. In 1st IASC Thematic Conference on Urban Commons.

Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022.

Cataldi, M., Caro, L. D., and Schifanella, C. (2013). Personalized emerging topic detection based on a term aging model. ACM Transactions on Intelligent Systems and Technology (TIST), 5(1):7.

Cha, Y. and Cho, J. (2012). Social-network analysis using topic models. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, pages 565–574. ACM.

Cresswell, T. (2009). Place. In International Encyclopedia of Human Geography, volume 8, pages 169–177. Elsevier.

Cresswell, T. (2011). Place–part I. The Wiley-Blackwell Companion to Human Geography, pages 235–244.

Cui, W., Liu, S., Tan, L., Shi, C., Song, Y., Gao, Z. J., Qu, H., and Tong, X. (2011). TextFlow: Towards better understanding of evolving topics in text. IEEE Transactions on Visualization and Computer Graphics, 17(12):2412–2421.

Di Caro, L., Candan, K. S., and Sapino, M. L. (2011). Navigating within news collections using tag-flakes. Journal of Visual Languages & Computing, 22(2):120–139.

Eisenstein, J., O’Connor, B., Smith, N. A., and Xing, E. P. (2010). A latent variable model for geographic lexical variation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1277–1287. Association for Computational Linguistics.

Lau, J. H., Collier, N., and Baldwin, T. (2012). On-line trend analysis with topic models: #twitter trends detection topic model online. In COLING, pages 1519–1534.

Pennacchiotti, M. and Gurumurthy, S. (2011). Investigating topic models for social media user recommendation. In Proceedings of the 20th international conference companion on World wide web, pages 101–102. ACM.

Rizzo, G., Meo, R., Pensa, R. G., Falcone, G., and Troncy, R. (2016). Shaping city neighborhoods leveraging crowd sensors. Information Systems.

Sakaki, T., Okazaki, M., and Matsuo, Y. (2010). Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World wide web, pages 851–860. ACM.

Wang, X. and McCallum, A. (2006). Topics over time: a non-Markov continuous-time model of topical trends. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 424–433. ACM.

Zhang, H., Qiu, B., Giles, C. L., Foley, H. C., and Yen, J. (2007). An LDA-based community structure discovery approach for large-scale social networks. ISI, 200.
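As a complement to RQ1 in Section 3, the following is a minimal sketch of how topics from two consecutive time slices might be linked using the dissimilarity functions mentioned there. It is an illustration under stated assumptions, not the module the project will deliver: the function names, the greedy nearest-topic matching rule and the toy distributions are all hypothetical, and both slices are assumed to share the same vocabulary with strictly positive probabilities (as is the case for smoothed LDA topics).

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    Assumes every entry of q is strictly positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cosine_similarity(p, q):
    """Cosine of the angle between two word-probability vectors."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm = math.sqrt(sum(pi * pi for pi in p)) * math.sqrt(sum(qi * qi for qi in q))
    return dot / norm


def match_topics(previous, current):
    """Link each topic of the previous slice to its closest topic in the current one.

    Returns a list of (index_in_current, kl_value), one entry per previous topic.
    """
    matches = []
    for p in previous:
        scores = [kl_divergence(p, q) for q in current]
        best = min(range(len(current)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


# Hypothetical topic-word distributions over the vocabulary
# ["run", "football", "concert", "music"] for one park.
slice_afternoon = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]
slice_night = [[0.10, 0.05, 0.40, 0.45], [0.50, 0.40, 0.05, 0.05]]

for k, (j, value) in enumerate(match_topics(slice_afternoon, slice_night)):
    similarity = cosine_similarity(slice_afternoon[k], slice_night[j])
    print(f"topic {k} -> topic {j} (KL = {value:.3f}, cosine = {similarity:.3f})")
```

A large divergence to every topic in the next slice could be read as the death of a topic, and an unmatched new topic as a birth; thresholds for such events would have to be chosen empirically.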