<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>When a Tweet Finds its Place: Fine-Grained Tweet Geolocalisation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pavlos Paraskevopoulos</string-name>
          <email>p.paraskevopoulos@unitn.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Pellegrini</string-name>
          <email>giovanni.pellegrini@studenti.unitn.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Themis Palpanas</string-name>
          <email>themis@mi.parisdescartes.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Paris Descartes University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Trento Telecom Italia - SKIL</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The recent rise in the use of social networks has resulted in an abundance of information on different aspects of everyday social activities that is available online. In the process of analysis of the information originating from social networks, and especially Twitter, an important aspect is that of the geographic coordinates, i.e., geolocalisation, of the relevant information. This information is used by a variety of applications for the better understanding of an urban area, the tracking of the way a virus spreads, the identification of people that need help in case of a disaster (e.g., an earthquake), or just for the better understanding of the dynamics of a major event (e.g., a concert). However, only a tiny percentage of Twitter posts are geotagged, which restricts the applicability of location-based applications. In this work, we extend our framework for geolocating tweets that are not geotagged, and describe a general solution for estimating the city, and the neighborhood in the city, from which a post was generated. In addition, we study the specific problem of geolocalising tweets deriving from targeted locations of interest (i.e., cities and neighborhoods in these cities), and present the visualizations of the prototype dashboard application we have developed, which can help end-users and large-scale event organizers to better plan and manage their activities. The experimental evaluation with real data demonstrates the efficiency and effectiveness of our approach.</p>
      </abstract>
      <kwd-group>
        <kwd>geotag</kwd>
        <kwd>geolocation</kwd>
        <kwd>Twitter</kwd>
        <kwd>social networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        [Motivation:] Events that happen around us affect our lives to different degrees.
The effects of an event on a community vary depending on the type of the event
and its dynamics. For example, traffic jams affect the way we move, football
matches and concerts may affect the normal pace of life in the area of the venue
for a short period of time, while earthquakes and diseases are unpredicted events,
which could cause significant problems that have to be addressed fast. Many
entities, public and private, are interested in analyzing the effects of such events,
in order to better understand and react to them, and lead to a better quality of
life. For example, the identification of lack of clean water at a place would lead
the water providers to take special care for resolving the problem. Even though
this would have been a manual, labour-intensive, and time-consuming process in the
past (e.g., consider the 1854 cholera outbreak in London [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]), this is no longer
the case.
      </p>
      <p>
        People tend to share their experiences, especially those affecting their lives
(or feelings). Social networks, such as Twitter [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Facebook [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Google+ [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
give users the opportunity to express themselves and report details about their
everyday social activities. The combination of this behavior with the widespread
use of smartphones and tablets has allowed users to report their activities
in real time, adding reports from several different locations (not just from their
homes, or workplaces). Consequently, we now have access to datasets containing
detailed information of social activities. To that effect, several studies [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ],
including applications [
        <xref ref-type="bibr" rid="ref11 ref14 ref20 ref30 ref32 ref35 ref6 ref7">20, 32, 14, 7, 11, 6, 30, 35</xref>
        ] and techniques [
        <xref ref-type="bibr" rid="ref23 ref28 ref31 ref33">33, 28, 23, 31</xref>
        ] have
been developed that analyze datasets created through the use of social networks,
tracking crowd movements and identifying needs, in order to provide benefits to
end users, businesses, civil authorities and scientists alike.
      </p>
      <p>
        It is interesting to note that several of these applications depend on the
knowledge of the user location at the time of posting. This knowledge is
necessary for applications that aim to characterize an urban landscape, or to
optimize urban planning [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], to monitor and track mobility and traffic [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and
to identify and report natural disasters, such as earthquakes [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. For example,
in the case of earthquakes, knowing the exact location of a tweet can provide
actionable insights to emergency-response workers (extent of damage, or number
of victims at specific locations, etc.) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Such applications, which represent an
increasingly wide range of domains, are restricted to the use of geotagged data<fn id="fn4"><label>4</label><p>For the rest of this paper, we will use the terms geotagged and geolocalised interchangeably.</p></fn>,
that is, posts in social networks containing the geographic coordinates of the
user at the time of posting.
      </p>
      <p>
        Evidently, the availability of geotagged data determines not only the
possibility to use such applications, but also their quality-performance characteristics:
the more geotagged posts are available, the better the quality of the results
will be (more precisely: the higher the probability of being able to produce
better quality results). Nevertheless, the availability of geotagged data is rather
limited. In Twitter, which is the focus of our study, the number of geotagged
tweets is a mere 1.5-3% of the total number of tweets [
        <xref ref-type="bibr" rid="ref15 ref19 ref21">19, 21, 15</xref>
        ]. As a result,
the amount of useful data for these applications to analyze is small, which in
turn limits the utility of the applications. Even if we considered this subset of
geotagged tweets as representative, "there is a tendency for geotaggers to be
slightly older than non-geotaggers" [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], which may lead to non-representative,
or skewed results.
      </p>
      <p>
        [Proposed Approach and Contributions:] In this study, we address this
problem by extending our framework [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] for geolocalising tweets that are
non-geotagged. Even though previous works have recognized the importance of this
problem and have studied it [
        <xref ref-type="bibr" rid="ref18 ref9">9, 18</xref>
        ] (for a comprehensive discussion of this problem
refer to [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]), their goal was to produce a coarse-grained estimate (i.e., postal
zipcodes, cities, or geographical areas larger than cities) of the location of a set
of non-geotagged tweets (e.g., those originating from a single user).
      </p>
      <p>
        In contrast, in our previous work [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], we examined this problem at a much
finer granularity, thus enabling a new range of applications that require detailed
geolocalised data. More specifically, our solution provides location estimates for
individual tweets, at the level of a city neighborhood given the city (or the city,
given the country). That is, we focused on the identification of the location, where
the location belonged to a set of candidate locations. This solution exploits the
similarities in the content between an individual tweet and a set of geotagged
tweets, as well as their time-evolution characteristics.
      </p>
      <p>In this work, we extend our previous solution, and describe a general
technique for estimating the location from which a post was generated using a
two-stage process: we first determine the city, and then the neighborhood in the city,
by building content-based models and analyzing the volume of posts over time,
independently for each one of these two levels. Using this setup, we are able to
effectively predict the location of a post from the Twitter stream, when the only
input we have is the actual content of the post and its timestamp.</p>
      <p>In addition, we study the specific problem of geolocalising tweets deriving
from targeted locations of interest, that is, neighborhoods of a particular cultural,
social, or touristic importance (e.g., the Vatican in Rome). Our experiments show
that we can reuse our technique for this case, as well, by adjusting its operation
to this context, where a small number of popular keywords mentioned in the
posts characterize the location.</p>
      <p>Finally, we present the visualizations of the prototype dashboard application
we have developed, which can help end-users and large-scale event organizers to
better plan and manage their activities. These interactive visualizations include
heatmaps for the volume of (geotagged and geolocalised) tweets, where the user
can zoom at different levels of granularity, ranging from a country, down to a
city neighborhood, for which the user can also explore the relevant keywords.
Furthermore, we provide visualizations that illustrate in a comprehensive manner
the changes in the volume of posts at different locations over time.</p>
      <p>[Paper Organization:] The rest of the document is organized as follows.
In Section 2 we present the related work. Section 3 formalizes the problem,
and Section 4 describes our solution. We present our experimental evaluation in
Section 5, and our prototype dashboard implementation in Section 6. Finally,
we conclude in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Several works have studied the problem of geotagged tweet analysis. Balduini et
al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] studied the movement of people by analyzing geotagged tweets. Some
studies focus on the extraction of local events by analyzing the text in the tweets [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
Abdelhaq et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] use both geotagged and non-geotagged tweets for
identifying keywords that best describe events. We note that in all the above studies,
the tweets that are analyzed are already geotagged. In contrast, our focus is on
non-geotagged tweets.
      </p>
      <p>
        The problem of using tweets in order to identify the location of a user, or
the place where an event took place, has been studied in the past. The "who,
where, what, when" attributes extracted from a user's profile can be used to
create spatio-temporal profiles of users, and ultimately lead to identification
of mobility patterns [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. Cheng et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] create location profiles based on
idiomatic keywords and unique phrases mentioned in the tweets of users who
have declared those locations as their origins.
      </p>
      <p>
        The similarity between user profiles and location profiles has also been used
in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], where the authors compute a set of representative keywords for each location,
which allows the algorithm to compute the probability that a given user comes
from that location. Furthermore, the authors of [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] have evaluated the methods
using as test datasets either geotagged, or non-geotagged tweets, showing that
"a model trained on geotagged data indeed generalizes to non-geotagged data".
      </p>
      <p>
        Two recent approaches that aim to geotag individual tweets are presented in
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. The main difference to our approach is that these methods rely
on users that post many tweets in a time interval t, or on data from the user's
profile. In contrast, we aim to geotag tweets even from users that have never
posted before, or do not provide any profile data (such as their home location).
      </p>
      <p>
        Two studies that aim to geotag tweets are presented in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. These
two methods create chains of words that represent a location. The latter study
additionally takes into consideration the location a user has recorded as their home
location. A study that predicts both a user's location and the place a tweet was
generated from is presented in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. In this study, the authors construct language
models by using Bayesian inversion, achieving good results for the country and
state-level identification tasks. Finally, [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] presented a method for identifying
the geolocation of photos by using the textual annotations of these photos.
      </p>
      <p>
        Even though some of these studies are closely related to our work (e.g., [
        <xref ref-type="bibr" rid="ref18 ref9">9,
18</xref>
        ]), we observe that they operate at a very different time and space scale. The
profiles they create involve the tweets generated over a long period of time (up
to several months), and the location that has to be estimated is the location of
origin of the user, rather than the location from where a particular tweet was
posted. Moreover, the space granularity used in these studies ranges from postal
zipcodes to areas larger than a city. On the contrary, in our work we predict the
location of individual tweets, at the level of city neighborhoods, and our approach
has been shown to achieve the best results when compared to the state of the
art [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. A recent survey presents methods relevant to location inference [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Problem Formulation</title>
      <p>The problem we address in this work is the estimation of the geographic location
of individual, non-geotagged posts in social networks.</p>
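      <p>Problem 1: Given a set of geotagged posts <inline-formula><tex-math>P_{t_j}^{l_1}, \ldots, P_{t_j}^{l_i}</tex-math></inline-formula>, <inline-formula><tex-math>t_1 \le t_j \le t_2</tex-math></inline-formula>, where
<inline-formula><tex-math>l_i</tex-math></inline-formula> is the location the post was generated from and <inline-formula><tex-math>t_j</tex-math></inline-formula> is the time interval during
which the post was generated, and a set of individual non-geotagged posts
<inline-formula><tex-math>Q_{t_q}^{1}, \ldots, Q_{t_q}^{n}</tex-math></inline-formula>, <inline-formula><tex-math>t_1 \le t_q \le t_2</tex-math></inline-formula>, we want to identify the location <inline-formula><tex-math>l</tex-math></inline-formula> from which the
post <inline-formula><tex-math>Q_{t_q}^{i}</tex-math></inline-formula> (<inline-formula><tex-math>1 \le i \le n</tex-math></inline-formula>) was generated.</p>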
      <p>The timestamps t1 and t2 represent the start and end times, respectively, of
the time interval we are interested in.</p>
      <p>In the context of this work, we concentrate on two-level fine-grained location
predictions: we wish to initially estimate the coarse-grained location of a post
(which is usually as big as a city) and afterwards to estimate the location at
a much finer-grained level such as a city neighborhood (which is usually much
smaller than a postal zipcode). Furthermore, we focus on Twitter posts, whose
particular characteristics are the very small size (i.e., up to 140 characters long),
and the heavy use of abbreviations and jargon.</p>
    </sec>
    <sec id="sec-4">
      <title>Proposed Approach</title>
      <p>In this section, we describe our solution to the problem of fine-grained
geolocalisation of non-geotagged tweets. Our method is based on the creation of vectors
describing the Twitter activity, in terms of important keywords, for each
geolocation we have data from, and for the period of time we are interested in.</p>
      <p>
        In this paper, we modify the setup of our method so
that we can use tweets from the entire stream of a social network (refer to
Algorithm 1). This is an extension of our previous work (details can be found in [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]),
where we apply the search in two stages: initially using a coarse granularity (such
as a city), and subsequently, a fine granularity (such as a city neighborhood).
We additionally introduce dynamic data-driven similarity thresholds that get
automatically readjusted, which can lead to high precision (as we demonstrate
in Section 5).
      </p>
      <p>
        [Extraction of Important Keywords and Creation of Location
Vector:] We initialize our method by defining the Coarse-Grained Locations (CGL)
that we aim to identify posts from (i.e., cities), and the time intervals we are
interested in. Afterwards, we get the geotagged posts generated from the CGLs
in our predefined time intervals, and we group them according to the CGL they
were generated from. We then follow exactly the same steps as described in [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]
for extracting the keyword-vector of each location. The only difference in our
current algorithm is that we do not need to keep the concordance vectors at the
end, since we only use the Tf-Idf vectors.
      </p>
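      <p>The construction of per-location keyword vectors can be sketched as follows. This is a minimal illustration only: the actual keyword extraction steps are those of [24] and are not reproduced here, so the whitespace tokenizer, the particular Tf-Idf variant, and the name build_location_vectors are all assumptions. Each location's tweets for one time window are treated as a single document.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (hypothetical stand-in for the preprocessing of [24])."""
    return [w for w in text.lower().split() if w]


def build_location_vectors(tweets_by_location):
    """Map each location to a {keyword: tf-idf weight} dict.

    All tweets of a location (for one time window) form one document.
    """
    term_counts = {loc: Counter(w for t in tweets for w in tokenize(t))
                   for loc, tweets in tweets_by_location.items()}
    n_docs = len(term_counts)
    # Document frequency: in how many locations each keyword appears.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    vectors = {}
    for loc, counts in term_counts.items():
        total = sum(counts.values())
        vectors[loc] = {w: (c / total) * math.log(n_docs / doc_freq[w])
                        for w, c in counts.items()} if total else {}
    return vectors


vectors = build_location_vectors({
    "Rome": ["colosseum at sunset", "the colosseum is crowded"],
    "Milan": ["duomo at sunset"],
})
```
]]></preformat>
      <p>In this toy input, keywords shared by both locations (such as "sunset") receive a weight of zero, while location-specific keywords (such as "colosseum") receive a positive weight.</p>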
      <p>Having created the CGL vectors, we repeat the procedure for the
Fine-Grained Locations (FGL), where each FGL is a subset of a CGL (e.g., a city
neighborhood), extracting the keyword vectors that describe the activity for the
level of the FGLs.</p>
      <boxed-text id="alg-1">
        <label>Algorithm 1</label>
        <caption><title>Tweet Geotagging Algorithm</title></caption>
        <preformat>INPUT: A training set of timestamped and geotagged tweets, a timestamped
       query tweet (Q_t) that is not geotagged, a set of predefined coarse-grained
       locations (CGL), and a set of predefined fine-grained locations (FGL),
       where CGL ⊃ FGL
OUTPUT: The most eligible candidate location.

kwVector_Qt ← create vector of Q_t keywords and their weights    ▷ process non-geotagged tweet Q_t
for all i ∈ {CGL} do                          ▷ process training dataset, for all coarse-grained locations
    for all t ∈ {time intervals} do           ▷ and for all time intervals
        Doc_it ← all tweets in location i at time interval t
        kwVector_it ← create vector of Doc_it keywords and their weights
        similarity_i ← VectorSim(kwVector_Qt, kwVector_it)
        if similarity_i &gt; 0 then
            Add CGL to candidate coarse-grained locations (CandCGL)
for all j ∈ CandCGL do
    for all FGL_i ∈ j do
        for all t ∈ {time intervals} do       ▷ and for all time intervals
            Doc_FGLit ← all tweets in location FGL_i at time interval t
            kwVector_FGLit ← create vector of Doc_FGLit keywords and their weights
            similarity_FGLi ← VectorSim(kwVector_Qt, kwVector_FGLit)
            TwoLevelSimilarity_FGLi ← similarity_j × similarity_FGLi
location ← argmax_{i ∈ FGLs} {ProbCalc(TwoLevelSimilarity)}    ▷ identify location of tweet Q_t
return location</preformat>
      </boxed-text>
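      <p>The two-stage search of Algorithm 1 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: cosine similarity stands in for the VectorSim function of [24], a single time window is used, thresholds are omitted, and the names vector_sim and geotag are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def vector_sim(a, b):
    """Cosine similarity of sparse {keyword: weight} vectors (assumed stand-in for VectorSim)."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def geotag(query_vec, cgl_vectors, fgl_vectors):
    """Return the best (city, neighborhood) pair for one query tweet, or None.

    cgl_vectors: {city: vector}; fgl_vectors: {city: {neighborhood: vector}}.
    """
    # Stage 1: keep the coarse-grained locations with non-zero similarity.
    candidates = {}
    for city, vec in cgl_vectors.items():
        sim = vector_sim(query_vec, vec)
        if sim > 0:
            candidates[city] = sim
    # Stage 2: score each neighborhood of a candidate city by the product
    # of the coarse-grained and fine-grained similarities.
    two_level = {}
    for city, city_sim in candidates.items():
        for fgl, vec in fgl_vectors.get(city, {}).items():
            two_level[(city, fgl)] = city_sim * vector_sim(query_vec, vec)
    if not two_level or max(two_level.values()) == 0:
        return None
    return max(two_level, key=two_level.get)


cgl = {"Rome": {"vatican": 1.0, "pizza": 0.5}, "Milan": {"duomo": 1.0, "pizza": 0.5}}
fgl = {"Rome": {"R1": {"vatican": 1.0}, "R2": {"pizza": 1.0}},
       "Milan": {"M1": {"duomo": 1.0}, "M2": {"pizza": 1.0}}}
best = geotag({"vatican": 1.0}, cgl, fgl)
```
]]></preformat>
      <p>Because normalizing the scores into probabilities does not change their order, taking the maximum of the raw two-level scores selects the same location as taking the maximum after the probability calculation.</p>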
      <p>
        [Similarity Calculation and Best Match Extraction:] We then
calculate the similarity between the keyword vector of Q and the keyword vector
of each one of the CGLs, using the function VectorSim (as described in [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]).
Although up to this point the similarity extraction is the same as in our base
method, we note that it is possible to have posts that are not generated from the
predefined locations. Yet, they could have a small similarity with our candidate
locations, leading the algorithm to wrongly assign them to our candidate CGL.
In order to avoid such cases, we use dynamic data-driven thresholds that allow
us to filter out these posts (we describe this procedure in detail in Section 4.1).
      </p>
      <p>Having extracted the most eligible CGLs, we proceed to the next stage and
check all the FGLs that are subsets of an eligible CGL (once again using the
VectorSim function). Although we have already filtered out the CGLs with low
similarity, it is possible to encounter low-similarity scores at the level of FGL,
as well. In order to avoid this, we use an additional data-driven threshold, and we
extract the FGLs that exceed the threshold of the fine-grained granularity. We
then combine the FGL similarity with the CGL similarity, multiplying them and
getting a unique value for each eligible FGL, which we normalize and convert
into a probability. At the end, the algorithm returns the geolocation with the
highest probability (refer to Algorithm 2).</p>
      <boxed-text id="alg-2">
        <label>Algorithm 2</label>
        <caption><title>Probability Calculation</title></caption>
        <preformat>procedure ProbCalc(similarities between Q_t and candidate geolocations (Geolocs))
    for all i ∈ Geolocs do                    ▷ get the probability distribution
        Prob(i_t, Q_t) ← Sim(i_t, Q_t) / Σ_k Sim(k_t, Q_t), with k ∈ Geolocs
    SortDescending(Prob(i_t, Q_t))
    return Geolocs and their Prob(i_t, Q_t)</preformat>
      </boxed-text>
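      <p>The normalization step of Algorithm 2 can be illustrated with a short sketch. The function name prob_calc and the choice to return an empty list when all similarities are zero are assumptions made for this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def prob_calc(similarities):
    """Normalize {location: similarity} into probabilities, sorted in descending order."""
    total = sum(similarities.values())
    if total == 0:
        return []  # assumption: no candidate can be ranked when every similarity is zero
    probs = {loc: sim / total for loc, sim in similarities.items()}
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


# Illustrative two-level similarities for three neighborhoods.
ranked = prob_calc({"R1": 0.6, "R2": 0.2, "R3": 0.2})
best_location = ranked[0][0]
```
]]></preformat>
      <p>The first entry of the returned list is the location that the algorithm would report as its top answer.</p>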
      <p>
        [Similarity Based on Correlation of Activity Time Series:] As
established in our previous study [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], the method that achieves the best results
is the TG-TI-C (Tweet Geotag with the use of Tf-Idf and Correlation). This
method, apart from the content similarity that we presented above, also uses
the correlation factor for the calculation of the similarities.
      </p>
      <p>The correlation factor is a parameter that allows us to exploit the
time-evolution behavior of a location: we initially extract the correlation between
"global" and "local" locations, and afterwards we multiply it by the similarity
extracted between keywordVector<sub>local</sub> and keywordVector<sub>Q</sub>. In our
current study, we employ this feature, but restrict it only to FGLs, since (as we
explained earlier) we are now dealing with posts that may not correspond to any
of the candidate locations.</p>
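      <p>As a rough illustration of the correlation factor, the sketch below weights a content similarity by the correlation between two activity time series. The text does not specify the correlation measure or how negative correlations are handled; Pearson correlation over per-interval tweet counts and the names pearson and correlated_similarity are assumptions of this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def correlated_similarity(content_sim, global_counts, local_counts):
    """Multiply a content similarity by the global/local activity correlation."""
    return pearson(global_counts, local_counts) * content_sim


# Tweet volumes per time interval: the local series follows the global one exactly,
# so the correlation is 1 and the content similarity is left unchanged.
score = correlated_similarity(0.4, [10, 20, 30, 40], [1, 2, 3, 4])
```
]]></preformat>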
      <sec id="sec-4-1">
        <title>Dynamic Threshold Extraction</title>
        <p>As we have already mentioned, the setup of this method allows us to identify
non-geotagged tweets that are coming from the full stream of a social network. As
a result, posts irrelevant to our candidate locations could still share stopwords
with them, leading to a (small) similarity to some location. In order to filter out these cases,
we use thresholds on the similarity, both for the CGLs and the FGLs.</p>
        <p>The distribution of the keywords among the candidate locations is different
depending on the time intervals we check. Therefore, the significant keywords
are going to have different weights for each time interval. For example, during
the night we have few posts, leading to the creation of small dictionaries, where
matching one of the stopwords in these dictionaries would lead to high similarity
between the Q tweet and the candidate location. In this case, the threshold
should be set high. This is not true when we consider the dictionaries created
during the day.</p>
        <p>In order to automatically set a dynamic threshold, we use a small training
dataset (in our case 1 day), keeping the similarities between each Q tweet
and the location that it corresponds to. We initialize our threshold by setting
it to 0. Then, we identify the tweets that are correctly matched to a location,
and we record their similarity. At the end, we compute the mean of all the
similarities, giving us the threshold extracted from the first day and for the
specific time intervals. In order to set up the threshold for these time intervals
for the following day, we use the mean of the thresholds used in all previous
days. As a result, the threshold for a given day and time interval is computed as
the mean of the threshold means of the previous days for the same time interval.</p>
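        <p>The threshold update described above can be sketched as a small bookkeeping class. This is one possible reading of the procedure, not the authors' code: the class name DynamicThreshold, the string labels used for time intervals, and the decision to skip days without correctly matched tweets are assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
class DynamicThreshold:
    """Per-time-interval threshold: the mean of the daily mean similarities seen so far."""

    def __init__(self):
        self._daily_means = {}  # interval label -> list of daily mean similarities

    def threshold(self, interval):
        """Threshold to apply today; 0.0 until at least one day has been observed."""
        means = self._daily_means.get(interval, [])
        return sum(means) / len(means) if means else 0.0

    def end_of_day(self, interval, correct_similarities):
        """Record the mean similarity of the tweets that were matched correctly today."""
        if correct_similarities:  # assumption: days without correct matches are skipped
            mean = sum(correct_similarities) / len(correct_similarities)
            self._daily_means.setdefault(interval, []).append(mean)


th = DynamicThreshold()
initial = th.threshold("00:00-04:00")       # no history yet, so the threshold is 0
th.end_of_day("00:00-04:00", [0.2, 0.4])    # day 1: daily mean 0.3
th.end_of_day("00:00-04:00", [0.5])         # day 2: daily mean 0.5
day3 = th.threshold("00:00-04:00")          # mean of the two daily means
```
]]></preformat>
        <p>Keeping one list of daily means per time interval reflects the observation that night-time and day-time dictionaries call for different thresholds.</p>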
        <p>Following the procedure described above, we dynamically update the
threshold: the thresholds are data driven, and the method is parameter-free.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <sec id="sec-5-1">
        <title>Experimental Setup</title>
        <p>We performed the experiments on a PC with an Intel Core i7-2600 CPU @
3.40GHz x 8 processor, running Ubuntu 16.04.2 LTS with 8GB RAM. We
implemented our algorithms in Python 2.7, and used the Geoplotlib toolbox for
the visualizations.</p>
        <p>[Datasets:] We use a real dataset containing geotagged posts from Twitter,
generated in Italy between June 1 and June 20, 2016. The coarse-grained
locations that we focus on are the 7 Italian cities with the highest activity, namely
Rome, Milan, Florence, Venice, Naples, Turin and Bologna. The total number of
tweets is 218,572: 23,566 originated from Rome, 18,824 from Milan, 7,840 from
Florence, 4,628 from Venice, 4,071 from Naples, 3,719 from Turin, 2,624 from
Bologna and 153,300 from the rest of the country. As fine-grained locations, we
consider city neighborhoods represented by 1km-side squares. For targeted
locations, we have selected the Vatican (a 1.3km-side square) from Rome, and San
Siro Stadium (a 0.8km-side square) from Milan. The time windows we use have
a duration of 4 hours (which can effectively capture an important event, as well
as the start and the aftermath of this event), while also keeping the detailed
aggregated information for every 15min time interval.</p>
        <p>
          Instead of using tumbling windows as done in [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], we use sliding windows
that lead to more up-to-date topic models for our locations. We experimented with
sliding the window by 1 and by 2 time intervals, getting almost the same results;
thus, we chose to slide our window by 2 time intervals per slide (30 minutes),
which led to faster execution times. Finally, we update the keywordVectors
incrementally at every slide, which means that our method can be used in an
online fashion. In this experiment, we had 1920 15-min timeslots, resulting in
952 window slides.
        </p>
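        <p>The window arithmetic can be checked with a short sketch. It assumes that a 4-hour window covers 16 timeslots of 15 minutes and that the initial window position is not counted as a slide; under these assumptions the count agrees with the 952 slides reported above. The function name window_starts is hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def window_starts(n_slots, window_len, step):
    """Start indices of every full window that fits in n_slots timeslots."""
    return list(range(0, n_slots - window_len + 1, step))


# 20 days x 96 slots per day = 1920 slots; 4 hours = 16 slots; slide by 2 slots.
starts = window_starts(n_slots=1920, window_len=16, step=2)
n_slides = len(starts) - 1  # the first window is the starting position, not a slide
```
]]></preformat>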
        <p>
          [Algorithms:] We experimentally evaluate the two-level algorithm we
described in Section 4. In order to initialize our thresholds, we run our method
on the first day, but exclude these results from the evaluation. In all our
experiments, we randomly divided the dataset into 80% training and 20% testing,
repeated each experiment 5 times, and reported the mean values in the results.
We compare the results of our two-level algorithm with the state-of-the-art QL
and KL algorithms [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] (for a more detailed comparison, refer to [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]).
        </p>
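        <p>[Evaluation Measures:] We study the effectiveness of our approach using
the precision and recall measures: <inline-formula><tex-math>\mathit{Precision} = \frac{\mathit{cgTweets}}{\mathit{gTweets}}</tex-math></inline-formula> and <inline-formula><tex-math>\mathit{Recall} = \frac{\mathit{cgTweets}}{\mathit{aTweets}}</tex-math></inline-formula>,
where cgTweets is the number of the correctly geolocalised tweets, gTweets is
the number of tweets we geolocalised, and aTweets is the number of all tweets
that originate from our candidate locations. We also report the
balanced F1 measure, <inline-formula><tex-math>F1 = 2 \cdot \frac{\mathit{Precision} \cdot \mathit{Recall}}{\mathit{Precision} + \mathit{Recall}}</tex-math></inline-formula>.</p>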
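        <p>For concreteness, the three measures can be computed as in the sketch below. The counts in the example are made up for illustration and are not taken from the experiments; returning 0.0 for empty denominators is an assumption.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def evaluate(cg_tweets, g_tweets, a_tweets):
    """Return (precision, recall, F1) from the three tweet counts.

    cg_tweets: correctly geolocalised tweets
    g_tweets:  tweets the method geolocalised
    a_tweets:  all tweets originating from the candidate locations
    """
    precision = cg_tweets / g_tweets if g_tweets else 0.0
    recall = cg_tweets / a_tweets if a_tweets else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts only: 40 correct out of 50 geolocalised, 200 relevant tweets.
p, r, f1 = evaluate(cg_tweets=40, g_tweets=50, a_tweets=200)
```
]]></preformat>
        <p>The example shows how a method can be precise on the tweets it chooses to geolocalise while still covering only a fraction of the relevant tweets, which is the trade-off that the thresholds control.</p>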
      </sec>
      <sec id="sec-5-2">
        <title>Performance with Varying Number of CGLs</title>
        <p>In the first set of experiments, we study the performance of our method when we
vary the number of CGLs (i.e., cities) between 1 and 7. As estimated location,
we only consider the first answer given by our algorithm (i.e., @Top1). We note
that the random algorithm had precision less than 0.024% and recall less than
0.12%, with the highest values occurring when using 1 city.</p>
        <p>In Figures 1a-1b, we illustrate the precision and recall when we use 7 CGLs.
The results for 1-6 CGLs are very similar, and omitted for brevity. The F1 for
the cases of 1 and 7 CGLs are compared in Figures 1c and 1d, respectively.
Using our approach, we achieve a precision of up to 89%, and a recall of up to
17%, while the best F1 was 26%. The best precision is achieved when using 60%
of the keywords and a threshold of +20%, while the best recall and F1 are achieved for 70%
of the keywords and "no threshold".</p>
        <p>For the comparison to the state-of-the-art presented in Figure 2, we use the
version of our method with threshold +20%. As depicted in the plots, our method
achieves up to 80% precision and 23% recall, while KL only achieves up to 13%
precision and 20% recall.</p>
        <p>We note that as we increase the number of CGLs considered, we would
expect to see a reduction in the precision and recall values, as a result of the
increased search space. When looking at all the detailed results though, we do
not observe this. On the contrary, precision slightly increases as we add CGLs,
demonstrating the robustness of our approach.</p>
      </sec>
      <sec id="sec-5-3">
        <title>Performance for Targeted Locations</title>
        <p>We now evaluate the performance of the proposed approach for targeted
locations of interest. The results for the Vatican and San Siro locations are presented
in Figures 3a-3b, and Figures 3c-3d, respectively.</p>
        <p>The precision for Vatican reaches a maximum of 68% when using either 10%,
or 20% of the keywords and "no threshold", while recall reaches 84% when using
100% of the keywords and "no threshold". Similarly, San Siro achieves a precision
of 49%, and a recall of 54%, for 10%, or 20% of the keywords, and for 100% of
the keywords, respectively, and "no threshold". These numbers correspond to
a fairly high performance, especially when taking into account the very high
recall values.</p>
        <p>We note that in both locations, the precision and recall values are exactly
the same when using 10% and 20% of the keywords, while the precision drops
suddenly after that. A close look at the dictionaries of the two locations revealed
that the most important keywords are the names of the locations. The small
dictionary size employed (when using 10-20% of the keywords) is then occupied
by these keywords. As the dictionary size increases, stopwords and noise are
inserted, which have a negative impact on precision.</p>
        <p>[Figure 1: (a) Precision for 7 CGLs; (b) Recall for 7 CGLs; (c) F1 for 1 CGL (Milan); (d) F1 for 7 CGLs.]</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Dashboard</title>
      <p>We now provide a brief description and sample screenshots from our prototype
dashboard implementation, which uses our tweet geolocalisation solution in order
to help users visualize and better understand geolocalized Twitter activity.</p>
      <p>Figure 4a depicts the geotagged tweets that are generated from Italy in the
form of a heat-map (the redder the color of a location, the more tweets
originate from that location). Due to the bounding box used for extracting the
tweets, we have tweets from places outside of Italy as well. Nevertheless, this
does not pose a problem, since our approach aims to geotag any tweet in the
public stream, and is language independent. As we can observe from the
heat-map, the Twitter activity is heavy in the big cities (e.g., Milan and Rome), and
low in the countryside.</p>
      <p>The user can then use the map to zoom in, for example on the capital of Italy,
Rome. In this case, we observe that the activity around important locations,
such as the Vatican and the Colosseum, is much higher than the activity in the
suburbs of the city (refer to Figure 4b).</p>
      <p>If the user continues to zoom in, they arrive at the detailed city view,
illustrated in Figure 5a. In this view, we can see the square grid structure,
superimposed with a heat-map that represents the distribution of the geolocalised tweets
(the more tweets a location has, the denser the color is). In addition, this
view allows the user to examine the topic models computed for each one of the
neighborhoods: when the mouse hovers over a square, a window pops up that
lists the most important keywords for that location.</p>
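      <p>The grid-based heat-map described above can be illustrated with a minimal
Python sketch. It is not the dashboard implementation: the helper names
(grid_cell, tweet_counts), the grid origin, the default cell size, and the
flat-earth approximation of roughly 111.32 km per degree of latitude are all
assumptions made for illustration.</p>
      <preformat>
```python
import math
from collections import Counter

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def grid_cell(lat, lon, origin_lat, origin_lon, side_km):
    """Map a coordinate to the (row, col) index of a square grid cell.

    Hypothetical helper: uses a flat-earth approximation that is only
    reasonable at city scale.
    """
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(origin_lat))
    row = math.floor((lat - origin_lat) * KM_PER_DEG_LAT / side_km)
    col = math.floor((lon - origin_lon) * km_per_deg_lon / side_km)
    return row, col


def tweet_counts(points, origin_lat, origin_lon, side_km=1.0):
    """Count tweets per grid cell; the counts would drive the heat-map color."""
    return Counter(
        grid_cell(lat, lon, origin_lat, origin_lon, side_km)
        for lat, lon in points
    )
```
      </preformat>
      <p>Each (latitude, longitude) pair is assigned to one square, and the per-square
count can then be mapped to a color intensity by whatever rendering layer is
used.</p>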
      <p>Another useful analysis tool is the study of how the volume of tweets changes
over time in different neighborhoods of a given city. This is depicted in Figure 5b,
where we report the relative percentage increase (or decrease) of the volume of
tweets in the current time interval, when compared to the previous time interval
(i.e., (#NewTweets - #OldTweets) / #OldTweets). In case we have data in a square,
we show the percentage difference with respect to the previous window; otherwise
the square appears without any number. This view can quickly reveal neighborhoods
where the Twitter activity is rapidly increasing (or decreasing), signifying, for
example, an aggregation of people in a specific location of the city.</p>
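      <p>As a worked illustration of the relative change between two consecutive time
windows, the following Python sketch computes the percentage for every square.
The function names are hypothetical, and returning None when the previous window
is empty is an assumption made here to mimic squares shown without a number; the
text does not specify how that case is handled.</p>
      <preformat>
```python
def relative_change(new_count, old_count):
    """Relative percentage change of tweet volume between two windows.

    Returns None when the previous window has no tweets (assumption:
    such squares are displayed without a number).
    """
    if old_count == 0:
        return None
    return 100.0 * (new_count - old_count) / old_count


def changes_per_square(new_counts, old_counts):
    """Apply relative_change to every square seen in either window."""
    squares = set(new_counts) | set(old_counts)
    return {
        sq: relative_change(new_counts.get(sq, 0), old_counts.get(sq, 0))
        for sq in squares
    }
```
      </preformat>
      <p>For instance, a square that goes from 100 tweets in the previous window to
150 tweets in the current one has a relative change of +50%.</p>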
      <p>More specifically, in Figure 5b, we present how the activity changes in Milan
as we get closer to the beginning of the soccer game in the San Siro Stadium.
As the figure shows, the square in which the San Siro Stadium is located and its
neighboring square exhibit an increase in activity, while the majority of the
squares in Milan exhibit a decrease. After further analyzing the activity of the
squares, we identified that the square in which the San Siro Stadium is located
has a constant increase in activity up to 1 hour before the beginning of the
match, while there is no other square with a constant increase. This
representation allows us to depict the effect that such an event has on the area, and
provides local authorities with useful information that can be used to offer better
services to the citizens (e.g., better manage the traffic flows).</p>
      <p>Fig. 2: Precision, Recall and F1 Comparison for 7 CGRs (@Top1).
(a) Precision and Recall. (b) F1.</p>
      <p>Fig. 3: (Top) Vatican (1.3km-side square / @Top1). (Bottom) San Siro
(0.8km-side square / @Top1). (a) Precision. (b) Recall. (c) Precision. (d) Recall.</p>
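      <p>The observation that only one square shows a constant increase across
consecutive windows can be checked with a short Python sketch. This is an
illustrative assumption about how such a check might look, not the analysis
procedure used for the figure; the function names and the input layout (one list
of per-window tweet counts per square) are hypothetical.</p>
      <preformat>
```python
def constantly_increasing(series):
    """True if every window has strictly more tweets than the previous one."""
    return len(series) > 1 and all(
        later > earlier for earlier, later in zip(series, series[1:])
    )


def squares_with_constant_increase(volume_by_square):
    """Return the squares whose tweet volume grows in every window."""
    return sorted(
        sq for sq, series in volume_by_square.items()
        if constantly_increasing(series)
    )
```
      </preformat>
      <p>Given per-window counts for each square, the function returns the identifiers
of the squares whose volume never stalls or drops, which are candidates for
locations where a crowd is gathering.</p>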
      <p>Finally, the user can focus on a specific location in the city, and view the
volume of tweets over time for that location. Figure 6 illustrates the tweet activity
time series for the Vatican and San Siro locations, for the time interval of
May 26, 17:00 to June 1, 17:00, during which we had some important events.
(The labels of the peaks were inserted manually, but could also be added
automatically by analyzing the news streams.)
These graphs show that these two locations have very different characteristics:
the Vatican exhibits a relatively low, yet stable activity stream; on the other hand,
San Siro has almost no activity for a large part of the time interval, but includes
activity bursts. Nevertheless, it is interesting to note that, as reported in Figure 3,
our algorithm for tweet geolocalisation performs equally well for both targeted
locations.</p>
      <p>Fig. 4: (a) Italy. (b) Rome.</p>
      <p>Fig. 5: Geotagged Tweets, Keywords and Activity Difference.
(a) Volume of Geotagged Tweets of Rome and Representative Keywords for Vatican
square (June 2, 10:00-14:00). (b) Difference in the Activity of squares around
San Siro after one slide (May 28, [15:00-19:00] to [15:30-19:30]).</p>
      <p>Fig. 6: (a) Vatican. (b) San Siro Stadium.</p>
      <p>
In this work, we present a framework that allows us to geolocalise non-geotagged
tweets. Our two-level framework allows the estimation of the location from which
a post was generated, by exploiting the similarities in the content between this
post and a set of geotagged tweets. Contrary to previous approaches, our
framework provides geolocation estimates at a fine grain, thus supporting a range of
applications that require this detailed knowledge and could be used for social
good. The experimental evaluation with real data demonstrates the effectiveness
of our approach, regardless of the number (as small as 1) and size (as small as
a 700m-side square) of the targeted locations. In order to render the results of
our methods easier to understand, we developed a prototype dashboard that
provides interactive visualizations, assisting end-users and large-scale event
organizers to better plan and manage their activities.</p>
      <sec id="sec-6-1">
        <title>Acknowledgments</title>
        <p>This work was supported by a fellowship from Telecom Italia.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Facebook</surname>
          </string-name>
          , https://www.facebook.com/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Google+</surname>
          </string-name>
          , https://plus.google.com
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Twitter</surname>
          </string-name>
          , https://twitter.com
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Abdelhaq</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sengstock</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gertz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Eventweet: Online localized event detection from twitter</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          <volume>6</volume>
          (
          <issue>12</issue>
          ) (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ajao</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>A survey of location inference techniques on twitter</article-title>
          .
          <source>Journal of Information Science</source>
          <volume>41</volume>
          (
          <issue>6</issue>
          ),
          <fpage>855</fpage>
          -
          <lpage>864</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Balduini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bocconi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bozzon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oosterman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsytsarau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A case study of active, continuous and predictive social media analytics for smart city</article-title>
          .
          <source>In: ISWC Workshop on Semantics for Smarter Cities (S4SC)</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Balduini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dell'Aglio</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsytsarau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Confalonieri</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Social listening of city scale events using the streaming linked data framework</article-title>
          .
          <source>In: ISWC</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Castillo</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Big Crisis Data: Social Media in Disasters and Time-Critical Situations</article-title>
          . Cambridge University Press (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>H.w.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eltaher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>@phillies tweeting from philly? predicting twitter user locations with spatial word usage</article-title>
          .
          <source>In: ASONAM</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caverlee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>You are where you tweet: a content-based approach to geolocating twitter users</article-title>
          .
          <source>In: CIKM</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Crooks</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croitoru</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stefanidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radzikowski</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>#earthquake: Twitter as a distributed sensor system</article-title>
          .
          <source>Transactions in GIS</source>
          <volume>17</volume>
          (
          <issue>1</issue>
          ),
          <fpage>124</fpage>
          -
          <lpage>147</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Earle</surname>
            ,
            <given-names>P.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bowden</surname>
            ,
            <given-names>D.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Twitter earthquake detection: earthquake monitoring in a social world</article-title>
          .
          <source>Annals of Geophysics</source>
          <volume>54</volume>
          (
          <issue>6</issue>
          ) (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Eisenstein</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Connor</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xing</surname>
            ,
            <given-names>E.P.</given-names>
          </string-name>
          :
          <article-title>A latent variable model for geographic lexical variation</article-title>
          .
          <source>In: EMNLP</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Frias-Martinez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soto</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hohwald</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frias-Martinez</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Characterizing urban landscapes using geolocated tweets</article-title>
          .
          <source>In: SocialCom-PASSAT</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cook</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldwin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Text-based twitter user geolocation prediction</article-title>
          .
          <source>JAIR</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Ikawa</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Enoki</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tatsubori</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Location inference using microblog messages</article-title>
          .
          <source>In: Proceedings of the 21st international conference companion on World Wide Web</source>
          . pp.
          <fpage>687</fpage>
          -
          <lpage>690</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>The Ghost Map: The Story of London's Most Terrifying Epidemic and How it Changed Science, Cities and the Modern World</article-title>
          .
          <source>Riverhead Books</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Kinsella</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murdock</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Hare</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>I'm eating a sandwich in glasgow: modeling locations with tweets</article-title>
          .
          <source>In: SMUC</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Leetaru</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Padmanabhan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shook</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Mapping the global twitter heartbeat: The geography of twitter</article-title>
          .
          <source>First Monday</source>
          <volume>18</volume>
          (
          <issue>5</issue>
          ) (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Mathioudakis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koudas</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Twittermonitor: trend detection over the twitter stream</article-title>
          .
          <source>In: SIGMOD</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Murdock</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Your mileage may vary: on the limits of social media</article-title>
          .
          <source>SIGSPATIAL Special</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Paradesi</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          :
          <article-title>Geotagging tweets using their content</article-title>
          .
          <source>In: FLAIRS Conference</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Paraskevopoulos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dinh</surname>
            ,
            <given-names>T.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dashdorj</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serafini</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Identification and characterization of human behavior patterns from mobile phone data</article-title>
          .
          <source>NetMob</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Paraskevopoulos</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Fine-grained geolocalisation of non-geotagged tweets</article-title>
          .
          <source>In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015</source>
          . pp.
          <fpage>105</fpage>
          -
          <lpage>112</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Schulz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hadjakos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nachtwey</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mühlhäuser</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A multi-indicator approach for geolocalization of tweets</article-title>
          .
          <source>In: ICWSM</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Serdyukov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murdock</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Zwol</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Placing flickr photos on a map</article-title>
          .
          <source>In: SIGIR</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Sloan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morgan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Who tweets with their location? understanding the relationship between demographic characteristics and the use of geoservices and geotagging on twitter</article-title>
          .
          <source>PLoS ONE</source>
          <volume>10</volume>
          (
          <issue>11</issue>
          ),
          <elocation-id>e0142209</elocation-id>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Tsytsarau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amer-Yahia</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Efficient sentiment correlation for large-scale demographics</article-title>
          .
          <source>In: SIGMOD</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Tsytsarau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Survey on mining subjective data on the web</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <surname>Tsytsarau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Nia: System for news impact analytics</article-title>
          .
          <source>KDD Workshop on Interactive Data Exploration and Analytics (IDEA)</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <surname>Tsytsarau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castellanos</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Dynamics of news events and social media reaction</article-title>
          .
          <source>In: SIGKDD</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <surname>Tsytsarau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denecke</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Scalable discovery of contradictions on the web</article-title>
          .
          <source>In: WWW</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <surname>Tsytsarau</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palpanas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denecke</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Scalable detection of sentiment-based contradictions</article-title>
          .
          <source>DiversiWeb</source>
          , WWW (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cong</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thalmann</surname>
            ,
            <given-names>N.M.</given-names>
          </string-name>
          :
          <article-title>Who, where, when and what: discover spatio-temporal topics for twitter users</article-title>
          .
          <source>In: SIGKDD</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <surname>Zafarani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Evaluation without ground truth in social media research</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>58</volume>
          (
          <issue>6</issue>
          ),
          <fpage>54</fpage>
          -
          <lpage>60</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>