Visualisation of User-Generated Event
Information: Towards Geospatial Situation
Awareness Using Hierarchical Granularity Levels
Heidelinde Hobel¹, Lisa Madlberger², Andreas Thöni³, and Stefan Fenz⁴
¹ Vienna University of Technology, Doctoral College Environmental Informatics, Austria
² Vienna University of Technology, Vienna PhD School of Informatics, Austria
³ Vienna University of Technology, Institute of Software Technology and Interactive Systems, Austria
⁴ SBA Research, Vienna, Austria
hobel@geoinfo.tuwien.ac.at, lisa.madlberger@tuwien.ac.at,
andreas.thoeni@tuwien.ac.at, sfenz@sba-research.org
Abstract. In recent years, enterprises and emergency response teams
have started to use user-generated content to monitor crises, events and
trends. Especially in critical situations, decision makers must, above all,
quickly assess huge amounts of data. Effective geographical visualization
and aggregation of collected data is an important prerequisite to enable
decision makers to infer the impact of a detected event on, for example,
their supply chains and other physical establishments. However, in ex-
isting literature the aspect of geographical visualization of automatically
analysed events is hardly addressed. In this paper, we propose to intro-
duce hierarchical levels of detail, a concept from Geographic Information
Systems, for the visualization of user-generated data describing a local
event. We developed a tool which can improve the assessment of regional
impacts by offering the possibility to browse and visualize results on layers
aggregating data along individually defined hierarchical dimensions,
e.g. geographical or political districts.
1 Introduction
Social networks, RSS feeds, and platforms for microblogging are becoming more
and more important for users to share feelings, experiences and to report about
recent events at any time. As a result, huge amounts of data are created, which
are often publicly available. This raw information can be used by enterprises
as well as governments to rapidly learn about the latest events and to mon-
itor the public opinion about certain topics. However, processing these huge
amounts of information is a challenging task and, in case of a crisis, the ac-
cessible information must be quickly assessed to be useful for decision-making.
Globalization led to an increase in the multinational footprint of enterprises and
their supply chains. Consequently, the critical infrastructure is spread over large
and disconnected territories. Knowing geographical references is a key input for
decision-making processes in such an environment.
Therefore, the assessment of the collected information needs to be linked
with geospatial information about regions and points of interest. Microblogs and
news feeds are often enriched with temporal and geospatial data, either explicitly
provided by tags in the meta-data or implicitly in the messages’ content itself.
However, this information inherits the intrinsic properties of user-generated data
and is therefore likely to be incomplete, incorrect, and imprecise. Furthermore,
it is a non-trivial task to monitor large regions of interest while still quickly
assessing the impact of a detected event with globally dispersed points of interest.
Therefore, knowing the geographical reference area of a feed can be an important
starting point.
The goal of our research is, therefore, to improve geospatial assessment of
events reported in microblogs by using hierarchical levels for the analysis of
important events threatening the infrastructure of enterprises and visualization
of results by browsing through these layers. While the detection of the geo-
graphic origin of an incident is often determined by the measurement of bursts,
e.g., [7], we focus on monitoring regions of interest, assessing the impact of de-
tected events, and providing better user-support for decision-making. To this
end, we developed a tool for the assessment of collected microblogs at hierar-
chical geospatial levels while considering relevance factors that are assigned to
individual microblog messages. The contributions of this paper are as follows:
– We developed a tool aimed at supporting enterprises and emergency response
workers in geospatial assessment of incidents by using hierarchical levels.
– We discuss architectural decisions, implementation details, and our semantic
model for the analysis of microblogs.
It is important to note that any monitoring approach relying on user-generated
web data is restricted to situations where both users as well as decision mak-
ers are still able to access communication infrastructure. Moreover, it is limited
by the extent of data being augmented with geospatial information, either by
explicit tags (e.g. GPS tags) or implicitly in the content. Considering, for example,
the microblog platform Twitter⁵, around 2 percent of microblogs had been
GPS-tagged in 2012. Given a baseline of around 400 million entries per day,
the amount tagged was still significant (roughly 8 million entries per day). A
full-text geocoder including additional fields such as the location could even
reference around 28% of the entries [6].
Throughout the paper we consider exemplary User Generated Text
Content (UGTC), which includes microblogs, RSS feeds and content from social
media platforms. We focus on the generic aspects of using UGTCs in
critical response systems; an implementation in a certain domain using a specific
provider always requires consideration of, and adherence to, the specific
applicable data protection laws, privacy terms, and terms of use of the provider.
The remainder of this paper is structured as follows: In Section 2, we present
related work on the use of microblogs for event detection, and in Section 3, we
discuss the opportunities to retrieve geospatial information from microblogs and
RSS feeds. We present the visualization approach in Section 4, and describe our
architecture and implementation in Section 5. In Section 6, we conclude our work
and offer an outlook on future work.
⁵ http://www.twitter.com
2 Related Work
The huge amounts of publicly available user-generated data motivated research
in various areas. Several studies focus on detecting events in microblog data.
One of the earliest systems of this kind was developed by Sakaki et al. [11] to
reliably detect earthquakes in Japan exclusively based on Twitter
data. They do not specifically address visualization in their study;
however, in a provided screenshot they use coloured pins to depict individual
earthquake-related tweets on a map. The problem with pins is that multiple
instances at the same location cannot be visually distinguished from single
occurrences. The same applies to Sadilek et al. [10], who use microblog data
to predict disease transmission and use pins to visualize the geographic locations
of users; again, visualization was not a core aspect of their work. However,
in both cases, the possibility to aggregate and view results on higher hierarchical
levels, e.g. for each district, might enable a better overview and lead to
additional insights. Chunara et al. [3] use news and microblog data to visualize
disease outbreaks on a health map. They present alerts derived from tweets
on a heatmap, which indicates high and low density of relevant messages both
on a detailed level and aggregated on up to two hierarchical levels according
to administrative districts. While this provides a good example for the use of
geospatial hierarchies for data-visualization, our approach aims for a general
solution allowing for hierarchical aggregations along multiple dimensions, e.g.
administrative but also according to geographical or political attributes.
Additionally, several efforts have been made to detect events independently
of a specified domain [1, 2, 5, 7]. These methods typically extract events based
on the detection of high occurrences of words. While in most of these studies
temporal and geospatial properties of the detected events are extracted,
little attention has yet been paid to the geographic representation of events.
One of the few studies explicitly devoting attention to visualization aspects
was conducted in [7], where the authors validated their map-based visualization
approach in a user study. One of their findings was that an intuitive
user experience requires the additional possibility to zoom in and out of visualized
data as well as the aggregation of mapped results. Rosi et
al. [9] also point out the need for better visualization techniques and tools to
view and understand data at multiple levels of granularity. Pouliquen et al. [8]
geocoded news items and experimented with different visualization options. They
suggested representing news stories as points on a map leveraging WorldKit⁶, or
used placeholders in GoogleEarth⁷ with icons representing the frequency of news
items found referencing a specific place. In the latter case, they relied on the zooming
features naturally provided by GoogleEarth. Furthermore, they experimented
with Scalable Vector Graphics (SVGs), but relied on only one country level.
⁶ brainoff.com/worldkit/
⁷ earth.google.com
In our study, we want to address this gap by proposing a method to implement
hierarchical geospatial layers, which allow for different aggregation levels
during event-detection, visualization and assessment. We use UGTCs such as
microblogs and RSS feeds to illustrate our approach.
3 Inferring Geospatial Attributes from UGTC
Multiple ways to infer geospatial information are applicable to microblogs as
well as to RSS feeds. By using microblogging platforms users can often decide
whether the exact location (identified by GPS-information), the place (such as
the city or neighbourhood) or no location information is attached to a message.
In addition to these location-tagged microblogs, geographic information could
be obtained from the user’s profile if available and accessible at a platform. The
user’s profile may include a location field, the time zone and may include further
location information in the profile description or on a linked personal website.
Finally, information can also be extracted from the message text, which
might relate to a certain event or directly to an area, territory, or jurisdiction.
Using profile, user location and geo tags as input and again taking Twitter as
an example for a microblogging platform, Leetaru et al. [6] reported a share of
34% of microblogs mappable at a correlation level of 72% against a baseline.
Considering standard RSS feeds, location information can be obtained anal-
ogously from the content or the author’s information. Since RSS feeds are linked
to more comprehensive blogs or news articles, more information about where
an event occurred could be provided. Moreover, the W3C GeoRSS standard
is designed to explicitly provide information about the geographic location a
post relates to in form of geographical points, lines and polygons, which can be
automatically processed by geographic software. In this case, the location infor-
mation is certainly more accurate since it is explicitly annotated and aims at
describing the geospatial features of a report. However, even when having only
full-text with geographic information available, accuracy rates of 77% have been
achieved [8].
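To illustrate how explicitly annotated locations can be processed automatically, the following minimal sketch reads point locations from a feed that uses the GeoRSS-Simple encoding. The sample feed, its titles, and its coordinates are invented for illustration, and the function name extract_points is hypothetical; the sketch is not part of the prototype described in this paper.

```python
import xml.etree.ElementTree as ET

# Namespace of the GeoRSS-Simple encoding.
GEORSS_NS = {"georss": "http://www.georss.org/georss"}

# Invented example feed; titles and coordinates are placeholders.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Earthquake reported in Gansu</title>
      <georss:point>36.06 103.83</georss:point>
    </item>
    <item>
      <title>Item without location</title>
    </item>
  </channel>
</rss>"""


def extract_points(feed_xml):
    """Return (title, lat, lon) for every item that carries a georss:point."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        point = item.find("georss:point", GEORSS_NS)
        if point is None or not point.text:
            continue  # no explicit location annotation on this item
        # GeoRSS-Simple points are written as "latitude longitude".
        lat_text, lon_text = point.text.split()
        results.append((item.findtext("title"), float(lat_text), float(lon_text)))
    return results


points = extract_points(SAMPLE_FEED)
```

Items without an explicit annotation are skipped here; in practice they would have to be handled by the content-based methods discussed above.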
Locations inferred from user-generated data can be obtained by different
methods. However, a distinction must be made between reports about events or
incidents sent from the place where the event or incident occurred and reports in which
the event or incident is merely discussed. While both categories are certainly
important, we require schemas to reflect these geospatial dependencies. Furthermore,
location information is often ambiguous or may be incorrect. As a consequence,
we have to allow users to correct errors and infer
the geographic dependencies between analysed topics.
4 Geospatial Visualization of UGTCs
The goal of our tool is to improve geospatial assessment of events reported in
UGTCs by using hierarchical levels for the analysis of important events threatening
the infrastructure of enterprises and the visualization of results by browsing
through these layers. The layers can be designed differently according to the
needs of the domain, which encompasses the enterprise’s requirements and the
(global) dependencies of its supply chain infrastructure. For example, comparing
the states of the US with the countries of the European Economic Area requires
setting these areas on the same hierarchical level. This might be interesting when
enterprises are planning or evaluating establishments of their infrastructure in
these regions of interest. Furthermore, the user has to decide which regions and
layers are of importance in the analysis. For example, in an industrial use case, an
enterprise might concentrate on geographic regions where critical infrastructure
or suppliers are located. Measuring the influence of earthquakes might motivate
the user to define the center of the earthquake as central point and then to de-
fine concentric circles as hierarchical levels around the earthquake’s epicentre.
For our implementation, we created a hierarchical structure according to the
United Nations Statistics Division⁸. According to this website, the geographical
regions and compositions are structured in the following hierarchy: World ↦
continental regions ↦ geographic sub-regions (e.g., Eastern Africa) ↦ Countries.
We extended the structure with Countries ↦ country-specific Regions.
In the following two scenarios, we will motivate the usage of hierarchical
layers which facilitate the browsing of data in two conceptually contrary models:
top-down and bottom-up.
Top-down assessment. In the top-down approach, users monitor the highest
level of interest. This visualization approach is aimed at providing a holistic
overview of gathered information and allowing users to zoom into the defined levels of
interest, where each area reveals its own UGTCs and those inherited from its sub-areas.
By using the top-down approach, the user can compare regions of interest globally
(i.e., at the highest abstraction level) with other areas at this level. Our
tool then provides the possibility to seamlessly zoom in and explore more
specific areas in more detail. For instance, this visualization scenario is suitable
when an enterprise is planning new establishments for critical infrastructure or
wants to monitor geographically large areas. In case of an emergency or crisis, this
approach allows users to zoom into the relevant local areas, determine the geospatial
impact, and plan appropriate measures.
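The inheritance of UGTCs from sub-areas can be sketched as a roll-up of counts along the part-of relation. The following is a minimal illustration under stated assumptions: the PART_OF excerpt, the function names, and the example counts are hypothetical and do not reproduce the prototype’s data model.

```python
from collections import defaultdict

# Hypothetical excerpt of a part-of hierarchy (child -> parent).
PART_OF = {
    "Gansu": "China",
    "China": "Eastern Asia",
    "Eastern Asia": "Asia",
    "Turkey": "Western Asia",
    "Western Asia": "Asia",
    "Asia": "World",
}


def ancestors(area):
    """Yield the area itself and then every superordinate area up to the root."""
    while area is not None:
        yield area
        area = PART_OF.get(area)


def aggregate_counts(direct_counts):
    """Add each area's directly assigned UGTC count to itself and all its ancestors."""
    totals = defaultdict(int)
    for area, count in direct_counts.items():
        for node in ancestors(area):
            totals[node] += count
    return dict(totals)


# Two UGTCs assigned directly to Gansu, one directly to Turkey.
totals = aggregate_counts({"Gansu": 2, "Turkey": 1})
```

With these example counts, the continental level shows the sum of both branches, while each country only shows the UGTCs of its own sub-areas.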
Bottom-up assessment. The bottom-up approach is useful for monitoring specific
geographic locations and for assessing the impact as soon as an event is detected.
Users can also zoom out of the region in order to browse the UGTCs in the history
of upper levels. The predefined hierarchical levels allow users to systematically
analyze the global impact of an event. For instance, in case of a detected earthquake,
an approximation of the epicenter is calculated and the user’s definition
of the radii of the concentric circles around the epicenter is used to infer the
impact of the earthquake.
⁸ https://unstats.un.org/unsd/methods/m49/m49regin.htm
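The concentric-circle levels in the earthquake example can be sketched by assigning each point of interest to the innermost ring that contains it. The sketch below assumes great-circle distances computed with the haversine formula on a spherical Earth; the function names, the radii, and the coordinates are illustrative assumptions, not values from the prototype.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def ring_index(epicenter, point, radii_km):
    """Return the index of the innermost ring containing the point, or None if outside all rings.

    epicenter and point are (lat, lon) tuples; radii_km are the user-defined radii.
    """
    distance = haversine_km(epicenter[0], epicenter[1], point[0], point[1])
    for index, radius in enumerate(sorted(radii_km)):
        if distance <= radius:
            return index
    return None


# Hypothetical epicenter estimate and user-defined radii in kilometres.
epicenter = (36.0, 104.0)
radii = [50, 100, 500]
level = ring_index(epicenter, (36.0, 105.0), radii)
```

Ring 0 is the area closest to the epicenter; points beyond the largest radius are treated as unaffected in this sketch.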
The following sections explain the features of our tool based on the Visual
Analytics Mantra “Analyse First – Show the Important – Zoom, Filter and Analyse
Further – Details on Demand” [4]. We briefly describe how the visualization
tool can be used to browse through the analyzed data (Show the Important –
Zoom, Filter and Analyse Further), and how details of analyzed UGTCs could
be accessed. The setup of the system and the processing of UGTCs is explained
in Section 5.
4.1 Interactive Maps for Visualization
We use Leaflet⁹, which is an open source JavaScript library for interactive maps,
for the visualization of results using the defined hierarchical layers. Figure 1
shows the main interface for the analysis of collected UGTCs.
Fig. 1. Using Leaflet and a hierarchical tree to visualize the geographic correlation of
UGTCs on interactive maps. (See for tools and data: Leaflet, OpenStreetMap, and
NaturalEarth)
On the top of the tool, the user can choose the appropriate level for the
analysis. In the example presented in Figure 1, the user is analyzing the secondary
level, which comprises the countries China and Turkey. Furthermore,
the user has opened the category China in the tree on the right side, which
centers China’s geometry on the map. China’s category shows two identified
UGTCs, which were classified into one of China’s provinces, i.e., Gansu. The
number of UGTCs detected in an area is displayed in brackets after
the name of the region of interest. Zooming into one of the provinces of China
can be done either by opening the respective category or by double-clicking
the specific area on the map. If a high number of UGTCs regarding
defined topics, e.g. crisis, bomb, etc., is detected for an area, the specific
area and every superordinate area are colored red. To specify “high”, users may
define a threshold for the number of UGTCs detected. Moreover, in order to
minimize the effort to assess single UGTCs, only the most relevant UGTCs (see
Section 5) are presented to the user. The time frame used to
retrieve relevant UGTCs can be manually adjusted.
⁹ http://leafletjs.com
When zooming into an area of the lowest level of the defined layers, the vector
overlay will be transparent and UGTCs that have a location tag are shown on
the map as markers, as shown in Figure 2.
Fig. 2. Highlighting of detected UGTCs with location tags in the lowest hierarchical
level. (See for tools and data: Leaflet, OpenStreetMap, and NaturalEarth)
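The threshold-based highlighting and the restriction to the most relevant UGTCs described in this section can be sketched as follows. This is a minimal illustration, assuming that the threshold is compared with the number of UGTCs assigned directly to an area and that each UGTC carries a numeric relevance value; the PART_OF excerpt, the function names, and the dictionary key relevance are hypothetical.

```python
# Hypothetical excerpt of a part-of hierarchy (child -> parent).
PART_OF = {
    "Gansu": "China",
    "China": "Eastern Asia",
    "Eastern Asia": "Asia",
    "Asia": "World",
}


def highlighted_areas(direct_counts, threshold):
    """Return every area whose own count reaches the threshold, plus all its ancestors."""
    red = set()
    for area, count in direct_counts.items():
        if count < threshold:
            continue
        # Walk up the hierarchy so that every superordinate area is marked as well.
        while area is not None:
            red.add(area)
            area = PART_OF.get(area)
    return red


def most_relevant(ugtcs, limit):
    """Keep only the `limit` UGTCs with the highest relevance indicator."""
    return sorted(ugtcs, key=lambda u: u["relevance"], reverse=True)[:limit]


red = highlighted_areas({"Gansu": 5, "Turkey": 1}, threshold=3)
top = most_relevant(
    [{"id": 1, "relevance": 0.2}, {"id": 2, "relevance": 0.9}, {"id": 3, "relevance": 0.5}],
    limit=2,
)
```

Comparing the threshold with aggregated counts instead of direct counts would be an equally plausible reading; the choice here only keeps the sketch short.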
4.2 Details on Demand
In our first prototype, we use timestamps of retrieval and the actual timestamp of
exemplary messages, the content of exemplary messages, topical and geospatial
tags, as well as information about the preprocessed information of the UGTC.
Our system pre-classifies UGTCs based on a keyword analysis and assigns an
indicator that reflects the relevance of each UGTC, as explained in Section
5. However, the classification of text content is a non-trivial task and complete
accuracy is almost impossible; therefore, users should be able to manually assess
detected incidents and correct possible misclassifications. Hence, our interactive
interface opens a detail page when the user clicks the link of an incident that
is displayed in the tree structure on the right side, in which the user can quickly
assess the relevance of a message, edit its tags, and link further online resources.
Semantic annotations allow users to explore further details by following the links.
5 Implementation and Architecture
In this section, we present our framework for geospatial assessment of UGTCs.
For our first prototype, we implemented a Web application for the processing of
collected microblogs and visualization of results; its architecture is illustrated in
Figure 3.
Fig. 3. Architecture
In the following, we will showcase the process using a fictional example
microblog message: “The earth is shaking - earthquake in Gansu”.
UGTC Processor. Our system is designed so that it can be connected to various
data sources, including microblogs and RSS feeds. The UGTC processor is designed
to collect UGTCs from online sources or to import datasets, and to search them for
relevant UGTCs based on topical and geospatial keywords. At the time this paper
was written, we had tested our system with an imported historic dataset.
The keywords should be initially defined according to the domain of the sys-
tem in order to restrict the input of UGTCs to the system. Enterprises with
globally dispersed critical infrastructure could define keywords for the names of
establishments and critical facilities (i.e. topical keywords) as well as all related
names of areas corresponding to the facilities (i.e. geospatial keywords). In more
generic use cases, the users should define keywords such as crisis, earthquake
and thunderstorm. In our example message, we detected the following keywords:
earthquake and Gansu.
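The keyword-based filtering of incoming UGTCs can be sketched with a simple token match. The keyword sets and the function name detect_keywords below are hypothetical, and the sketch uses plain case-insensitive word matching, which is an assumption rather than the matching strategy of the prototype.

```python
import re

# Hypothetical keyword lists; in practice they are defined per domain.
TOPICAL_KEYWORDS = {"crisis", "earthquake", "thunderstorm"}
GEOSPATIAL_KEYWORDS = {"gansu", "china", "turkey"}


def detect_keywords(message):
    """Return the topical and geospatial keywords that occur as whole words in the message."""
    tokens = set(re.findall(r"\w+", message.lower()))
    return {
        "topical": sorted(tokens & TOPICAL_KEYWORDS),
        "geospatial": sorted(tokens & GEOSPATIAL_KEYWORDS),
    }


found = detect_keywords("The earth is shaking - earthquake in Gansu")


def is_relevant(message):
    """A message enters the system only if at least one keyword of either kind matches."""
    hits = detect_keywords(message)
    return bool(hits["topical"] or hits["geospatial"])
```

Because whole words are compared, the token "earth" in the example message does not match the keyword "earthquake".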
Geospatial Processor. Geospatial keywords and hierarchical dependencies can
be inferred from the GeoNames API¹⁰. GeoNames provides a huge amount of
geospatial features; however, the location of many areas is often just provided
as a single point, since the boundaries are not yet available for every record.
Furthermore, the hierarchies of queried areas are not customizable and must be
mapped to self-defined hierarchies to allow customized comparisons.
¹⁰ http://www.geonames.org/export/ws-overview.html
The hierarchical geospatial structure for the prototype is based on continental
and administrative boundaries, where we used the following taxonomy as
mentioned in Section 4: World ↦ continental regions ↦ geographic sub-regions
(e.g., Eastern Africa) ↦ Countries ↦ country-specific regions. We identified
the datasets provided by NaturalEarth¹¹ as the most appropriate for
the visualization, since the data layers are preprocessed and provide consistent
geographic shapes whose linework is independent of other shapes, e.g. for countries
that share one line as a boundary. For countries and their regions we used
the large scale data (1:10m) and directly exported the vector data in GeoJSON
format by using the open source tool Quantum GIS¹². For the world-wide and
continental layers, and geographic sub-regions, we started from the dataset for
administrative level one and merged the country-specific features into the appropriate
structure. For each of the extracted areas, we added GeoJSON properties
to designate the hierarchical type of the layer and the part-of relation
of the respective area. Each layer and feature is enriched with the geospatial
features, names and alternative names retrieved from GeoNames. The hierarchical
geospatial database for our first prototype implementation is based upon a
NoSQL database to store vectorial features in GeoJSON¹³ format.
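A GeoJSON feature enriched with hierarchy information might look like the following sketch. The property names level, partOf, and altNames are hypothetical, since the paper does not list the property names used in the prototype, and the polygon is a placeholder rectangle rather than a real boundary.

```python
import json


def make_area_feature(name, level, part_of, alt_names, ring):
    """Build a GeoJSON Feature whose properties carry the hierarchy information.

    'level', 'partOf' and 'altNames' are hypothetical property names.
    """
    return {
        "type": "Feature",
        "properties": {
            "name": name,
            "level": level,        # hierarchical type of the layer
            "partOf": part_of,     # part-of relation to the superordinate area
            "altNames": alt_names,  # alternative names, e.g. taken from a gazetteer
        },
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Placeholder rectangle in (longitude, latitude) order, NOT the real boundary.
placeholder_ring = [
    [92.0, 32.0],
    [109.0, 32.0],
    [109.0, 43.0],
    [92.0, 43.0],
    [92.0, 32.0],  # a GeoJSON linear ring repeats its first position at the end
]

gansu = make_area_feature("Gansu", "region", "China", ["Kansu"], placeholder_ring)
serialized = json.dumps(gansu)
```

Storing the part-of relation on each feature keeps every document self-describing, which suits a document-oriented store; other layouts, such as a separate hierarchy table, would work as well.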
Each UGTC that is passed on from the UGTC Processor is processed ac-
cording to the hierarchical information stored in the databases. Our first pro-
totype supports geospatial keyword analysis for the content of a UGTC as well
as geospatial queries such as “is this point in this polygon” for possible location
metadata information of the UGTC. If the UGTC is classified based on the lo-
cation information, then it is assigned to the feature of the lowest level of the
hierarchical layers. If the UGTC contains a geographical keyword of the layers,
then it is assigned to the specific feature. Since the fictional microblog message
encompasses no explicit location tag, it is assigned to the feature that relates to
“Gansu”.
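The assignment logic described above can be sketched with a planar ray-casting test for the “is this point in this polygon” query and a keyword fallback. This is an illustration under stated assumptions: the function names, the dictionary layout of a UGTC, the placeholder rectangle, and the substring-based keyword match are all hypothetical, and the planar test ignores polygons with holes and geometries crossing the antimeridian.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test for a point against a single ring of (lon, lat) positions."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does the edge cross the horizontal line through the point?
        crosses = (y1 > lat) != (y2 > lat)
        if crosses and lon < (x2 - x1) * (lat - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside


def assign_feature(ugtc, lowest_level_rings, keyword_to_feature):
    """Prefer location metadata; otherwise fall back to geographic keywords in the text."""
    location = ugtc.get("location")  # (lon, lat) or None
    if location is not None:
        for name, ring in lowest_level_rings.items():
            if point_in_polygon(location[0], location[1], ring):
                return name
    text = ugtc["text"].lower()
    for keyword, feature in keyword_to_feature.items():
        if keyword in text:
            return feature
    return None


# Placeholder rectangle standing in for a lowest-level area, NOT a real boundary.
RINGS = {"Gansu": [[92.0, 32.0], [109.0, 32.0], [109.0, 43.0], [92.0, 43.0], [92.0, 32.0]]}
KEYWORDS = {"gansu": "Gansu"}

tagged = assign_feature({"text": "shaking", "location": (103.8, 36.0)}, RINGS, KEYWORDS)
untagged = assign_feature(
    {"text": "The earth is shaking - earthquake in Gansu", "location": None}, RINGS, KEYWORDS
)
```

The second call mirrors the fictional example message: it has no location tag and is therefore assigned through the keyword “Gansu”.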
Mapping and Enrichment of Microblogs. Once a UGTC is detected by the
geospatial processor and preclassified according to the identified feature, it is
mapped to our RDF schema and stored in a triple store. To allow queries and
aggregation functions on the stored set of UGTCs, we map each UGTC into
a semantic model. We link the UGTC to the geospatial feature of interest by
using a feature tag. The term location tag is used to refer to explicit GPS
information in the metadata of the UGTC, if available. Identified topical and
geospatial keywords are annotated as tags, and the location and temporal tags
of the UGTC are added as annotations. For the annotation we used the geonames¹⁴
and dcterms¹⁵ vocabularies. The following triples show an excerpt of the tags
used to annotate our exemplary microblog (we simplified it for presentation by
linking geonames’ RDF resource for Gansu and one entry for dcterms).
¹¹ http://www.naturalearthdata.com/
¹² http://www.qgis.org/en/site/
¹³ http://geojson.org/
¹⁴ http://www.geonames.org/ontology/documentation.html
¹⁵ http://dublincore.org/documents/dcmi-terms
@prefix geovis: .
"2014-04-23 2:11:02" ;
geovis:relatesToGeoNames