=Paper=
{{Paper
|id=Vol-2088/paper4
|storemode=property
|title=Understanding Human Activities in Green Areas with Social Media Data
|pdfUrl=https://ceur-ws.org/Vol-2088/paper4.pdf
|volume=Vol-2088
|authors=Vuokko Heikinheimo,Hoda Allahbakhshi,Robert Weibel,Weiming Huang,Ali Mansourian,Lars Harrie,Sebastian Hunger,Azimjon Sayidov,Robert Weibel,Kiran Zahra
|dblpUrl=https://dblp.org/rec/conf/agile/Heikinheimo17
}}
==Understanding Human Activities in Green Areas with Social Media Data==
Understanding human activities in green areas with social media data
Vuokko Heikinheimo
Digital Geography lab,
Department of Geosciences and
Geography, University of Helsinki,
PO Box 64, 00014 University of
Helsinki, Finland
Vuokko.heikinheimo@helsinki.fi
Abstract
Up-to-date information about human-nature interactions is urgently needed to inform sustainable land use planning and nature
conservation. Large amounts of content-rich geographic data are produced continuously by users of different social media platforms across
the globe, containing information about the whereabouts and activities of people. Such data, combined with other sources of data, have the potential
to provide new and useful information about human presence, activities, observations and movements at different spatial and temporal scales.
Despite many examples in other fields, location-based social media data have not been widely used in nature conservation. This work aims to
understand the potential and biases of geographic social media data in order to inform conservation-related decision making across scales.
The main objectives of the work are to 1) extract meaningful patterns related to human-nature interactions in green areas from location-based
social media, while 2) understanding the biases and limitations of the data. Firstly, the aim is to position location-based social media data
among other sources of user-generated geographic information and to identify the useful elements and limiting factors of using such data in
conservation science. Secondly, the aim is to understand who the data represents in order to derive further information about green area users.
Lastly, user-generated data is combined and contrasted with other data sources to understand the spatial and temporal patterns of human
actions, and potential threats in areas of high conservation value at regional and global scales.
Keywords: Social media data, human-nature interactions, green areas, bias.
1 Introduction
Understanding patterns of human-nature interactions is crucial for sustainable land use planning and nature conservation (Venter et al., 2016). However, spatially and temporally accurate data on threats affecting biodiversity persistence are lacking (Joppa et al., 2016), and datasets needed to inform conservation decision-making are limited and often biased (Di Minin & Toivonen, 2015).
Spatial data generated by non-experts have recently become a valuable resource both in academia and in society, in addition to more traditional data produced by scientists and other authorities (See et al., 2016; Goodchild, 2007). Geographic information generated by crowds, such as geotagged photos and other location-based social media data, provides diverse information about human activities across the globe. These user-generated data, as opposed to official data sources (such as census data, visitor counts and surveys), may provide complementary information about the values, observations and activities of different groups of people, especially in regions where official data are rarely collected.
Social media, in general, refers to computer-based applications used for networking and sharing digital content. Here, social media data refers to the spatial attributes (location), temporal attributes (time), and relevant content (text, photos, and video) generated by users of different social media platforms (such as Flickr, Instagram and Twitter). Publicly shared content in different social media services can often be accessed in large quantities through Application Programming Interfaces (APIs).
Social media data is used in various fields of science, increasingly also in environmental sciences, to explore spatial and temporal patterns of human activities. However, issues of data quality and data ownership might limit the use and trustworthiness of these data sources in systematic analyses and decision-making support (Sui & Goodchild, 2011), and not many studies, especially in the context of environmental studies, have aimed to validate observed patterns or to systematically assess evident gaps in the data, for example related to the spatial coverage of the available data in different platforms.
This work aims to underpin the potential and limitations of using user-generated geographic information in environmental studies. Furthermore, this work aims to understand how user-generated geographic information can complement traditional data sources in the study of human-nature interactions. Focusing on natural and semi-natural environments such as national parks and urban green areas at different spatial scales, the aim is to map and analyse nature recreation and human pressure using publicly shared social media posts in combination with other available data. The general goals of this work are to 1) extract meaningful patterns related to human-nature interactions in green areas in order to inform
conservation-related decision making at different spatial scales, while 2) understanding the biases in location-based social media data. The work includes the development of automated workflows for data processing and analysis, comparisons between user-generated data and official datasets, and accounting for gaps, inaccuracies and bias in the user-generated data at different spatial scales. The main objectives and related questions are the following:
• Analyzing the potential and limitations of different social media data for studying human activities in green areas: What kind of information do we get from social media, what biases are included in the data and how is it useful for studying human activities in green areas? How does social media data compare with other information sources from focus areas?
• Understanding the park visitors: Whose views and observations are presented in user-generated content from national parks? Where do the visitors come from and how do they move within and between green areas?
• Mapping conservation opportunities and threats: Can we characterize national parks and national park visitors based on social media data? Can we identify human pressure on the environment from social media data? What tradeoffs between nature recreation and nature conservation can we discover on a regional/global scale?
2 Related work
Data and tools related to the information age (Castells, 2000) and the big data revolution (Mayer-Schönberger & Cukier, 2013; Kitchin, 2014) have opened up new possibilities for geographic knowledge discovery (Mennis & Guo, 2009; Crampton et al., 2013). Previously, recreational use patterns and preferences related to green spaces have been studied using surveys (Tyrväinen, Mäkinen & Schipperijn, 2007), activity diaries (Mytton et al., 2012), GPS tracking (Korpilo, Virtanen & Lehvävirta, 2017) and public participatory GIS (PPGIS) (Brown, Schebella & Weber, 2014; Laatikainen et al., 2015). However, these methods are often costly to implement (Kwan, 2013) and often limited to a specific case study area (Ives et al., 2017). Recently, large amounts of geographic ‘big data’, such as location-based social media data, have become available for capturing information about people’s movement and activities in unprecedented volumes. This “location-based story telling” (Sui & Goodchild, 2011) in various online platforms such as Facebook, Twitter and Instagram has fundamentally transformed the notion of geographic information in recent years.
Location-based social media data is often discussed in the context of Volunteered Geographic Information (VGI). The concept of VGI, coined by Goodchild in 2007, is widely used to describe geographic datasets generated by non-experts. Vast amounts of spatial data are continuously created in collaborative projects such as OpenStreetMap, social networks such as Twitter and other location-aware platforms on the web which host user-generated content. However, the term VGI does not fully capture the nature of more spontaneously generated data (See et al., 2016) such as tweets and Flickr photos, which have originally been shared for other purposes than mapping and research. These passively shared data evidently require special consideration related to the ethical use of data and the representativeness of the results. Thus, there is a need to further position social media among other sources of user-generated geographic information and authoritative data sources.
Social media data has been used in many application fields of geography to study spatial phenomena, especially in the urban context. The study of population dynamics in cities (Longley & Adnan, 2016; Steiger et al., 2015), spatial diffusion (Crampton et al., 2013) and humanitarian response applications (Crooks et al., 2013) are only a few examples of existing research from the fields of geography and geographic information science. However, examples in environmental studies, especially in conservation science, are still limited (Di Minin, Tenkanen & Toivonen, 2015). Studies focusing more on human-nature interactions include the quantification of visitation rates (Wood et al., 2013; Levin, Kark & Crandall, 2015), assessment of cultural ecosystem services and people’s interests (Richards & Friess, 2015; Roberge, 2014), and the extraction of species data (Barve, 2014; Stafford et al., 2010) from social media. Also, only a few studies have used social media data to understand human-nature interactions in urban environments. These include methodological development for studying cultural ecosystem services based on social media content analysis with a case study from urban mangroves in Singapore (Richards & Friess, 2015; Thiagarajah et al., 2015) and tourism crowding (including parks) based on check-in data from Shanghai (Shi, Zhao & Chen, 2017).
Existing studies are often limited to only a single social media platform, and lack comparisons and validation against other data sources. Most studies using social media data for environmental studies rely solely on one platform (mostly Flickr). Flickr might be the most suitable platform when looking at biodiversity features, while Instagram or other available data sources might better reflect the activities present in the area of interest (Hausmann et al., 2017b). Studies with more in-depth and advanced content analysis have thus far been limited to smaller study sites (for example Richards & Tunçer, 2017) and there is great potential to scale up such analysis to continental and even global scales.
A critical approach is needed when using social media as a source of information (Boyd & Crawford, 2012). Firstly, ethical issues related to using people’s personal data need to be considered even when using publicly available content. Secondly, it can be difficult to assess how representative the captured social media users are of the population in question (Longley & Adnan, 2016). Furthermore, the data is often biased both spatially and temporally in relation to infrastructure, mobile phone coverage, and popular events, and potentially towards certain socioeconomic classes. This work aims to provide a deeper understanding of these issues in the context of conservation-related questions.
3 Methods
The study sites of this research consist of green area networks at different scales: urban green areas from the city of Helsinki, Finland, individual protected areas from Finland and South Africa, and the global protected area network and key
biodiversity areas worldwide. National parks with regular visitor monitoring schemes provide a test environment for comparing social media user counts and content to official statistics.
The main material for the study consists of openly shared location-based social media posts from different social media platforms including (but not restricted to) Instagram, Twitter, and Flickr. Data is retrieved from platform APIs using existing packages and custom-made scripts in the Python programming language.
Social media data are used in conjunction with, and contrasted to, official visitor statistics, surveys and other available data from the focus areas. Official visitor statistics and visitor survey data from national park authorities are used together with social media content from different platforms. Global analysis of protected areas and key biodiversity areas is done using additional data from the International Union for Conservation of Nature (IUCN) and BirdLife International.
The first part of the study is focused on the potential and limitations of social media data for environmental studies through analyzing the different components of the data in terms of precision, accuracy and fitness for purpose. Accuracy and precision of spatial and temporal information are assessed through methods of data exploration and comparisons to ancillary data sets. Manual and automated content analysis of texts and images is used to explore if the content shared from green areas is thematically meaningful for green area management and conservation, and to produce further analyses of the spatial and temporal patterns of observed content categories (for example related to a specific activity or species within a national park). Results are compared with existing information about park visitation from case study sites with existing reference data. For example, activities revealed from social media photo content are compared with surveyed activities in a case study from a Finnish national park.
Understanding bias in social media data includes the analysis of age and language groups as well as the differentiation between data generated by locals and visitors. Surveys on visitors’ social media usage are conducted in selected national parks in Finland in order to find out the proportion of national park visitors who are active in social media and to link this information to demographic background variables. Furthermore, different platforms are compared in terms of their information content. Based on earlier findings from Kruger National Park in South Africa, Flickr contains more information related to biodiversity features, whereas Instagram posts more often portray human activities (Hausmann et al., 2017a). In this work, such differences in shared content will be explored further in different spatial and temporal contexts.
The last sections of the work apply the earlier findings for answering questions related to conservation opportunities and threats in the global protected area network and key biodiversity areas using a combination of user-generated and official data sources.
The work has the potential to bridge the gap between recent advances in social media data analytics and information needs in conservation science. The work will likely reveal new information about spatial and temporal patterns of human activities in green areas worldwide, especially from areas with no systematic visitor monitoring in place.
References
Barve, V. (2014) Discovering and developing primary biodiversity data from social networking sites: A novel approach. Ecological Informatics, 24, 194–199.
Boyd, D. & Crawford, K. (2012) Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662–679.
Brown, G., Schebella, M.F. & Weber, D. (2014) Using participatory GIS to measure physical activity and urban park benefits. Landscape and Urban Planning, 121, 34–44.
Castells, M. (2000) The rise of the network society. Blackwell Publishers.
Crampton, J.W., Graham, M., Poorthuis, A., Shelton, T., et al. (2013) Beyond the geotag: situating ‘big data’ and leveraging the potential of the geoweb. Cartography and Geographic Information Science, 40(2), 130–139.
Crooks, A., Croitoru, A., Stefanidis, A. & Radzikowski, J. (2013) #Earthquake: Twitter as a distributed sensor system. Transactions in GIS, 17(1), 124–147.
Goodchild, M.F. (2007) Citizens as sensors: the world of volunteered geography. GeoJournal, 69(4), 211–221.
Hausmann, A., Toivonen, T., Heikinheimo, V., Tenkanen, H., et al. (2017a) Social media reveal that charismatic species are not the main attractor of ecotourists to sub-Saharan protected areas. Scientific Reports, 7(1), 763.
Hausmann, A., Toivonen, T., Slotow, R., Tenkanen, H., et al. (2017b) Social Media Data Can Be Used to Understand Tourists’ Preferences for Nature-Based Experiences in Protected Areas. Conservation Letters, 11(1).
Ives, C.D., Oke, C., Hehir, A., Gordon, A., et al. (2017) Capturing residents’ values for urban green space: Mapping, analysis and guidance for practice. Landscape and Urban Planning, 161, 32–43.
Joppa, L.N., O’Connor, B., Visconti, P., Smith, C., et al. (2016) Filling in biodiversity threat gaps. Science, 352(6284), 416–418.
Kitchin, R. (2014) Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 1–12.
Korpilo, S., Virtanen, T. & Lehvävirta, S. (2017) Smartphone GPS tracking—Inexpensive and efficient data collection on recreational movement. Landscape and Urban Planning, 157, 608–617.
Kwan, M.-P. (2013) Beyond space (as we knew it): toward temporally integrated geographies of segregation, health, and accessibility: Space–time integration in geography and
GIScience. Annals of the Association of American Geographers, 103(5), 1078–1086.
Laatikainen, T., Tenkanen, H., Kyttä, M. & Toivonen, T. (2015) Comparing conventional and PPGIS approaches in measuring equality of access to urban aquatic environments. Landscape and Urban Planning, 144, 22–33.
Levin, N., Kark, S. & Crandall, D. (2015) Where have all the people gone? Enhancing global conservation using night lights and social media. Ecological Applications, 25(8), 2153–2167.
Longley, P.A. & Adnan, M. (2016) Geo-temporal Twitter demographics. International Journal of Geographical Information Science, 30(2), 369–389.
Mayer-Schönberger, V. & Cukier, K. (2013) Big Data: A Revolution That Will Transform How We Live, Work, and Think. Boston, Massachusetts, Houghton Mifflin Harcourt.
Mennis, J. & Guo, D. (2009) Spatial data mining and geographic knowledge discovery-An introduction. Computers, Environment and Urban Systems, 33(6), 403–408.
Di Minin, E., Tenkanen, H. & Toivonen, T. (2015) Prospects and challenges for social media data in conservation science. Frontiers in Environmental Science, 3.
Di Minin, E. & Toivonen, T. (2015) Global protected area expansion: creating more than paper parks. BioScience, 65(7), 637–638.
Mytton, O.T., Townsend, N., Rutter, H. & Foster, C. (2012) Green space and physical activity: an observational study using Health Survey for England data. Health & Place, 18(5), 1034–1041.
Richards, D.R. & Friess, D.A. (2015) A rapid indicator of cultural ecosystem service usage at a fine spatial scale: Content analysis of social media photographs. Ecological Indicators, 53, 187–195.
Richards, D.R. & Tunçer, B. (2017) Using image recognition to automate assessment of cultural ecosystem services from social media photographs. Ecosystem Services. [Online] Available from: doi:10.1016/J.ECOSER.2017.09.004 [Accessed: 4 October 2017].
Roberge, J.M. (2014) Using data from online social networks in conservation science: Which species engage people the most on Twitter? Biodiversity and Conservation, 23(3), 715–726.
See, L., Mooney, P., Foody, G., Bastin, L., et al. (2016) Crowdsourcing, Citizen Science or Volunteered Geographic Information? The Current State of Crowdsourced Geographic Information. ISPRS International Journal of Geo-Information, 5(5), 55.
Shi, B., Zhao, J. & Chen, P.-J. (2017) Exploring urban tourism crowding in Shanghai via crowdsourcing geospatial data. Current Issues in Tourism, 20(11), 1186–1209.
Stafford, R., Hart, A.G., Collins, L., Kirkhope, C.L., et al. (2010) Eu-social science: the role of internet social networks in the collection of bee biodiversity data. PLoS ONE, 5(12), e14381.
Steiger, E., Westerholt, R., Resch, B. & Zipf, A. (2015) Twitter as an indicator for whereabouts of people? Correlating Twitter with UK census data. Computers, Environment and Urban Systems, 54, 255–265.
Sui, D. & Goodchild, M. (2011) The convergence of GIS and social media: challenges for GIScience. International Journal of Geographical Information Science, 25(11), 1737–1748.
Thiagarajah, J., Wong, S.K.M., Richards, D.R. & Friess, D.A. (2015) Historical and contemporary cultural ecosystem service values in the rapidly urbanizing city state of Singapore. Ambio, 44(7), 666–677.
Tyrväinen, L., Mäkinen, K. & Schipperijn, J. (2007) Tools for mapping social values of urban woodlands and other green areas. Landscape and Urban Planning, 79(1), 5–19.
Venter, O., Sanderson, E.W., Magrach, A., Allan, J.R., et al. (2016) Sixteen years of change in the global terrestrial human footprint and implications for biodiversity conservation. Nature Communications, 7, 12558.
Wood, S.A., Guerry, A.D., Silver, J.M. & Lacayo, M. (2013) Using social media to quantify nature-based tourism and recreation. Scientific Reports, 3, 2976.
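
Supplementary sketch. The Methods section describes comparing social media user counts with official visitor statistics from national parks that have regular visitor monitoring. The following minimal Python sketch illustrates one way such a comparison could look: it counts distinct social media users per month and correlates those counts with official monthly visitor counts. It is not the author's workflow. The function names, the (user_id, timestamp) input layout, the choice of Pearson's correlation, and the treatment of months without posts as zero users are all assumptions made for illustration, and the example data are invented.

```python
from collections import defaultdict
from datetime import datetime
from math import sqrt


def monthly_unique_users(posts):
    """Count distinct users per (year, month).

    `posts` is an iterable of (user_id, ISO 8601 timestamp) pairs.
    This input layout is hypothetical, not a platform API format.
    """
    users_by_month = defaultdict(set)
    for user_id, timestamp in posts:
        t = datetime.fromisoformat(timestamp)
        users_by_month[(t.year, t.month)].add(user_id)
    return {month: len(users) for month, users in users_by_month.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def compare_with_official(posts, official_counts):
    """Correlate monthly social media user counts with official counts.

    `official_counts` maps (year, month) to an official visitor count.
    Assumption: months with no posts count as zero social media users.
    """
    social = monthly_unique_users(posts)
    months = sorted(official_counts)
    xs = [social.get(month, 0) for month in months]
    ys = [official_counts[month] for month in months]
    return pearson_r(xs, ys)


if __name__ == "__main__":
    # Invented example data for a single hypothetical park.
    example_posts = [
        ("a", "2017-06-03T10:00:00"),
        ("a", "2017-06-20T09:30:00"),  # same user again, counted once in June
        ("b", "2017-06-21T12:00:00"),
        ("c", "2017-07-01T08:00:00"),
        ("d", "2017-07-02T08:00:00"),
        ("e", "2017-07-15T18:45:00"),
        ("a", "2017-08-05T11:00:00"),
    ]
    example_official = {(2017, 6): 2000, (2017, 7): 3000, (2017, 8): 1000}
    print(compare_with_official(example_posts, example_official))
```

Counting distinct users rather than posts keeps a few very active accounts from dominating a month. A rank-based measure such as Spearman's correlation could be substituted where the relationship is not expected to be linear.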