=Paper=
{{Paper
|id=Vol-2088/paper1
|storemode=property
|title=Integration of Social Media in Spatial Crime Analysis and Prediction Models for Events
|pdfUrl=https://ceur-ws.org/Vol-2088/paper1.pdf
|volume=Vol-2088
|authors=Alina Ristea,Michael Leitner
}}
==Integration of Social Media in Spatial Crime Analysis and Prediction Models for Events==
Alina Ristea, University of Salzburg, Schillerstraße 30, Salzburg, Austria, mihaela.ristea@stud.sbg.ac.at
Michael Leitner, Louisiana State University, E-104 Howe-Russell-Kniffen Geoscience Complex, Baton Rouge, LA, USA, mleitne@lsu.edu
Abstract
The last decade has been the most productive with respect to social media data exploration and possible uses in crime prediction. This area
is thus a rapidly evolving and growing field. This PhD research aims to find and evaluate spatial relationships between crime occurrences
and nearby social media activity in event areas, and to estimate the possible influence of this activity on crime prediction models. Overall,
the thesis will focus on geospatial crime prediction concerning planned and emerging events through the exploration of social media data,
and other information including demographic, economic and safety risk factors.
The thesis will utilize methods and tools from various fields including: social media text mining and classification from machine learning;
spatial statistics together with forecasting models from crime prediction. Outcomes will be a valuable basis for defining new research areas,
helping to understand further spatial crime analysis and prediction models that include secondary data sources, such as social media, on the
basis of event exploration.
Keywords: spatial crime analysis, social media, spatial prediction, crowd based events.
1 Introduction

To date, crime prediction models used in conjunction with social media data have achieved a significantly high rate of success for certain types of crime, complementing traditional crime prediction models (Corso, 2015; Gerber, 2014; Wang & Gerber, 2015; Wang et al, 2012).

Most crime prediction techniques are used for retrospective crime forecasting, which relies on the existence of historical crime data. For this approach, quantitative methods were developed to categorize crime data in objective ways and to find characteristics such as the type of crime, typology of offender, result of investigation and confidential information, using geospatial and statistical techniques such as hot spot analysis (Eck et al, 2005), regression, cluster determination or spatiotemporal pattern recognition.

In recent years, crime predictive analytics has become more interdisciplinary. This is also related to the growth of “big data”, the last decade being the most productive with respect to social media data exploration. Researchers from informatics, computer science, mathematics and statistics are collaborating with criminologists, sociologists and others in developing new prediction models. Moreover, the rapid evolution of technology plays a very important role in crime analytics as well as in social media, and it opens up a plethora of research that can be done in different fields of interest.

Machine learning techniques together with linear and logistic modeling (Alruily, 2012; Burnap & Williams, 2015; Wang & Gerber, 2015; Wang et al, 2012), density based models (Bendler et al, 2014a; Cheng & Smyth, 2015; Featherstone, 2013a; b), risk terrain modeling (Perry, 2013) or Geographically Weighted Regression (Bendler et al, 2014b) have been used to predict crime occurrences using geotagged tweets or, in more detail, text mining from tweets. The algorithms achieve highly ranked results; however, there are few explanations of why the accuracy changes across different crime or social media datasets. To our knowledge, very few previous works consider the effect of events on spatial crime distribution while using social media in prediction.

There is an important body of literature focusing on spatial crime distribution from the perspective of events and on social media during events, such as big or mega events, sporting events and natural disasters. However, not much research has been done specifically on predicting planned events considering social media and crime data, at a specific location or at a venue spot and also including environmental explanatory variables
in the models. Population trajectories and their impact on crime likelihood differ according to the environmental factors.

AGILE 2018 – Lund, June 12-15, 2018

Finding attributes from social media that can boost crime prediction models, and implementing them along with the crime data for better prediction, is the core part of the PhD, with a main focus on public events. Three main elements form the basis of this PhD research: crime occurrences, social media (mostly Twitter data) and events (planned events and emerging events). An event can be defined as a matter that happens in a place, especially one of importance, such as a planned public and social occasion or particular contests making up a sports competition. Planned events are those whose main parameters are defined, such as the location or the public attendance. Emerging events are those whose basic elements have the ability to develop novel relations and identities designed into higher-level elements.

Overall, spatiotemporal analysis is the basis of this PhD study, managed along with spatial relationships such as distance, connectivity, distribution, form, and space between spatial units. The case studies will be carefully chosen and discussed individually, followed by a final comparison in which an adapted and robust crime prediction model for events will be defined.

This PhD research aims at filling this gap in the integration of social media in spatial crime prediction for different event occurrences. During the PhD study I will use tools to extract, quantify and normalize the social media data and attributes that can lead to better results in geospatial crime prediction models for different events. This research thereby aims at paving the way for the usage of multidisciplinary tools and the integration of the results in geospatial prediction models that can answer questions about spatial and temporal patterns in crime analysis. Spatial criminology theories will support the analyses developed during my PhD studies.

2 Related work

Crime presents an increased strategic complexity and interaction with other networks that are not necessarily connected. The main categories of prediction models applied in crime applications include hot spot analysis, regression methods, data mining and machine learning algorithms, the near-repeat concept, spatiotemporal analysis and risk terrain analysis (Perry, 2013). For better prediction, algorithms are selected according to the research approach.

Crowd based events (high attendance events) are considered attractors and generators of crime. There are studies emphasizing potential implications of theories such as routine activity theory, involved in hooliganism and violent crime, and crime pattern theory, related to crime increases in specific areas for events such as sporting events (Kurland et al, 2014).

The analysis of crime patterns is the basis for determining crime displacement, spatially and temporally. However, there is little focus on specific events in the growing field of spatial crime predictive analytics. This research aims to adapt and use the crime prediction methods already mentioned for events. Processing social media data for event analysis and integrating the outcomes in the crime prediction models may improve the final results.

The opportunities offered by social media require the establishment of a research methodology for drawing insights from extracted information that can be helpful in many fields, such as crime analysis. Social media networks offer a huge volume of data, which is analyzed in fields such as social sciences, economics, GIScience, computer science, psychology or philosophy.

Key techniques go beyond text analytics to include opinion mining, entity extraction, event recognition, sentiment analysis, topic modeling, social network analysis, trend analysis, and visual analytics. The density of words and their consistency with a lexicon (dictionary) can help define relationships between the data. It is still an open field of research because social media data are noisy, unstructured and highly diverse. The analysis of social data parameters, not considering the "spatial" component, was performed mostly from a computer and data science point of view.

The implementation of social media data in crime prediction models started just recently. However, crime prediction algorithms were tested in detail through studies in the last five years, the
same can be confirmed for prediction algorithms for social media.

One approach for combining social media and crime data is developed through topic extraction and the connections with crime occurrences. Social media and crime data were first brought together to make a prediction in 2012 (Wang et al, 2012). Automatic semantic analysis and NLP of Twitter data, dimensionality reduction through LDA and prediction with linear modeling for hit-and-run crimes in Charlottesville, Virginia represented the earliest research on this topic. Another study investigated the possible integration of rich textual content to predict users’ spatial trajectories, followed by the correlation with crime occurrences in Chicago, IL (Wang & Gerber, 2015).

A second approach points out the importance of social media density. If social media usage is sufficient in an area of study, it may establish a higher predictive value (Featherstone, 2013a; b). Researchers implemented Twitter data as predictors along with archived crime data, which resulted in improved prediction for burglaries and robberies (Bendler et al, 2014a). However, the analysis considered just the number of tweets and the number and type of crimes.

Twitter data is considered a proxy for the ambient population used in crime rate calculations, showing an impact on crime hotspots (Malleson & Andresen, 2015; 2016). Moreover, other datasets can support ambient population calculations. Considering social media as a dynamic variable, it is important to also create a dynamic (ambient) population variable, a challenge that will be tested during my PhD (Kounadi et al, 2017).

Topic modeling and linguistic analysis of spatiotemporally tagged tweets added to crime data in kernel density estimation at neighborhood level resulted in good predictions for the City of Chicago, IL (Gerber, 2014). This research showed that Twitter-derived attributes improve prediction for 19 of 25 crime types. While the study is important, the temporal patterns might differ over a period longer than the three-month dataset used. The seasonality of crime can also affect the prediction accuracy.

An additional innovative attempt considers the implication of sentiment analysis by applying lexicon-based methods and of weather parameters, combined with crime data in a kernel density algorithm (Cheng & Smyth, 2015). For the same city, researchers calculated user ranking for the concept of user credibility and then captured predictive context hidden variables to test in crime rate trend prediction.

Past research has already confirmed that the distribution of crime types shows some similarities across different cultures, religions, languages, and socio-economic statuses. However, no research has been done before specifically on predicting planned and emerging events considering social media and crime data, at different locations and also at a venue spot.

Besides the crime occurrences connected with sport events, research shows results in detecting sport events on Twitter, the public’s overall perception of highly ranked events such as the SuperBowl, and crowd activities related to sport events. Moreover, some researchers are interested in crowd events such as festivals, concerts, political summits, expos, city traffic, etc.

Another important type of event considered in crime research is protests, which can lead to high crime displacement. Recent theoretical work argues that social media may increase the occurrence of emerging events, such as protests. The spatiotemporal variation in event intensity can be connected with social media activity. On the other hand, the coordination and management of protest activity might be done on social media, and social pressure might also be developed through online announcements. The limited existing research in this field considers crowd activities related to events as a proxy for crime analysis and prediction.

As discussed before, there is a growing literature that investigates the impact of events (sporting events, for example) on crime, as well as a growing literature that shows how people’s behavior on social media changes during (sporting) events. However, there is limited research that investigates the relationship, if present, between events, social media activity, and criminal events.
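
The kernel density approach reviewed in this section can be sketched in a few lines. The following Python example is a minimal illustration under stated assumptions, not the method of Gerber (2014) or of this thesis: it uses synthetic points instead of real crime and Twitter data, raw tweet counts per grid cell instead of topic-model features, and it scores each model on the same cells it was fitted on. All names (`GRID`, `sample_points`, `count_per_cell`, `kde_at`, `fit_logistic`, `auc`, `demo`) are hypothetical. It computes a historical crime density for each cell, optionally adds the tweet count as a second predictor in a logistic regression, and compares the two variants by Area under the Curve (AUC).

```python
import math
import random

GRID = 10  # hypothetical study area of GRID x GRID unit cells


def sample_points(rng, n, centre, spread):
    """Draw n synthetic (x, y) points scattered around a hotspot centre."""
    return [(rng.gauss(centre, spread), rng.gauss(centre, spread)) for _ in range(n)]


def count_per_cell(points):
    """Count points per grid cell (row-major list); points outside the grid are ignored."""
    counts = [0] * (GRID * GRID)
    for x, y in points:
        if 0 <= x < GRID and 0 <= y < GRID:
            counts[int(y) * GRID + int(x)] += 1
    return counts


def kde_at(point, events, bandwidth):
    """Gaussian kernel density estimate of the events, evaluated at one point."""
    px, py = point
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(events))
    return norm * sum(
        math.exp(-((px - ex) ** 2 + (py - ey) ** 2) / (2.0 * bandwidth ** 2))
        for ex, ey in events
    )


def standardize(column):
    """Rescale a feature column to zero mean and unit variance."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column)) or 1.0
    return [(v - mean) / sd for v in column]


def predict_proba(weights, row):
    """Logistic model: weights[0] is the intercept, the rest match the row."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], row))
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit the logistic model by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            err = predict_proba(weights, row) - y
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def demo(seed=0):
    """Compare in-sample AUC with and without the (synthetic) tweet predictor."""
    rng = random.Random(seed)
    past_crimes = sample_points(rng, 150, 3.0, 1.5)  # synthetic crime history
    tweets = sample_points(rng, 300, 3.5, 2.0)       # synthetic geotagged tweets
    next_crimes = sample_points(rng, 60, 3.0, 1.5)   # synthetic outcome period
    centres = [(c + 0.5, r + 0.5) for r in range(GRID) for c in range(GRID)]
    kde = standardize([kde_at(p, past_crimes, 1.0) for p in centres])
    tw = standardize([float(v) for v in count_per_cell(tweets)])
    labels = [1 if v > 0 else 0 for v in count_per_cell(next_crimes)]
    variants = {
        "kde_only": [[k] for k in kde],
        "kde_plus_tweets": [[k, t] for k, t in zip(kde, tw)],
    }
    results = {}
    for name, rows in variants.items():
        weights = fit_logistic(rows, labels)
        results[name] = auc([predict_proba(weights, r) for r in rows], labels)
    return results


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns a dictionary with one in-sample AUC per variant. With real data the comparison would need a temporal hold-out period, since in-sample scores tend to be optimistic.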
3 Objectives. Research description

Prediction of crime incidents can benefit from social media implementation as an exogenous predictor that may improve the precision of results. The innovative aspect of this research project will be the integration of social media analysis into crime prediction models for specific events and the evaluation of the quality of such predictions. The three main objectives, each followed by its research questions and a brief list of data and methods, are presented below:

• Objective 1: examine the relationship between the distribution of crime and social media at regularly occurring events
RQ1: What is the relationship between specific types of events and crime types?
RQ2: How can social media predict the diffusion of crimes related to the end of events?
Datasets: crime, tweets, points of interest, residential population, Landscan population.
Methods: topic extraction, text classification by finding “violent tweets”; heat maps, point pattern analyses, hierarchical clustering (KNN), logistic regression.

• Objective 2: investigate the relationship between crime occurrences at a venue and various event types
RQ1: How does the event type affect crime prediction at a venue?
RQ2: How are social media and the number of crimes correlated?
Datasets: crime, tweets, points of interest.
Methods: topic extraction, opinion mining (using Naïve Bayes); Gi* (clusters of points with values higher in magnitude than expected in randomized distributions), Moran's I (clustering likelihood), negative binomial logistic regression, evaluation using Area under the Curve (AUC).

• Objective 3: explore the adaptability of spatiotemporal techniques in the evaluation of emerging events (protests, riots)
RQ1: How may a spatiotemporal analysis of social media help identify emerging events influencing crime?
RQ2: How may social media predict crime related to the spatial displacement of an emerging event?
Datasets: crime, tweets, points of interest, old protest data, and socio-economic information.
Methods: topic extraction, exponential dispersion models, logistic regression, crime displacement methods, trajectory analysis.

4 Discussion

Overall, this dissertation will focus on geospatial crime predictive analysis concerning planned and emerging events through the exploration of the complex parameters of social media data. Moreover, the study will explore historical crime data and analyze the correlation between crime occurrences and social media data parameters (topic, term frequency, emotions). According to research, crime prevention initiatives tend to displace crime or diffuse crime reduction benefits. The analysis will identify information from social media that may help predict crime related to spatial displacement regarding the occurrence of an event. Other possible risk factors will also be considered. Population data is very important in determining crime rates, so the population at risk of crime will be included as an additional risk factor in the crime prediction models.

The distinctive characteristic of this approach lies in the use of the three data elements in combination with other information, such as demographic data, to provide a new interpretation of social media integration in spatial crime prediction for different event occurrences.

Several spatial statistical models will be applied, including spatial regression analysis for finding spatial relationships among crime and social data variables; geographically weighted regression for point data validation; linear and logistic regression; and global spatial autocorrelation for finding the degree of dependency among the occurrences in the same geographic space.

The methods listed above will help the evaluation and integration of social media information in crime analysis and predictive analytics for event based occurrences. There are limitations with respect to the location of social media data. Because of the rather small percentage of people who use geo-tagging, algorithms to improve the locational quality through text mining (the location is extracted from the text) were
developed. Another limitation may be the quality of the crime data. We have to remember that these data are collected by humans, so it is very difficult to eliminate the bias included in all datasets used in research.

As a follow-up application of this PhD, the results may be used to increase the effectiveness of police patrol allocation in a larger area of influence, not just in the vicinity of the event location, and also to monitor emerging events for negative effects. This would ideally increase policing efficiency and prevent damage to public property.

5 References

Alruily, M. (2012) Using text mining to identify crime patterns from Arabic crime news report corpus.
Bendler, J., Brandt, T., Wagner, S. & Neumann, D. (2014a) Investigating crime-to-twitter relationships in urban environments-facilitating a virtual neighborhood watch.
Bendler, J., Ratku, A. & Neumann, D. (2014b) Crime Mapping through Geo-Spatial Social Media Activity.
Burnap, P. & Williams, M. L. (2015) Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making. Policy & Internet.
Cheng, Z. & Smyth, R. (2015) Crime Victimization, Neighbourhood Safety and Happiness in China.
Corso, A. J. (2015) Toward Predictive Crime Analysis via Social Media, Big Data, and GIS Spatial Correlation. iConference 2015 Proceedings.
Eck, J., Chainey, S., Cameron, J. & Wilson, R. (2005) Mapping crime: Understanding hotspots.
Featherstone, C. (2013a) Identifying vehicle descriptions in microblogging text with the aim of reducing or predicting crime, Adaptive Science and Technology (ICAST), 2013 International Conference on. IEEE.
Featherstone, C. (2013b) The relevance of social media as it applies in South Africa to crime prediction, IST-Africa Conference and Exhibition (IST-Africa), 2013. IEEE.
Gerber, M. S. (2014) Predicting crime using Twitter and kernel density estimation. Decision Support Systems, 61, 115-125.
Kounadi, O., Ristea, A., Leitner, M. & Langford, C. (2017) Population at risk: using areal interpolation and Twitter messages to create population models for burglaries and robberies. Cartography and Geographic Information Science, 1-15.
Kurland, J., Tilley, N. & Johnson, S. D. (2014) The Football ‘Hotspot’ Matrix. Football Hooliganism, Fan Behaviour and Crime: Contemporary Issues, 21.
Malleson, N. & Andresen, M. A. (2015) The impact of using social media data in crime rate calculations: shifting hot spots and changing spatial patterns. Cartography and Geographic Information Science, 42(2), 112-121.
Malleson, N. & Andresen, M. A. (2016) Exploring the impact of ambient population measures on London crime hotspots. Journal of Criminal Justice, 46, 52-63.
Perry, W. L. (2013) Predictive policing: The role of crime forecasting in law enforcement operations. Rand Corporation.
Wang, M. & Gerber, M. S. (2015) Using Twitter for Next-Place Prediction, with an Application to Crime Prediction, Computational Intelligence, 2015 IEEE Symposium Series on. IEEE.
Wang, X., Gerber, M. S. & Brown, D. E. (2012) Automatic crime prediction using events extracted from twitter posts, Social Computing, Behavioral-Cultural Modeling and Prediction. Springer, 231-238.