=Paper= {{Paper |id=Vol-2088/paper1 |storemode=property |title=Integration of Social Media in Spatial Crime Analysis and Prediction Models for Events |pdfUrl=https://ceur-ws.org/Vol-2088/paper1.pdf |volume=Vol-2088 |authors=Alina Ristea,Michael Leitner,Christos Charcharos,Emmanuel Papadakis,Thomas Blaschke,Vuokko Heikinheimo,Hoda Allahbakhshi,Robert Weibel,Weiming Huang,Ali Mansourian,Lars Harrie,Sebastian Hunger,Azimjon Sayidov,Robert Weibel,Kiran Zahra }} ==Integration of Social Media in Spatial Crime Analysis and Prediction Models for Events== https://ceur-ws.org/Vol-2088/paper1.pdf
     Integration of Social Media in Spatial Crime Analysis and Prediction
                              Models for Events
                                              Alina Ristea                             Michael Leitner
                                        University of Salzburg,                   Louisiana State University,
                                            Schillerstraße 30                   E-104 Howe-Russell-Kniffen
                                           Salzburg, Austria                       Geoscience Complex,
                                     mihaela.ristea@stud.sbg.ac.at                 Baton Rouge, LA, USA
                                                                                      mleitne@lsu.edu


                                                                      Abstract

       The last decade has been the most productive with respect to the exploration of social media data and its possible uses in crime
    prediction, making this a rapidly evolving and growing field. This PhD research aims to find and evaluate spatial relationships between
    crime occurrences and nearby social media activity in event areas, and to estimate the possible influence of this activity on crime
    prediction models. Overall, the thesis will focus on geospatial crime prediction for planned and emerging events through the exploration
    of social media data and other information, including demographic, economic and safety risk factors.
       The thesis will use methods and tools from several fields: social media text mining and classification from machine learning, and
    spatial statistics together with forecasting models from crime prediction. The outcomes will be a valuable basis for defining new research
    areas and will help to further the understanding of spatial crime analysis and prediction models that include secondary data sources,
    such as social media, on the basis of event exploration.
    Keywords: spatial crime analysis, social media, spatial prediction, crowd-based events.

1     Introduction

To date, crime prediction models used in conjunction with social media data have achieved a high
rate of success for certain types of crime, complementing traditional crime prediction models
(Corso, 2015; Gerber, 2014; Wang & Gerber, 2015; Wang et al, 2012).
  Most crime prediction techniques are used for retrospective crime forecasting, which relies on
the existence of historical crime data. For this approach, quantitative methods were developed to
categorize crime data in objective ways and to find characteristics such as the type of crime,
typology of offender, result of investigation and confidential information, using geospatial and
statistical techniques such as hot spot analysis (Eck et al, 2005), regression, cluster
determination or spatiotemporal pattern recognition.
  In recent years, crime predictive analytics has become more interdisciplinary. This is also
related to the growth of "big data", the last decade being the most productive with respect to
social media data exploration. Researchers from informatics, computer science, mathematics and
statistics are collaborating with criminologists, sociologists and others in developing new
prediction models. Moreover, the rapid evolution of technology plays a very important role in
crime analytics as well as in social media, and it opens up a plethora of research opportunities
in different fields of interest.
  Machine learning techniques together with linear and logistic modeling (Alruily, 2012; Burnap &
Williams, 2015; Wang & Gerber, 2015; Wang et al, 2012), density-based models (Bendler et al,
2014a; Cheng & Smyth, 2015; Featherstone, 2013a; b), risk terrain modeling (Perry, 2013) or
Geographically Weighted Regression (Bendler et al, 2014b) have been used to predict crime
occurrences using geotagged tweets or, in more detail, text mining from tweets. The algorithms
achieve strong results; however, there are few explanations of why the accuracy changes across
different crime or social media datasets. To our knowledge, very few previous works consider the
effect of events on spatial crime distribution while using social media in prediction.
  There is an important body of literature focusing on spatial crime distribution from the
perspective of events and on social media during events, such as big or mega events, sporting
events and natural disasters. However, little research has specifically addressed prediction for
planned events considering social media and crime data, at a specific location or at a venue, and
also including environmental explanatory variables
 AGILE 2018 – Lund, June 12-15, 2018




in the models. Population trajectories and their impact on crime likelihood differ according to
the environmental factors.
  Finding attributes from social media that can give a boost to crime prediction models, and
implementing them along with the crime data for a better prediction, is the core part of the PhD,
with a main focus on public events. Three main elements are the basis of this PhD research: crime
occurrences, social media (mostly Twitter data) and events (planned events and emerging events).
  An event can be defined as a matter that happens in a place, especially one of importance, such
as a planned public and social occasion or the particular contests making up a sports
competition. Planned events are those whose main parameters, such as the location or the public
attendance, are defined. Emerging events are those whose basic elements are able to develop novel
relations and identities that combine into higher-level elements.
  Overall, spatiotemporal analysis is the basis of this PhD study, handled along with spatial
relationships such as distance, connectivity, distribution, form, and space between spatial
units. The case studies will be carefully chosen and discussed individually, followed by a final
comparison in which an adapted and robust crime prediction model for events will be defined.
  This PhD research aims at filling the gap in the integration of social media into spatial crime
prediction for different event occurrences. During the PhD study I will use tools to extract,
quantify and normalize the social media data and attributes that can lead to better results in
geospatial crime prediction models for different events.
  In doing so, this research aims at paving the way for the use of multidisciplinary tools and the
integration of their results in geospatial prediction models that can address spatial and
temporal patterns in crime analysis. Spatial criminology theories will support the analyses
developed during my PhD studies.

2    Related work

Crime presents an increased strategic complexity and interaction with other networks that are not
necessarily connected. The main categories of prediction models applied in crime applications
include hot spot analysis, regression methods, data mining and machine learning algorithms, the
near-repeat concept, spatiotemporal analysis and risk terrain analysis (Perry, 2013). For a better
prediction, algorithms are selected according to the research approach.
  Crowd-based events (high-attendance events) are considered attractors and generators of crime.
Some studies emphasize the potential implications of theories such as routine activity theory,
involved in hooliganism and violent crime, and crime pattern theory, related to crime increases
in specific areas for events such as sporting events (Kurland et al, 2014).
  The analysis of crime patterns is the basis for determining crime displacement, spatially and
temporally. However, there is little focus on specific events in the growing field of spatial
crime predictive analytics. This research aims to adapt and use the crime prediction methods
mentioned above for events. Processing social media data for event analysis and integrating the
outcomes into the crime prediction models may improve the final results.
  The opportunities offered by social media require the establishment of a research methodology
for extracting information that can be helpful in many fields, such as crime analysis. Social
media networks offer a huge volume of data, which is analyzed in branches such as social sciences,
economics, GIScience, computer science, psychology or philosophy.
  Key techniques go beyond text analytics to include opinion mining, entity extraction, event
recognition, sentiment analysis, topic modeling, social network analysis, trend analysis, and
visual analytics. The density of words and their consistency with a lexicon (dictionary) can help
to define relationships between the data. This remains an open field of research because social
media data are noisy, unstructured and highly diverse. The analysis of social data parameters,
not considering the "spatial" component, has been performed mostly from a computer and data
science point of view.
  The implementation of social media data in crime prediction models started only recently.
However, crime prediction algorithms were tested in detail through studies in the last five years, and the
same can be confirmed for prediction algorithms for social media.
  One approach for combining social media and crime data is developed through topic extraction
and its connections with crime occurrences. Social media and crime data were first brought
together to make a prediction in 2012 (Wang et al, 2012). Automatic semantic analysis and NLP of
Twitter data, dimensionality reduction through LDA and prediction with linear modeling for
hit-and-run crimes in Charlottesville, Virginia represented the earliest research on this topic.
Another study investigated the possible integration of rich textual content to predict users'
spatial trajectories, followed by the correlation with crime occurrences in Chicago, IL (Wang &
Gerber, 2015).
  A second approach points out the importance of social media density. If social media usage is
sufficient in an area of study, it may provide a higher predictive value (Featherstone, 2013a;
b). Researchers implemented Twitter data as predictors along with archived crime data, which
resulted in improved prediction for burglaries and robberies (Bendler et al, 2014a). However, the
analysis considered just the number of tweets and the number and type of crimes.
  Twitter data is considered a proxy for the ambient population used in crime rate calculations,
showing an impact on crime hotspots (Malleson & Andresen, 2015; 2016). Moreover, other datasets
can support ambient population calculations. Considering social media as a dynamic variable, it
is also important to create a dynamic (ambient) population variable, a challenge that will be
tested during my PhD (Kounadi et al, 2017).
  Topic modeling and linguistic analysis of spatiotemporally tagged tweets, added to crime data
in kernel density estimation at the neighborhood level, resulted in good predictions for the City
of Chicago, IL (Gerber, 2014). This research showed that Twitter-derived attributes improve
prediction for 19 of 25 crime types. While acknowledging the importance of the study, the
temporal patterns might be different for a period longer than the three-month dataset used. The
seasonality of crime can also affect the prediction accuracy.
  An additional innovative attempt considers sentiment analysis, applied through lexicon-based
methods, and weather parameters, combined with crime data in a kernel density algorithm (Cheng &
Smyth, 2015). For the same city, researchers calculated a user ranking for the concept of user
credibility and then captured hidden predictive context variables to test in crime rate trend
prediction.
  Past research has already confirmed that the distributions of crime types show some
similarities across different cultures, religions, languages, and socio-economic statuses.
However, to our knowledge, no research has specifically addressed prediction for planned and
emerging events considering social media and crime data, at different locations and also at a
venue.
  Besides the crime occurrences connected with sport events, research shows results in detecting
sport events on Twitter, the public's overall perception of highly ranked events such as the
Super Bowl, and crowd activities related to sport events. Moreover, some researchers are
interested in crowd events such as festivals, concerts, political summits, expos, city traffic,
etc.
  Another important type of event considered in crime research is protests, which can lead to
high crime displacement. Recent theoretical work argues that social media may increase the
occurrence of emerging events, such as protests. The spatiotemporal variation in event intensity
can be connected with social media activity. On the other hand, the coordination and management
of protest activity might be done on social media, and social pressure might also develop through
online announcements. The limited existing research in this field considers crowd activities
related to events as a proxy for crime analysis and prediction.
  As discussed before, there is a growing literature that investigates the impact of events
(sporting events, for example) on crime, as well as a growing literature that shows how people's
behavior on social media changes during (sporting) events. However, there is limited research
that investigates the relationship, if present, between events, social media activity, and
criminal events.

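  The kernel density estimation step that several of the studies reviewed above rely on (for
example Gerber, 2014) can be illustrated with a minimal sketch. It is not the implementation used
in any cited study: the Gaussian kernel, the fixed bandwidth, the planar coordinates in metres,
the toy incident locations and the function names `kde_density` and `rank_cells` are all
assumptions made for illustration.

```python
import math


def kde_density(points, location, bandwidth):
    """Gaussian kernel density estimate at `location` from 2-D `points`.

    Assumes planar coordinates (e.g. metres) and a fixed bandwidth.
    """
    if not points:
        return 0.0
    x0, y0 = location
    # Normalising constant of a 2-D Gaussian kernel, averaged over all points.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for x, y in points:
        squared_distance = (x - x0) ** 2 + (y - y0) ** 2
        total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
    return norm * total


def rank_cells(points, cell_centres, bandwidth):
    """Return (density, cell centre) pairs sorted from highest to lowest density."""
    scored = [(kde_density(points, centre, bandwidth), centre) for centre in cell_centres]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical past incidents: three close together, one far away.
past_crimes = [(100.0, 100.0), (120.0, 90.0), (110.0, 115.0), (900.0, 900.0)]
# Centres of a coarse 3 x 3 grid laid over the toy study area.
cells = [(x, y) for x in (100.0, 500.0, 900.0) for y in (100.0, 500.0, 900.0)]

ranking = rank_cells(past_crimes, cells, bandwidth=150.0)
hottest_density, hottest_cell = ranking[0]
```

  In this toy configuration the cell centred on (100, 100) ranks first because three of the four
incidents fall close to it. The reviewed studies add tweet-derived attributes to such a density
surface as further predictors; that step is omitted here.
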
3    Objectives. Research description

  Prediction of crime incidents can benefit from social media as an exogenous predictor that may
improve the precision of results. The innovative aspect of this research project will be the
integration of social media analysis into crime prediction models for specific events and the
evaluation of the quality of such predictions. The three main objectives, each followed by its
research questions and a brief list of data and methods, are listed below:

 •    Objective 1: examine the relationship between the distribution of crime and social media
      at regularly occurring events
  RQ1: What is the relationship between specific types of events and crime types?
  RQ2: How can social media predict the diffusion of crimes related to the end of events?
  Datasets: crime, tweets, points of interest, residential population, LandScan population.
  Methods: topic extraction, text classification by finding "violent tweets"; heat maps, point
pattern analyses, hierarchical clustering (KNN), logistic regression.

 •    Objective 2: investigate the relationship between crime occurrences at a venue and
      various event types
  RQ1: How does the event type affect crime prediction at a venue?
  RQ2: How are social media and the number of crimes correlated?
  Datasets: crime, tweets, points of interest.
  Methods: topic extraction, opinion mining (using Naïve Bayes); Gi* (clusters of points with
values higher in magnitude than expected in randomized distributions), Moran's I (clustering
likelihood), negative binomial logistic regression, evaluation using Area under the Curve (AUC).

 •    Objective 3: explore the adaptability of spatiotemporal techniques in the evaluation
      of emerging events (protests, riots)
  RQ1: How may a spatiotemporal analysis of social media help identify emerging events
influencing crime?
  RQ2: How may social media predict crime related to the spatial displacement of an emerging
event?
  Datasets: crime, tweets, points of interest, old protest data, and socio-economic information.
  Methods: topic extraction, exponential dispersion models, logistic regression, crime
displacement methods, trajectory analysis.

4    Discussion

  Overall, this dissertation will focus on geospatial crime predictive analysis of planned and
emerging events through the exploration of the complex parameters of social media data. Moreover,
the study will explore historical crime data and analyze the correlation between crime
occurrences and social media data parameters (topic, term frequency, emotions). According to
research, crime prevention initiatives tend to displace crime or diffuse crime reduction
benefits. The analysis will identify information from social media that may help predict crime
related to spatial displacement around the occurrence of an event. Other possible risk factors
will also be considered. Population data is very important in determining crime rates, so the
population at risk of crime will be an additional risk factor in the crime prediction models.
  The distinctive characteristic of this approach lies in the use of the three data elements in
combination with other information, such as demographic data, to provide a new interpretation of
social media integration in spatial crime prediction for different event occurrences.
  Several spatial statistical models will be applied, including spatial regression analysis for
finding spatial relationships among crime and social data variables; geographically weighted
regression for point data validation; linear and logistic regression; and global spatial
autocorrelation for finding the degree of dependency among occurrences in the same geographic
space.
  The methods listed above will support the evaluation and integration of social media
information in crime analysis and predictive analytics for event-based occurrences. There are
limitations with respect to the location of social media data. Because of the rather small
percentage of people who use geo-tagging, algorithms to improve the locational quality through
text mining (the location is extracted from the text) were
developed. Another limitation may be the quality of the crime data. We have to remember that
these data are collected by humans, so it is very difficult to eliminate the bias included in all
datasets used in research.
  As a follow-up application of this PhD, the results may be used to make the allocation of
police patrols more effective over a larger area of influence, not just in the vicinity of the
event location, and also to monitor emerging events for negative effects. This would ideally
increase policing efficiency and prevent damage to public property.

5    References

Alruily, M. (2012) Using text mining to identify crime patterns from Arabic crime news report corpus.
Bendler, J., Brandt, T., Wagner, S. & Neumann, D. (2014a) Investigating crime-to-Twitter relationships in urban environments - facilitating a virtual neighborhood watch.
Bendler, J., Ratku, A. & Neumann, D. (2014b) Crime Mapping through Geo-Spatial Social Media Activity.
Burnap, P. & Williams, M. L. (2015) Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making. Policy & Internet.
Cheng, Z. & Smyth, R. (2015) Crime Victimization, Neighbourhood Safety and Happiness in China.
Corso, A. J. (2015) Toward Predictive Crime Analysis via Social Media, Big Data, and GIS Spatial Correlation. iConference 2015 Proceedings.
Eck, J., Chainey, S., Cameron, J. & Wilson, R. (2005) Mapping crime: Understanding hotspots.
Featherstone, C. (2013a) Identifying vehicle descriptions in microblogging text with the aim of reducing or predicting crime, Adaptive Science and Technology (ICAST), 2013 International Conference on. IEEE.
Featherstone, C. (2013b) The relevance of social media as it applies in South Africa to crime prediction, IST-Africa Conference and Exhibition (IST-Africa), 2013. IEEE.
Gerber, M. S. (2014) Predicting crime using Twitter and kernel density estimation. Decision Support Systems, 61, 115-125.
Kounadi, O., Ristea, A., Leitner, M. & Langford, C. (2017) Population at risk: using areal interpolation and Twitter messages to create population models for burglaries and robberies. Cartography and Geographic Information Science, 1-15.
Kurland, J., Tilley, N. & Johnson, S. D. (2014) The Football 'Hotspot' Matrix. Football Hooliganism, Fan Behaviour and Crime: Contemporary Issues, 21.
Malleson, N. & Andresen, M. A. (2015) The impact of using social media data in crime rate calculations: shifting hot spots and changing spatial patterns. Cartography and Geographic Information Science, 42(2), 112-121.
Malleson, N. & Andresen, M. A. (2016) Exploring the impact of ambient population measures on London crime hotspots. Journal of Criminal Justice, 46, 52-63.
Perry, W. L. (2013) Predictive policing: The role of crime forecasting in law enforcement operations. Rand Corporation.
Wang, M. & Gerber, M. S. (2015) Using Twitter for Next-Place Prediction, with an Application to Crime Prediction, Computational Intelligence, 2015 IEEE Symposium Series on. IEEE.
Wang, X., Gerber, M. S. & Brown, D. E. (2012) Automatic crime prediction using events extracted from twitter posts, Social Computing, Behavioral-Cultural Modeling and Prediction. Springer, 231-238.