AGILE 2018 – Lund, June 12-15, 2018

Integration of Social Media in Spatial Crime Analysis and Prediction Models for Events

Alina Ristea
University of Salzburg, Schillerstraße 30, Salzburg, Austria
mihaela.ristea@stud.sbg.ac.at

Michael Leitner
Louisiana State University, E-104 Howe-Russell-Kniffen Geoscience Complex, Baton Rouge, LA, USA
mleitne@lsu.edu

Abstract

The last decade has been the most productive with respect to social media data exploration and its possible uses in crime prediction. This area is thus a rapidly evolving and growing field. This PhD research aims to find and evaluate spatial relationships between crime occurrences and nearby social media activity in event areas, and to estimate the possible influence of this activity on crime prediction models. Overall, the thesis will focus on geospatial crime prediction concerning planned and emerging events through the exploration of social media data and other information, including demographic, economic and safety risk factors. The thesis will utilize methods and tools from various fields, including social media text mining and classification from machine learning, and spatial statistics together with forecasting models from crime prediction. Outcomes will be a valuable basis for defining new research areas, helping to further understand spatial crime analysis and prediction models that include secondary data sources, such as social media, on the basis of event exploration.

Keywords: spatial crime analysis, social media, spatial prediction, crowd based events.

1 Introduction

To date, crime prediction models used in conjunction with social media data have achieved a high rate of success for certain types of crime, complementing traditional crime prediction models (Corso, 2015; Gerber, 2014; Wang & Gerber, 2015; Wang et al, 2012).

Most crime prediction techniques are used for retrospective crime forecasting, which relies on the existence of historical crime data. For this approach, quantitative methods were developed to categorize crime data in objective ways and to find characteristics such as the type of crime, typology of offender, result of investigation, and confidential information, using geospatial and statistical techniques such as hot spot analysis (Eck et al, 2005), regression, cluster determination or spatiotemporal pattern recognition.

Recently, crime predictive analytics has become more interdisciplinary. This is also related to the growth of "big data", the last decade being the most productive with respect to social media data exploration. Researchers from informatics, computer science, mathematics and statistics are collaborating with criminologists, sociologists and others in developing new prediction models. Moreover, the rapid evolution of technology is a very important process in crime analytics as well as in social media, and it opens up a plethora of research that can be done in different fields of interest.

Machine learning techniques together with linear and logistic modeling (Alruily, 2012; Burnap & Williams, 2015; Wang & Gerber, 2015; Wang et al, 2012), density based models (Bendler et al, 2014a; Cheng & Smyth, 2015; Featherstone, 2013a; b), risk terrain modeling (Perry, 2013) or Geographically Weighted Regression (Bendler et al, 2014b) have been used to predict crime occurrences using geotagged tweets or, in more detail, text mining from tweets. The algorithms have highly ranked results; however, there are few explanations of why the accuracy changes for different crime or social media datasets. To our knowledge, very few previous works consider the effect of events on spatial crime distribution while using social media in prediction.

There is an important body of literature focusing on spatial crime distribution from the perspective of events and on social media during events, such as big or mega events, sporting events, and natural disasters. However, little research has attempted specifically to predict planned events considering social media and crime data, at a specific location or at a venue spot, while also including environmental explanatory variables in the models. Population trajectories and their impact on crime likelihood differ according to environmental factors.

Finding attributes from social media that can boost crime prediction models, and implementing them along with crime data for better prediction, is the core part of the PhD, with a main focus on public events. Three main elements are the base of this PhD research: crime occurrences, social media (mostly Twitter data) and events (planned events and emerging events).

An event can be defined as a matter that happens in a place, especially one of importance, such as a planned public and social occasion or particular contests making up a sports competition. Planned events are those for which the main parameters are defined, such as the location or the public attendance. Emerging events are those whose basic elements have the ability to develop novel relations and identities designed into higher-level elements.

Overall, spatiotemporal analysis is the base of this PhD study, managed along with spatial relationships such as distance, connectivity, distribution, form, and space between spatial units. The study cases will be carefully chosen and discussed individually, followed by a final comparison in which an adapted and robust crime prediction model for events will be defined.

This PhD research aims at filling this gap of social media integration in spatial crime prediction for different event occurrences. During the PhD study I will use tools to extract, quantify and normalize the social media data and attributes that can lead to better results in geospatial crime prediction analytics models for different events. Therewith, this research aims at paving the way for the usage of multidisciplinary tools and the integration of the results in geospatial prediction models that can answer spatial and temporal patterns in crime analysis. Spatial criminology theories will support the analyses developed during my PhD studies.

2 Related work

Crime presents an increased strategic complexity and interaction with other networks that are not necessarily connected. The main categories of prediction models applied in crime applications include hot spot analysis, regression methods, data mining and machine learning algorithms, the near-repeat concept, spatiotemporal analysis and risk terrain analysis (Perry, 2013). For a better prediction, algorithms are selected according to the research approach.

Crowd based events (high attendance events) are considered attractors and generators of crime. Studies emphasize potential implications of theories such as routine activity theory, involved in hooliganism and violent crime, and crime pattern theory, related to crime increases in specific areas for events such as sporting events (Kurland et al, 2014).

The analysis of crime patterns is the basis for determining crime displacement, spatially and temporally. However, there is not much focus on specific events in the growing field of spatial crime predictive analytics. This research aims to adapt and use the crime prediction methods mentioned above for events. Social media data processing for event analysis and the integration of the outcomes in the crime prediction models may improve the final results.

The opportunities offered by social media require the establishment of a research methodology for drawing insights into the extraction of information that can be helpful in many fields, such as crime analysis. Social media networks offer a huge volume of data, which is analyzed in branches like social sciences, economics, GIScience, computer science, psychology or philosophy.

Key techniques go beyond text analytics to include opinion mining, entity extraction, event recognition, sentiment analysis, topic modeling, social network analysis, trend analysis, and visual analytics. The density of words and their consistency with a lexicon (dictionary) may define relationships between the data. This remains an open field of research because of the noisy, unstructured and highly diverse social media data. The analysis of social data parameters, not considering the "spatial" component, was performed mostly from a computer and data science point of view.

The implementation of social media data in crime prediction models started only recently. However, crime prediction algorithms have been tested in detail in studies over the last five years; the same can be confirmed for prediction algorithms for social media.

One approach for combining social media and crime data is developed through topic extraction and its connections with crime occurrences. Social media and crime were first brought together to make a prediction in 2012 (Wang et al, 2012). Automatic semantic analysis and NLP of Twitter data, dimensionality reduction through LDA and prediction with linear modeling for hit-and-run crimes in Charlottesville, Virginia represented the earliest research on this topic. Another study investigated the possible integration of rich textual content to predict users' spatial trajectories, followed by the correlation with crime occurrences in Chicago, IL (Wang & Gerber, 2015).

A second approach points out the importance of social media density. If social media usage is sufficient in an area of study, it may establish a higher predictive value (Featherstone, 2013a; b). Researchers implemented Twitter data as predictors along with archived crime data, which resulted in improved prediction for burglaries and robberies (Bendler et al, 2014a). However, the analysis considered only the number of tweets and the number and type of crimes.

Twitter data is considered a proxy for the ambient population used in crime rate calculations, showing impact on crime hotspots (Malleson & Andresen, 2015; 2016). Moreover, other datasets can support ambient population calculations. Considering social media as a dynamic variable, it is also important to create a dynamic (ambient) population variable, a challenge that will be tested during my PhD development (Kounadi et al, 2017).

Topic modeling and linguistic analysis of spatiotemporally tagged tweets, added to crime data in kernel density estimation at the neighborhood level, resulted in good predictions for the City of Chicago, IL (Gerber, 2014). This research showed that Twitter-derived attributes improve prediction in 19 of 25 crime types. Acknowledging the importance of the study, the temporal patterns might be different for a period longer than the three-month dataset used. The seasonality of crime can also affect the prediction accuracy.

An additional innovative attempt considers the implication of sentiment analysis by applying lexicon-based methods, and of weather parameters, combined with crime data in a kernel density algorithm (Cheng & Smyth, 2015). For the same city, researchers calculated user ranking for the concept of user credibility and then captured predictive context hidden variables to test in crime rate trend prediction.

Past research has already confirmed that the distribution of crime types shows some similarities across different cultures, religions, languages, and socio-economic statuses. However, no research has attempted specifically to predict planned and emerging events considering social media and crime data, at different locations and also at a venue spot.

Besides the crime occurrences connected with sport events, research shows results in detecting sport events on Twitter, the public's overall perception of highly ranked events such as the Super Bowl, and crowd activities related to sport events. Moreover, some researchers are interested in crowd events such as festivals, concerts, political summits, expos, city traffic, etc.

Another important type of event considered in crime research is protests, which can lead to high crime displacement. Recent theoretical background argues that social media may increase the occurrence of emerging events, such as protests. The spatiotemporal variation in the event intensity can be connected with social media activity. On the other hand, the coordination and management of the protest activity might be done on social media, and social pressure might also be developed through online announcements. The limited existing research in this field considers crowd activities related to events as a proxy for crime analysis and prediction.

As discussed before, there is a growing literature that investigates the impact of events (sporting events, for example) on crime, as well as a growing literature that shows how people's behavior on social media changes during (sporting) events. However, there is limited research that investigates the relationship, if present, between events, social media activity, and criminal events.

3 Objectives. Research description

Prediction of crime incidents can benefit from social media implementation as an exogenous predictor, possibly improving the precision of results. The innovative aspect of this research project will be the integration of social media analysis into crime prediction models for specific events and the evaluation of the quality of such predictions. The three main objectives, each followed by its research questions and a short list of data and methods, are presented below:

• Objective 1: examine the relationship between the distribution of crime and social media at regularly occurring events
RQ1: What is the relationship between specific types of events and crime types?
RQ2: How can social media predict the diffusion of crimes related to the end of events?
Datasets: crime, tweets, points of interest, residential population, Landscan population.
Methods: topic extraction, text classification by finding "violent tweets"; heat maps, point pattern analyses, hierarchical clustering (KNN), logistic regression.

• Objective 2: investigate the relationship between crime occurrences at a venue and various event types
RQ1: How does the event type affect crime prediction at a venue?
RQ2: How are social media and the number of crimes correlated?
Datasets: crime, tweets, points of interest.
Methods: topic extraction, opinion mining (using Naïve Bayes); Gi* (clusters of points with values higher in magnitude than expected in randomized distributions), Moran's Index I (clustering likelihood), negative binomial logistic regression, evaluation using Area under the Curve (AUC).

• Objective 3: explore the adaptability of spatiotemporal techniques in the evaluation of emerging events (protests, riots)
RQ1: How may a spatiotemporal analysis of social media help identify emerging events influencing crime?
RQ2: How may social media predict crime related to the spatial displacement of an emerging event?
Datasets: crime, tweets, points of interest, old protest data, and socio-economic information.
Methods: topic extraction, exponential dispersion models, logistic regression, crime displacement methods, trajectory analysis.

4 Discussion

Overall, this dissertation will focus on geospatial crime predictive analysis concerning planned and emerging events through the exploration of the complex parameters of social media data. Moreover, the study will explore historical crime data and analyze the correlation between crime occurrences and social media data parameters (topic, term frequency, emotions). According to research, crime prevention initiatives tend to displace crime or diffuse crime reduction benefits. The analysis will identify information from social media that may help predict crime related to spatial displacement regarding the occurrence of an event. Other possible risk factors will also be considered. Population data is very important in determining crime rates, so the population at crime risk will be an additional risk factor in the crime prediction models.

The distinctive characteristic of this approach lies in the use of the three data elements in combination with other information, such as demographic data, to provide a new interpretation of social media integration in spatial crime prediction for different event occurrences.

Several spatial statistical models will be applied, including spatial regression analysis for finding spatial relationships among crime and social data variables; geographically weighted regression for point data validation; linear and logistic regression; and global spatial autocorrelation for finding the degree of dependency among the occurrences in the same geographic space.

The methods listed above will help the evaluation and integration of social media information in crime analysis and predictive analytics for event based occurrences. There are limitations with respect to the location of social media data. Because only a rather small percentage of people use geo-tagging, algorithms were developed to improve the locational quality through text mining (the location is extracted from the text). Another limitation may be the quality of the crime data. We have to remember that these data are collected by humans, so it is very difficult to eliminate the bias included in all datasets used in research.

As a follow-up application of this PhD, the results may be used to make the allocation of police patrols more effective in a larger area of influence, not just in the vicinity of the event location, and also in monitoring emerging events for negative effects. This would ideally increase policing efficiency and prevent damage to public property.

5 References

Alruily, M. (2012) Using text mining to identify crime patterns from Arabic crime news report corpus.

Bendler, J., Brandt, T., Wagner, S. & Neumann, D. (2014a) Investigating crime-to-twitter relationships in urban environments-facilitating a virtual neighborhood watch.

Bendler, J., Ratku, A. & Neumann, D. (2014b) Crime Mapping through Geo-Spatial Social Media Activity.

Burnap, P. & Williams, M. L. (2015) Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making. Policy & Internet.

Cheng, Z. & Smyth, R. (2015) Crime Victimization, Neighbourhood Safety and Happiness in China.

Corso, A. J. (2015) Toward Predictive Crime Analysis via Social Media, Big Data, and GIS Spatial Correlation. iConference 2015 Proceedings.

Eck, J., Chainey, S., Cameron, J. & Wilson, R. (2005) Mapping crime: Understanding hotspots.

Featherstone, C. (2013a) Identifying vehicle descriptions in microblogging text with the aim of reducing or predicting crime, Adaptive Science and Technology (ICAST), 2013 International Conference on. IEEE.

Featherstone, C. (2013b) The relevance of social media as it applies in South Africa to crime prediction, IST-Africa Conference and Exhibition (IST-Africa), 2013. IEEE.

Gerber, M. S. (2014) Predicting crime using Twitter and kernel density estimation. Decision Support Systems, 61, 115-125.

Kounadi, O., Ristea, A., Leitner, M. & Langford, C. (2017) Population at risk: using areal interpolation and Twitter messages to create population models for burglaries and robberies. Cartography and Geographic Information Science, 1-15.

Kurland, J., Tilley, N. & Johnson, S. D. (2014) The Football ‘Hotspot’ Matrix. Football Hooliganism, Fan Behaviour and Crime: Contemporary Issues, 21.

Malleson, N. & Andresen, M. A. (2015) The impact of using social media data in crime rate calculations: shifting hot spots and changing spatial patterns. Cartography and Geographic Information Science, 42(2), 112-121.

Malleson, N. & Andresen, M. A. (2016) Exploring the impact of ambient population measures on London crime hotspots. Journal of Criminal Justice, 46, 52-63.

Perry, W. L. (2013) Predictive policing: The role of crime forecasting in law enforcement operations. Rand Corporation.

Wang, M. & Gerber, M. S. (2015) Using Twitter for Next-Place Prediction, with an Application to Crime Prediction, Computational Intelligence, 2015 IEEE Symposium Series on. IEEE.

Wang, X., Gerber, M. S. & Brown, D. E. (2012) Automatic crime prediction using events extracted from twitter posts, Social Computing, Behavioral-Cultural Modeling and Prediction. Springer, 231-238.
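Illustrative note on global spatial autocorrelation. The methods listed for Objective 2 and in the Discussion name Moran's Index I as the measure of clustering likelihood. The following is a minimal sketch of how that statistic could be computed for crime counts aggregated to a regular grid. It is not the implementation used in this research: the function names `rook_weights` and `morans_i`, the 4 x 4 grid, the crime counts, and the choice of binary rook-contiguity weights are all assumptions made for illustration only.

```python
from typing import Dict, List


def rook_weights(n_rows: int, n_cols: int) -> Dict[int, List[int]]:
    """Binary rook-contiguity neighbours for a regular grid (hypothetical helper).

    Cells are indexed row by row; two cells are neighbours when they
    share an edge (up, down, left, right).
    """
    weights: Dict[int, List[int]] = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbours = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    neighbours.append(rr * n_cols + cc)
            weights[r * n_cols + c] = neighbours
    return weights


def morans_i(values: List[float], weights: Dict[int, List[int]]) -> float:
    """Global Moran's I with binary weights.

    I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("Moran's I is undefined when all values are equal")
    s0 = sum(len(nb) for nb in weights.values())
    num = sum(dev[i] * dev[j] for i, nb in weights.items() for j in nb)
    return (n / s0) * (num / denom)


if __name__ == "__main__":
    w = rook_weights(4, 4)
    # Invented counts: high values concentrated in the two left-hand columns.
    clustered = [10, 10, 0, 0] * 4
    # Invented counts: alternating high and low cells (checkerboard).
    dispersed = [(r + c) % 2 for r in range(4) for c in range(4)]
    print("clustered:", morans_i(clustered, w))
    print("dispersed:", morans_i(dispersed, w))
```

With these invented counts the clustered layout yields a positive value and the checkerboard layout a negative one, which is the sense in which the index expresses clustering likelihood. A real analysis would also need a significance test (for example by permutation) and weights suited to the actual spatial units, neither of which is shown here.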