
MigrAnalytics: Entity-based Analytics of Migration Tweets

Mehwish Alam (0, 1), Genet Asefa Gesese (0, 1), Zahra Rezaie (0, 1), Harald Sack (0, 1)

0 FIZ Karlsruhe
1 Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
2 Leibniz Institute for Information Infrastructure, Germany

This poster focuses on a visual analysis of tweets related to the European migration crisis. It uses TweetsKB as a starting point and then formulates search criteria for extracting tweets by enriching semantic entities and hashtags starting from the seed word "Refugee". It combines European migration statistics with the information obtained from the tweets and provides visual analysis from different perspectives.

Keywords: Knowledge Graph, Migration, Visual Analytics

1 Introduction

Migration-related data is one of the most important elements in determining the patterns that drive the flow of migration from source to host countries, such as a poor health care system, war, or poverty. Another important aspect is the sentiment of the citizens living in the host countries. These sentiments, whether negative or positive, could influence prospective migrants' decisions to choose or not to choose a country as a destination. Social media has become one of the most common platforms where users, including experts, share their opinions. However, processing tweets poses its own challenges: huge amounts of noisy data are posted each day, far more than humans can process, which makes automated processing necessary.

Several studies have targeted this problem from different perspectives. The authors of [2] used geo-tagged Twitter data of about 62,000 individuals over 6 years to estimate a set of US internal migration flows. Their findings show the relationship between short-term mobility and long-term migration. Another study [3] analyzes social media for cyber hate towards immigrants in Italy by using geo-tagged tweets as well as the official statistical data of Italy (ISTAT). It uses supervised classification for detecting hateful tweets. Another relevant resource is TweetsKB [1], a large, publicly available collection of Twitter data in RDF format covering any topic. It contains more than 1.5 billion tweets spanning from February 2013 to April 2020. In addition to metadata, the tweets are annotated with semantic entities as well as sentiment polarities.

This paper introduces a tool for visual analysis of migration-related tweets, namely MigrAnalytics³. It uses TweetsKB as a starting point instead of crawling the whole of Twitter again for the peak migration period, i.e., 2016 and 2017. It then formulates search criteria in TweetsKB by creating and enriching a set of entities and hashtags starting from the single seed word "Refugee". Finally, it combines European migration statistics with the information obtained from the selected tweets and provides visual analysis from different aspects.

* First three authors contributed equally to this work.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 MigrAnalytics

MigrAnalytics follows a three-step approach: (a) Extracting Entities and Hashtags, (b) Query Formulation and Migration Tweet Filtering, and (c) Entity-based Visual Analytics.

2.1 Extracting Entities and Hashtags

For Wikipedia page titles, pre-trained word2vec embeddings are used to compute the cosine similarity between the seed word "refugee" and each page title. In the pre-processing step, only alphanumeric characters are kept, and then lowercase conversion, stop word removal, and lemmatization are applied to the page titles. The similarity threshold was set to 0.5, which led to the selection of 50% of the page titles (28 pages out of 56) at depth 1. For depths 2 to 5, the percentages of similar page titles are 19%, 7%, 3.6%, and 1% (20 pages), respectively. For depths 6, 7, and 8, the number of pages with similarity greater than 0.5 is only 2, 2, and 0, respectively. Thus, Wikipedia page titles up to depth 5 have been chosen. Finally, these Wikipedia pages are mapped to the corresponding DBpedia entities.

³ https://ise-fizkarlsruhe.github.io/MigrAnalytics/
⁴ @prefix dbr: <http://dbpedia.org/resource/>.
⁵ https://wordnet.princeton.edu/
⁶ https://en.wikipedia.org/
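The similarity filtering of page titles described in this section can be sketched as follows. This is a minimal illustration and not the authors' implementation: the toy three-dimensional vectors stand in for pre-trained word2vec embeddings, the stop word list is hypothetical, lemmatization is omitted to keep the sketch dependency-free, and averaging token vectors for multi-word titles is an assumption, since the text does not say how such titles are embedded.

```python
import math
import re

# Hypothetical stop word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}

# Toy vectors standing in for pre-trained word2vec embeddings (assumption).
TOY_VECTORS = {
    "refugee": [0.9, 0.1, 0.0],
    "asylum": [0.8, 0.3, 0.1],
    "seeker": [0.7, 0.2, 0.2],
    "camp": [0.6, 0.5, 0.1],
    "football": [0.0, 0.1, 0.9],
}


def preprocess(title):
    """Keep alphanumeric characters, lowercase, and drop stop words.

    Lemmatization, which the text also applies, is left out of this sketch.
    """
    cleaned = re.sub(r"[^0-9A-Za-z]+", " ", title).lower()
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def mean_vector(tokens, vectors):
    """Average the vectors of known tokens (assumed title representation)."""
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_titles(titles, seed="refugee", threshold=0.5, vectors=TOY_VECTORS):
    """Return the titles whose similarity to the seed word exceeds the threshold."""
    seed_vec = vectors[seed]
    selected = []
    for title in titles:
        vec = mean_vector(preprocess(title), vectors)
        if vec is not None and cosine(seed_vec, vec) > threshold:
            selected.append(title)
    return selected


if __name__ == "__main__":
    print(select_titles(["Asylum seeker", "Refugee camp", "Football"]))
```

In practice, the toy table would be replaced by a lookup into an actual pre-trained embedding model, and the same threshold of 0.5 would be applied at each depth.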

2.2 Query Formulation and Migration Tweet Filtering

Based on the entities and seed words extracted as described previously, SPARQL queries are formulated for extracting the tweets from TweetsKB. Table 1 shows the statistics of the extracted tweets: #tweets is the number of tweets extracted for each year, #entities is the number of entities contained in those tweets as annotated in TweetsKB, and #hashtags is the number of hashtags contained in the extracted tweets.
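As a rough illustration of the query formulation step, the sketch below assembles one SPARQL query from a list of DBpedia entity names and a list of hashtags. The function name `build_query` is hypothetical, and the vocabulary terms used here (`sioc:Post`, `schema:mentions`, `nee:hasMatchedURI`, `sioc_t:Tag`, `dc:created`) are assumed from the published TweetsKB schema and should be verified against its documentation. The actual queries used for MigrAnalytics are not given in the text and may differ.

```python
# Namespace for DBpedia resources, as in the dbr: prefix footnote.
DBR = "http://dbpedia.org/resource/"

# Prefixes assumed to match the TweetsKB schema (verify before use).
PREFIXES = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX sioc_t: <http://rdfs.org/sioc/types#>
PREFIX nee: <http://www.ics.forth.gr/isl/oae/core#>
"""


def build_query(entities, hashtags, year):
    """Build a query for tweets mentioning any given entity or hashtag in a year."""
    # Full IRIs avoid escaping problems with prefixed names such as dbr:X_(Y).
    entity_values = " ".join(f"<{DBR}{name}>" for name in entities)
    # Hashtag labels are assumed to be stored without the leading '#'.
    hashtag_values = " ".join(f'"{tag.lower()}"' for tag in hashtags)
    return PREFIXES + f"""
SELECT DISTINCT ?tweet ?date WHERE {{
  ?tweet a sioc:Post ;
         dc:created ?date .
  {{
    VALUES ?uri {{ {entity_values} }}
    ?tweet schema:mentions ?entity .
    ?entity a nee:Entity ;
            nee:hasMatchedURI ?uri .
  }} UNION {{
    VALUES ?tag {{ {hashtag_values} }}
    ?tweet schema:mentions ?hashtag .
    ?hashtag a sioc_t:Tag ;
             rdfs:label ?label .
    FILTER (LCASE(STR(?label)) = ?tag)
  }}
  FILTER (YEAR(?date) = {year})
}}
"""


if __name__ == "__main__":
    # "Refugee" is the seed word from the text; a real run would pass the
    # full enriched lists of entities and hashtags.
    print(build_query(["Refugee"], ["Refugee"], 2016))
```

Running one such query per year (2016 and 2017) would mirror the per-year breakdown reported in Table 1.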

            Total (2016)   Distinct (2016)   Total (2017)   Distinct (2017)
#tweets     197,813        197,813           208,492        208,492
#entities   340,694        23,261            371,944        24,009
#hashtags   238,545        29,756            172,327        28,135

Table 1. Statistics of the information extracted from TweetsKB.

2.3 Entity-based Visual Analytics

Various plots are used to visualize the interactions between the number of tweets regarding refugees and the hashtags and entities they contain. The analysis also considers the relationship between the tweets extracted in the previous steps and the number of asylum applications during the peak of the migration crisis⁷.

The total numbers of first-time asylum applications in the EU28 in 2016 and 2017 were 1,204,280 and 649,855, respectively⁸. Monthly figures within each year were rather steady; however, in 2016 the EU received almost twice as many monthly applications as in 2017.
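The "almost twice" comparison can be checked directly from the two yearly totals. The short calculation below uses only the figures quoted above. The monthly averages are derived by dividing each total by 12 and are not actual monthly Eurostat values.

```python
# Yearly totals of first-time asylum applications in the EU28, as quoted above.
first_time_applications = {2016: 1_204_280, 2017: 649_855}

# Derived average per month (yearly total / 12), not reported monthly figures.
monthly_average = {
    year: total / 12 for year, total in first_time_applications.items()
}

# Ratio of 2016 to 2017 applications.
ratio = first_time_applications[2016] / first_time_applications[2017]

if __name__ == "__main__":
    for year, avg in monthly_average.items():
        print(f"{year}: about {avg:,.0f} applications per month on average")
    print(f"2016 / 2017 ratio: {ratio:.2f}")
```

The ratio comes out at roughly 1.85, which is consistent with the statement that 2016 saw almost twice as many applications as 2017.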

First, the top 20 entities and the top 20 hashtags in terms of number of occurrences are selected separately for each year. Then, these entities and hashtags are ranked and plotted based on their weekly frequencies. Among the top 20 entities and hashtags for 2016, 7 entities and 6 hashtags are terms that co-occurred with the keywords used in the query. They include relevant countries, politicians, political events, and so on. For example, the term United Kingdom withdrawal from the European Union appears as an entity and #brexit as a hashtag. Both refer to the same political event during 2016, which could indicate that Brexit had a significant impact on the migrant crisis. Among the top 20 entities and hashtags for 2017, 7 entities and 9 hashtags are terms that co-occurred with the keywords used in the query. Several of these co-occurring terms are related to US political issues regarding migrants, e.g., Executive order, Deferred Action for Childhood Arrivals or its equivalent hashtag #daca, #nobannowall, and #muslimban.

Finally, in order to plot a word cloud of entities and hashtags, the top 100 of them (in terms of frequency) were chosen for each week. For example, as shown in the plot, "Immigration" and "Refugee" are among the most frequent entities and hashtags.
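The selection and weekly ranking described above can be sketched with standard-library counters. This is an illustrative sketch only: the function names and the input format (pairs of a date and a term) are assumptions, the use of ISO calendar weeks is a choice made here because the text does not define the week boundaries, and the sample data is invented for the demonstration.

```python
from collections import Counter, defaultdict
from datetime import date


def top_terms(mentions, k=20):
    """Return the k most frequent terms over the whole period (e.g. one year)."""
    totals = Counter(term for _, term in mentions)
    return [term for term, _ in totals.most_common(k)]


def weekly_counts(mentions):
    """Count terms per ISO week; keys are (iso_year, iso_week) tuples."""
    weekly = defaultdict(Counter)
    for day, term in mentions:
        iso = day.isocalendar()
        weekly[(iso[0], iso[1])][term] += 1
    return weekly


def weekly_ranking(mentions, k=20):
    """Rank the yearly top-k terms by their frequency within each week."""
    mentions = list(mentions)
    keep = set(top_terms(mentions, k))
    weekly = weekly_counts(m for m in mentions if m[1] in keep)
    return {week: counts.most_common() for week, counts in sorted(weekly.items())}


def weekly_word_cloud_terms(mentions, n=100):
    """Pick the n most frequent terms of each week, e.g. as word cloud input."""
    return {week: counts.most_common(n) for week, counts in weekly_counts(mentions).items()}


if __name__ == "__main__":
    # Invented sample data: (tweet date, hashtag or entity).
    sample = [
        (date(2016, 6, 20), "#brexit"),
        (date(2016, 6, 24), "#brexit"),
        (date(2016, 6, 24), "#refugee"),
        (date(2016, 6, 27), "#refugee"),
    ]
    for week, ranked in weekly_ranking(sample).items():
        print(week, ranked)
```

Entities and hashtags would be passed through these functions separately, once per year, to match the separate top-20 lists described in the text.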

3 Discussion and Perspectives

The current study provides an entity-based analysis of migration-related tweets combined with European migration statistics. As future work, migration experts on social media will be identified and their views on the factors causing migration will be analyzed. Moreover, the full text of the tweets will also be processed for extraction and analysis purposes.

⁷ These visualizations are shown on the associated homepage.
⁸ https://ec.europa.eu/eurostat

References

1. Fafalios, P., Iosifidis, V., Ntoutsi, E., Dietze, S.: TweetsKB: A public and large-scale RDF corpus of annotated tweets. In: Extended Semantic Web Conference (ESWC'18). Heraklion, Crete, Greece (2018)

2. Fiorio, L., Abel, G., Cai, J., Zagheni, E., Weber, I., Vinue, G.: Using Twitter data to estimate the relationship between short-term mobility and long-term migration. In: WebSci (2017)

3. Florio, K., Basile, V., Lai, M., Patti, V.: Leveraging Hate Speech Detection to Investigate Immigration-related Phenomena in Italy. In: 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW) (2019)

4. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., Van Kleef, P., Auer, S., et al.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6(2), 167-195 (2015)

5. Pedersen, T., Patwardhan, S., Michelizzi, J., et al.: WordNet::Similarity - measuring the relatedness of concepts. In: AAAI. vol. 4, pp. 25-29 (2004)