<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MigrAnalytics: Entity-based Analytics of Migration Tweets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mehwish Alam</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Genet Asefa Gesese</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zahra Rezaie</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harald Sack</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FIZ Karlsruhe</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Karlsruhe Institute of Technology (KIT)</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Leibniz Institute for Information Infrastructure</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This poster presents a visual analysis of tweets related to the European migration crisis. It uses TweetsKB as a starting point and formulates search criteria for extracting tweets by enriching a set of semantic entities and hashtags, starting from the seed word "Refugee". It then combines European migration statistics with the information obtained from the tweets and provides visual analysis from different perspectives.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph</kwd>
        <kwd>Migration</kwd>
        <kwd>Visual Analytics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Migration-related data is one of the most important elements in determining the
patterns that drive the flow of migration from source to host countries, such as
poor health care systems, war, and poverty. Another important aspect
is the sentiment of the citizens living in the host countries. These sentiments,
whether negative or positive, could influence prospective migrants' decisions to
choose or not to choose a country as a destination. Social media has become
one of the most common platforms where users, including experts, share their
opinions. However, processing tweets poses its own challenges: huge
amounts of noisy data are posted each day, far more than humans can process,
which makes automated processing necessary.</p>
      <p>
        Several studies have approached this problem from different perspectives.
For example, the authors of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] used geo-tagged Twitter data of about 62,000 individuals
over 6 years to estimate a set of US internal migration flows. Their findings show
the relationship between short-term mobility and long-term migration. Another
study [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] analyzes social media for cyber hate towards
immigrants in Italy, using geo-tagged tweets as well as the official statistical data
of Italy (ISTAT). It uses supervised classification to detect hateful tweets.
A related resource is TweetsKB [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a large, publicly available collection of
Twitter data in RDF format covering any topic. It contains more than 1.5 billion
tweets spanning from February 2013 to April 2020. In addition to metadata, the
tweets are annotated with semantic entities as well as sentiment polarities. This
paper introduces MigrAnalytics<sup>3</sup>, a tool for the visual analysis of
migration-related tweets. It uses TweetsKB as a starting point instead of crawling
Twitter again for the peak migration period, i.e., 2016 and 2017.
It then formulates search criteria in TweetsKB by creating and enriching a set
of entities and hashtags starting from the single seed word "Refugee", and
combines European migration statistics with the information obtained
from the selected tweets, followed by visual analysis from different aspects.
      </p>
      <p>* The first three authors contributed equally to this work.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>MigrAnalytics</title>
      <p>MigrAnalytics follows a three-step approach: (a) Extracting Entities and
Hashtags, (b) Query Formulation and Migration Tweet Filtering, and (c)
Entity-based Visual Analytics.</p>
      <sec id="sec-2-1">
        <title>Extracting Entities and Hashtags</title>
        <p><sup>3</sup> https://ise-fizkarlsruhe.github.io/MigrAnalytics/</p>
        <p><sup>4</sup> @prefix dbr: &lt;http://dbpedia.org/resource/&gt;.</p>
        <p><sup>5</sup> https://wordnet.princeton.edu/</p>
        <p><sup>6</sup> https://en.wikipedia.org/</p>
        <p>Pre-trained word2vec embeddings are used to
compute the cosine similarity between the seed word "refugee" and Wikipedia page titles.
In the pre-processing step, only alphanumeric characters are kept, and then lowercasing,
stop-word removal, and lemmatization are applied to the page titles.
The similarity threshold was set to 0.5, which led to the selection of 50%
of the page titles (28 pages out of 56) at depth 1. For depths 2 to 5, the percentages
of similar page titles are 19%, 7%, 3.6%, and 1% (20 pages), respectively. For
depths 6, 7, and 8, the number of pages with similarity greater than 0.5 is only 2, 2,
and 0, respectively. Thus, Wikipedia page titles up to depth 5 were chosen.
Finally, these Wikipedia pages are mapped to the corresponding DBpedia entities.</p>
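        <p>The title filtering step can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors stand in for pre-trained word2vec embeddings, the stop-word list and the plural-stripping lemmatizer are placeholders, and averaging token vectors to represent a multi-word title is an assumption. Only the pre-processing steps and the 0.5 threshold are taken from the description above.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
import re

# Hypothetical toy embeddings standing in for pre-trained word2vec vectors.
TOY_VECTORS = {
    "refugee": [0.9, 0.1, 0.0],
    "asylum": [0.8, 0.3, 0.1],
    "seeker": [0.7, 0.2, 0.2],
    "camp": [0.6, 0.5, 0.1],
    "football": [0.0, 0.2, 0.9],
    "club": [0.1, 0.1, 0.8],
}
STOP_WORDS = {"the", "of", "in", "a", "an", "and"}  # illustrative subset only


def preprocess(title):
    """Keep alphanumerics, lowercase, drop stop words, apply crude lemmatization."""
    tokens = re.sub(r"[^A-Za-z0-9 ]+", " ", title).lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Placeholder lemmatizer: strips a plural "s"; a real lemmatizer would be used in practice.
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


def mean_vector(tokens):
    """Average the vectors of known tokens (assumption); None if none is known."""
    vecs = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_titles(titles, seed="refugee", threshold=0.5):
    """Keep titles whose similarity to the seed word exceeds the threshold."""
    seed_vec = TOY_VECTORS[seed]
    kept = []
    for title in titles:
        vec = mean_vector(preprocess(title))
        if vec is not None and cosine(seed_vec, vec) > threshold:
            kept.append(title)
    return kept


titles = ["Asylum seekers", "Refugee camps", "Football clubs of the world"]
selected = filter_titles(titles)
print(selected)
```
]]></preformat>
        <p>With real embeddings, the same filter would be applied to the page titles collected at each depth, and the share of retained titles per depth would then guide the choice of the maximum depth.</p>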
      </sec>
      <sec id="sec-2-2">
        <title>Query Formulation and Migration Tweet Filtering</title>
        <p>Based on the entities and seed words extracted as described previously, SPARQL
queries are formulated for extracting the tweets from TweetsKB. Table 1 shows
the statistics of the extracted tweets. #tweets is the number of tweets extracted
for each year, #entities is the number of entities contained in those tweets as
annotated in TweetsKB, and finally #hashtags is the number of hashtags contained
in the extracted tweets.</p>
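        <table-wrap>
          <label>Table 1</label>
          <caption>
            <p>Statistics of the information extracted from TweetsKB.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th></th>
                <th>Total (2016)</th>
                <th>Distinct (2016)</th>
                <th>Total (2017)</th>
                <th>Distinct (2017)</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>#tweets</td>
                <td>197,813</td>
                <td>197,813</td>
                <td>208,492</td>
                <td>208,492</td>
              </tr>
              <tr>
                <td>#entities</td>
                <td>340,694</td>
                <td>23,261</td>
                <td>371,944</td>
                <td>24,009</td>
              </tr>
              <tr>
                <td>#hashtags</td>
                <td>238,545</td>
                <td>29,756</td>
                <td>172,327</td>
                <td>28,135</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>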
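        <p>A query of this kind might be assembled as in the following sketch. It only builds the query string and does not contact any endpoint. The vocabulary terms (sioc:Post, schema:mentions, nee:hasMatchedURI, sioc_t:Tag, dc:created) reflect one reading of the TweetsKB data model and should be verified against its documentation; the function name, the query shape, and the assumption that creation dates are typed so that YEAR() applies are illustrative and not taken from the paper.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Prefixes assumed from the TweetsKB schema description; verify before use.
PREFIXES = """\
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX sioc_t: <http://rdfs.org/sioc/types#>
PREFIX nee: <http://www.ics.forth.gr/isl/oae/core#>
PREFIX schema: <http://schema.org/>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
"""

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def build_query(entities, hashtags, year):
    """Build a SPARQL query selecting tweets of one year that mention
    any of the given DBpedia entities or any of the given hashtags."""
    # Full IRIs avoid problems with special characters in prefixed names.
    entity_values = " ".join(f"<{DBPEDIA_RESOURCE}{e}>" for e in entities)
    hashtag_values = " ".join(f'"{h.lower()}"' for h in hashtags)
    return f"""{PREFIXES}
SELECT DISTINCT ?tweet ?created WHERE {{
  ?tweet a sioc:Post ;
         dc:created ?created .
  FILTER (YEAR(?created) = {year})
  {{
    VALUES ?uri {{ {entity_values} }}
    ?tweet schema:mentions ?entity .
    ?entity a nee:Entity ;
            nee:hasMatchedURI ?uri .
  }} UNION {{
    VALUES ?label {{ {hashtag_values} }}
    ?tweet schema:mentions ?tag .
    ?tag a sioc_t:Tag ;
         rdfs:label ?tagLabel .
    FILTER (LCASE(STR(?tagLabel)) = ?label)
  }}
}}
"""


# "Refugee" and "Immigration" are used here only as example search terms.
query = build_query(["Refugee", "Immigration"], ["refugee", "immigration"], 2016)
print(query)
```
]]></preformat>
        <p>Running one such query per year would yield the per-year tweet sets whose sizes are reported in Table 1.</p>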
      </sec>
      <sec id="sec-2-3">
        <title>Entity-based Visual Analytics</title>
        <p>Various plots are used to visualize the interactions between the number of tweets
regarding refugees and the hashtags and entities they contain. The analysis also considers the
relationship between the tweets extracted in the previous steps and the number
of asylum applications during the peak of the migration crisis<sup>7</sup>.</p>
        <p>The total numbers of first-time asylum applications in the EU-28 in 2016 and
2017 were 1,204,280 and 649,855, respectively<sup>8</sup>. Monthly figures for each year
were rather steady; however, in 2016 the EU received almost twice as many monthly
applications as in 2017.</p>
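        <p>The "almost twice" comparison can be checked directly from the two yearly totals. The sketch below assumes that a monthly average is simply the yearly total divided by 12; the variable names are illustrative.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Yearly totals of first-time asylum applications as quoted in the text.
applications = {2016: 1_204_280, 2017: 649_855}

# Assumption: average monthly figure = yearly total / 12.
monthly_avg = {year: total / 12 for year, total in applications.items()}
ratio = monthly_avg[2016] / monthly_avg[2017]

for year, avg in monthly_avg.items():
    print(f"{year}: about {avg:,.0f} applications per month")
print(f"2016 / 2017 ratio: {ratio:.2f}")  # prints 1.85
```
]]></preformat>
        <p>A ratio of roughly 1.85 is consistent with the statement that 2016 saw almost twice as many monthly applications as 2017.</p>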
        <p>First, the top 20 entities and the top 20 hashtags in terms of number of occurrences are
selected separately for each year. These entities and hashtags are then ranked
and plotted by their weekly frequencies. Among the top 20
entities and hashtags for 2016, 7 entities and 6 hashtags are terms that
co-occurred with the keywords used in the query. They include
relevant countries, politicians, political events, and so on. For example, the term
United Kingdom withdrawal from the European Union appears as an entity and
#brexit as a hashtag. Both refer to the same political event of 2016,
which could indicate that Brexit has a significant impact on matters related to the
migrant crisis. Among the top 20 entities and hashtags for 2017, 7 entities and 9 hashtags
are terms that co-occurred with the keywords used in the query. Several
of these co-occurring terms are related to US political issues regarding migrants,
e.g., Executive order, Deferred Action for Childhood Arrivals or its equivalent
hashtag #daca, #nobannowall, and #muslimban. Finally, to plot a
word cloud of entities and hashtags, the 100 most frequent ones
were chosen for each week. For example, as shown in the plot,
"Immigration" and "Refugee" are among the most
frequent entities and hashtags.</p>
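        <p>The selection and weekly ranking described above can be sketched as follows. The mention list is hypothetical sample data, the function names are illustrative, and grouping by ISO calendar week is an assumption about how weeks are delimited; the paper uses the top 20 terms per year, whereas the toy example uses the top 2.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical (date, term) mentions; real input would come from the filtered tweets.
mentions = [
    (date(2016, 6, 20), "#brexit"),
    (date(2016, 6, 21), "#brexit"),
    (date(2016, 6, 22), "Refugee"),
    (date(2016, 6, 28), "Refugee"),
    (date(2016, 6, 29), "Refugee"),
    (date(2016, 6, 30), "#brexit"),
    (date(2016, 6, 30), "Immigration"),
]


def top_terms(mentions, k):
    """Return the k most frequent terms over the whole period."""
    counts = Counter(term for _, term in mentions)
    return [term for term, _ in counts.most_common(k)]


def weekly_frequencies(mentions, terms):
    """Count occurrences of the selected terms per (ISO year, ISO week)."""
    keep = set(terms)
    weekly = defaultdict(Counter)
    for day, term in mentions:
        if term in keep:
            iso = day.isocalendar()
            weekly[(iso[0], iso[1])][term] += 1
    return dict(weekly)


top = top_terms(mentions, k=2)
weekly = weekly_frequencies(mentions, top)
for week in sorted(weekly):
    # most_common() gives the ranking of the selected terms within the week.
    print(week, weekly[week].most_common())
```
]]></preformat>
        <p>For the word cloud, the same weekly counters could be built over all terms and truncated with most_common(100) for each week.</p>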
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Discussion and Perspectives</title>
      <p>The current study provides an entity-based analysis of migration-related
tweets in combination with European migration statistics. As future work, migration
experts on social media will be identified and their
views on the factors causing migration will be analyzed. Moreover, the full text of
the tweets will also be processed for extraction and analysis purposes.</p>
      <p><sup>7</sup> These visualizations are shown on the associated homepage.</p>
      <p><sup>8</sup> https://ec.europa.eu/eurostat</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Fafalios</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iosifidis</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ntoutsi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dietze</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>TweetsKB: A public and large-scale RDF corpus of annotated tweets</article-title>
          .
          <source>In: Extended Semantic Web Conference (ESWC'18)</source>
          . Heraklion, Crete, Greece (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Fiorio</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abel</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zagheni</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinue</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Using Twitter data to estimate the relationship between short-term mobility and long-term migration</article-title>
          .
          <source>In: WebSci</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Florio</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lai</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Leveraging Hate Speech Detection to Investigate Immigration-related Phenomena in Italy</article-title>
          .
          <source>In: 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Isele</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jakob</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jentzsch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontokostas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendes</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morsey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Kleef</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.:
          <article-title>DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia</article-title>
          .
          <source>Semantic Web</source>
          <volume>6</volume>
          (
          <issue>2</issue>
          ),
          <fpage>167</fpage>
          –
          <lpage>195</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Pedersen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patwardhan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michelizzi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>WordNet::Similarity – measuring the relatedness of concepts</article-title>
          .
          <source>In: AAAI</source>
          . vol.
          <volume>4</volume>
          , pp.
          <fpage>25</fpage>
          –
          <lpage>29</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>