<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic Enrichment of Mobile Phone Data Records Using Linked Open Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zolzaya Dashdorj</string-name>
          <email>dashdorj@disi.unitn.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luciano Serafini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SKILL, Telecom Italia and DKM, Fondazione Bruno Kessler, University of Trento, ICT International Doctoral School</institution>
          ,
          <addr-line>Trento</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Users of mobile (smart) phones generate an enormous amount of data every day. Most of these data are not accessible for privacy reasons, but anonymized metadata, such as the location, the time and the duration of the interaction with the smartphone, are nowadays available for analysis. We refer to these data as “Call Data Records” (CDR). CDR metadata constitute an important source of information for investigating general human behavior, such as mobility [5, 3, 6] and communication patterns [9, 2, 4, 1, 8]. Currently, most analyses provide a quantitative description of human behavior, which is presented via visual analytics techniques. The outcomes of these analyses are usually quantitative models estimating, for instance, the number of people present in a certain area at a certain time, or the number of people who move from point a to point b within a certain period. Less attention has been paid to models that describe human activity at a qualitative/semantic level, i.e., in terms of semantically rich concepts, in order to estimate or predict on the basis of the CDR, for instance, the actions performed by a group of people in a certain situation or the type of event happening at a certain location. The analyses presented in [11, 10, 1, 5, 3] need to be extended by making use of relevant knowledge about the context of the user. User contextual information includes all the information describing the objects present and the events happening at the place and time at which the user is interacting with his/her smartphone. This includes information about the territory (e.g., points of interest, POIs), weather conditions, public and private events (e.g., concerts, sport events, spontaneous public meetings), emergency events (e.g., accidents, strikes), transportation schedules, energy or water consumption, etc.
In this paper, as a first step of investigation, we develop an ontological and statistical model (HRBModel) capable of predicting the possible human activities in a certain place at a certain time, on the basis of contextual information describing the POIs of the place and information about the times of day at which certain actions are usually performed. POIs are taken from OpenStreetMap. The model enables early identification of standard types of heterogeneous human activities in various geographical area profiles and at the different times at which CDRs occur.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>We are interested in predicting the most probable human activity of a user when he/she
is in a specific geographical area at a given time. We develop our experiment on the area
of Trento, Italy, using POIs from OpenStreetMap (OSM, http://www.openstreetmap.org/), but the
process and the software are general enough to be applicable to any geographical area.</p>
      <p>Details about the experiment are described in the following.</p>
      <p>We divide the geographical area of the city into subareas such that POIs are
uniformly distributed over them (see Figure 1(a)). In each subarea, POIs are extracted
using geographical tools such as OSM2PGSQL. We do not consider POIs that do not give
rise to human activities, such as benches, towers, and emergency phones. In total, we
extracted 333,809 POIs for this city from the OSM map, which were reduced to 159,314
after cleaning.</p>
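      <p>The preprocessing above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes POIs are available as (latitude, longitude, tag) tuples, for instance after loading OSM data with OSM2PGSQL, and both the excluded tag set and the recursive median split used to obtain subareas with roughly equal numbers of POIs are our assumptions, since the subdivision method is not specified.</p>
      <preformat>
```python
from typing import List, Tuple

# A POI as (latitude, longitude, OSM tag value); this layout is an assumption.
POI = Tuple[float, float, str]

# Hypothetical tags for POIs that do not give rise to human activities.
EXCLUDED_TAGS = {"bench", "tower", "emergency_phone"}


def clean_pois(pois: List[POI]) -> List[POI]:
    """Drop POIs whose tag is in the excluded set."""
    return [p for p in pois if p[2] not in EXCLUDED_TAGS]


def split_equal_count(pois: List[POI], depth: int) -> List[List[POI]]:
    """Recursively split POIs at the median coordinate, alternating latitude
    and longitude (k-d tree style). Each split halves the POI count, so the
    result has up to 2**depth subareas with roughly equal numbers of POIs."""
    if depth == 0 or len(pois) in (0, 1):
        return [pois]
    axis = depth % 2  # 0 splits on latitude, 1 on longitude
    ordered = sorted(pois, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (split_equal_count(ordered[:mid], depth - 1)
            + split_equal_count(ordered[mid:], depth - 1))


# Illustrative coordinates only.
pois = [
    (46.07, 11.12, "cafe"),
    (46.06, 11.15, "bench"),
    (46.05, 11.13, "school"),
    (46.08, 11.11, "bar"),
    (46.04, 11.14, "museum"),
]
cleaned = clean_pois(pois)
subareas = split_equal_count(cleaned, depth=2)
print(len(cleaned), [len(s) for s in subareas])  # 4 [1, 1, 1, 1]
```
      </preformat>
      <p>A median split guarantees equal POI counts per subarea by construction; a regular grid, by contrast, would leave dense city-centre cells and nearly empty peripheral ones.</p>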
      <p>To annotate POIs with human activities, we propose a Human Behavioral Ontology
(HBOnto) that, with the help of the OSMOnto ontology [7], associates each POI with all
the human activities that can be performed or hosted there or nearby. HBOnto contains
around 220 human activities, organized hierarchically from specific activities up to 10
high-level categories. For this association, we take day and time into account, since every
activity is connected to a day-time range of validity. We propose a stochastic behavior
model (SBM) that, on the basis of the ontological model, estimates the probability of a
human activity given the location and time of an event as follows:</p>
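      <p><disp-formula><tex-math>P(a \mid t, l) = P(a \mid t)\, P(a \mid l)</tex-math></disp-formula>where the time t and the location l are assumed to be independent.</p>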
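      <p>A minimal sketch of this combination, assuming P(a | t) and P(a | l) are given as dictionaries from activity names to probabilities. The product is renormalized over activities so that the scores form a distribution; this normalization step and the helper names are our additions. The top-5 ranking mirrors the top-5 suggestions used in the evaluation.</p>
      <preformat>
```python
from typing import Dict, List


def combine(p_a_t: Dict[str, float], p_a_l: Dict[str, float]) -> Dict[str, float]:
    """Score each activity by P(a|t) * P(a|l), treating t and l as independent,
    then renormalize so that the scores over all activities sum to 1."""
    activities = set(p_a_t) | set(p_a_l)
    scores = {a: p_a_t.get(a, 0.0) * p_a_l.get(a, 0.0) for a in activities}
    total = sum(scores.values())
    if total == 0:
        return scores  # no activity is supported by both time and location
    return {a: s / total for a, s in scores.items()}


def top_k(probs: Dict[str, float], k: int = 5) -> List[str]:
    """Return the k most probable activities, most probable first."""
    return sorted(probs, key=probs.get, reverse=True)[:k]


# Illustrative numbers only.
p_t = {"eating": 0.5, "studying": 0.3, "shopping": 0.2}
p_l = {"eating": 0.2, "studying": 0.6, "shopping": 0.2}
print(top_k(combine(p_t, p_l)))  # ['studying', 'eating', 'shopping']
```
      </preformat>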
      <p>As an initial step, we manually created a fuzzy model for P(a | t) that reasons about the
importance of activities at a given time; it is estimated as follows:
<disp-formula><tex-math>P(a \mid t) = \frac{\mathit{FuzzyActivity}(a, t)}{\mathit{FuzzyTime}(t)}</tex-math></disp-formula></p>
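      <p>The fuzzy estimate of P(a | t) could look like the sketch below. The trapezoidal membership functions, the hourly validity ranges, and the choice of FuzzyTime(t) as the sum of FuzzyActivity(a, t) over all activities (so that P(a | t) sums to 1) are all assumptions made for illustration; the actual fuzzy model was built manually and its parameters are not given here.</p>
      <preformat>
```python
from typing import Dict, Tuple

# Hypothetical trapezoidal validity ranges in hours of the day:
# (start, fully_valid_from, fully_valid_until, end). Illustrative only;
# ranges that wrap around midnight are not handled.
ACTIVITY_RANGES: Dict[str, Tuple[float, float, float, float]] = {
    "having_breakfast": (6.0, 7.0, 9.0, 10.5),
    "working": (8.0, 9.0, 17.0, 19.0),
    "dining_out": (18.0, 19.5, 22.0, 23.5),
}


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership degree: 0 outside (a, d), 1 on [b, c], linear in between."""
    if a >= x or x >= d:
        return 0.0
    if c >= x >= b:
        return 1.0
    if b > x:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


def fuzzy_activity(activity: str, hour: float) -> float:
    """FuzzyActivity(a, t): how typical activity a is at the given hour."""
    return trapezoid(hour, *ACTIVITY_RANGES[activity])


def p_activity_given_time(hour: float) -> Dict[str, float]:
    """P(a|t) = FuzzyActivity(a, t) / FuzzyTime(t), with FuzzyTime(t) assumed
    to be the sum of FuzzyActivity(a, t) over all activities."""
    memberships = {a: fuzzy_activity(a, hour) for a in ACTIVITY_RANGES}
    fuzzy_time = sum(memberships.values())
    if fuzzy_time == 0:
        return {a: 0.0 for a in memberships}
    return {a: m / fuzzy_time for a, m in memberships.items()}


# At 8:30: having_breakfast about 0.67, working about 0.33, dining_out 0.0
print(p_activity_given_time(8.5))
```
      </preformat>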
      <p>We collected user feedback data through the experimental application (see Figure 1(b))
over one week, with 32 participants involved (see Table 1). Most of the user feedback came
from the Trento-Povo and Trento-Downtown areas.</p>
      <p>Each user's feedback is stored in a record containing: the latitude/longitude of the
location selected on the map, the radius of the area, the selected activity (one of the top-5
or a freely chosen one), the semantic day and time (e.g., weekday or Saturday; early morning or
mid-morning), and the current system time. The collected data are available online
(http://dkmlab.fbk.eu:8080/BHRModelTest/data/semantic data BHRModel2013.csv).</p>
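      <p>A feedback record of this kind maps naturally onto a small data class. The sketch below parses CSV text into such records; the field names, the CSV column names, and the ISO timestamp format are hypothetical and are not taken from the published dataset.</p>
      <preformat>
```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Feedback:
    """One user feedback record. Field and column names are hypothetical."""
    lat: float
    lon: float
    radius: float          # radius of the selected area (unit not given in the source)
    activity: str          # one of the top-5 suggestions or a freely chosen one
    semantic_day: str      # e.g. "weekday", "saturday"
    semantic_time: str     # e.g. "early morning", "mid morning"
    system_time: datetime  # current time of the system when the record was sent


def parse_feedback(text: str) -> List[Feedback]:
    """Parse CSV text with a header row into Feedback records."""
    return [
        Feedback(
            lat=float(row["lat"]),
            lon=float(row["lon"]),
            radius=float(row["radius"]),
            activity=row["activity"],
            semantic_day=row["day"],
            semantic_time=row["time"],
            system_time=datetime.fromisoformat(row["system_time"]),
        )
        for row in csv.DictReader(io.StringIO(text))
    ]


# A made-up row, for illustration only.
sample = (
    "lat,lon,radius,activity,day,time,system_time\n"
    "46.067,11.150,200,studying,weekday,mid morning,2013-05-14T10:32:00\n"
)
records = parse_feedback(sample)
print(records[0].activity, records[0].system_time.hour)  # studying 10
```
      </preformat>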
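      <p>P(a | l) is estimated from the importance of the activities at a given location. The
importance of the POIs is measured with TF-IDF, from which a weight is derived for each activity:
<disp-formula><tex-math>\mathit{tfidf}(f, l) = \frac{N(f, l)}{\max\{N(w, l) : w \in l\}} \cdot \log \frac{|L|}{|\{l' \in L : f \in l'\}|}</tex-math></disp-formula>
where N(f, l) is the number of POIs of type f in location l and L is the set of all locations.</p>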
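      <p>A minimal sketch of the TF-IDF weighting, treating each location as a document and each POI type as a term. How the POI weights are turned into activity weights is not spelled out, so summing the TF-IDF of every POI type that hosts an activity, and the hosts mapping standing in for the HBOnto lookup, are our assumptions. Normalizing the resulting W(a, l) over activities would give one possible estimate of P(a | l).</p>
      <preformat>
```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(locations: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """tf-idf(f, l) for every POI type f in every location l.

    `locations` maps a location id to the POI types found there (one entry per
    POI). tf is N(f, l) divided by the count of the most frequent type in l;
    idf is log(|L| / number of locations containing f)."""
    counts = {l: Counter(types) for l, types in locations.items()}
    n_locations = len(counts)
    # Iterating a Counter yields each distinct type once per location.
    doc_freq = Counter(f for c in counts.values() for f in c)
    weights: Dict[str, Dict[str, float]] = {}
    for l, c in counts.items():
        max_count = max(c.values()) if c else 1
        weights[l] = {f: (n / max_count) * math.log(n_locations / doc_freq[f])
                      for f, n in c.items()}
    return weights


def activity_weights(tfidf_l: Dict[str, float],
                     hosts: Dict[str, List[str]]) -> Dict[str, float]:
    """W(a, l): sum of the tf-idf weights of the POI types in l hosting activity a.
    `hosts` (POI type to activities) is a hypothetical stand-in for HBOnto."""
    w: Dict[str, float] = {}
    for f, score in tfidf_l.items():
        for a in hosts.get(f, []):
            w[a] = w.get(a, 0.0) + score
    return w


# Illustrative data only.
locations = {
    "l1": ["cafe", "cafe", "school"],
    "l2": ["cafe", "museum"],
    "l3": ["school"],
}
hosts = {"cafe": ["eating"], "school": ["studying"], "museum": ["sightseeing"]}
tfidf = tf_idf(locations)
print(activity_weights(tfidf["l1"], hosts))  # eating about 0.405, studying about 0.203
```
      </preformat>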
      <p>To avoid spatial gaps, P(a | l) can be extended to take into account the activities
nearby, within a circular area of radius r around the location:</p>
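      <p>The radius-based extension can be sketched as below, assuming each location is summarized by a centroid and a dictionary of activity weights W(a, l_i). A location is treated as intersecting the circle when its centroid lies within r (a simplification of a true area-circle intersection test); for each activity the sketch keeps its maximum weight over those locations and normalizes across activities. All names are hypothetical.</p>
      <preformat>
```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000.0
Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(p: Point, q: Point) -> float:
    """Great-circle distance in metres between two points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def p_activity_given_radius(center: Point, r_m: float,
                            locations: List[Tuple[Point, Dict[str, float]]]
                            ) -> Dict[str, float]:
    """Keep the locations whose centroid lies within r_m of `center`, take for
    each activity its maximum weight W(a, l_i) over those locations, and
    normalize across activities."""
    nearby = [w for c, w in locations if r_m >= haversine_m(center, c)]
    best: Dict[str, float] = {}
    for weights in nearby:
        for a, weight in weights.items():
            best[a] = max(best.get(a, 0.0), weight)
    total = sum(best.values())
    return {a: w / total for a, w in best.items()} if total > 0 else best


# Illustrative locations: two within 500 m of the center, one about 3.7 km away.
locs = [
    ((46.067, 11.121), {"eating": 0.6, "studying": 0.2}),
    ((46.068, 11.121), {"eating": 0.3, "studying": 0.6}),
    ((46.100, 11.121), {"skiing": 1.0}),
]
print(p_activity_given_radius((46.067, 11.121), 500.0, locs))
# {'eating': 0.5, 'studying': 0.5}
```
      </preformat>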
      <p><disp-formula><tex-math>P(a \mid l, r) = \frac{\max_{i} W(a, l_i)}{\sum_{a' \in r} \max_{i} W(a', l_i)}, \qquad \{l_i\}_{i=1}^{n} : r \cap l_i \neq \emptyset</tex-math></disp-formula>where W(a, l_i) is the weight of activity a in location l_i, and l_1, ..., l_n are the locations that intersect the circular area of radius r.</p>
      <p>Given the collected data, we measured the accuracy of the HRBModel by counting the
correctly predicted activities regardless of their ranking position. We also analyzed the
divergence between the activity probabilities estimated by the model and the probabilities
derived from the feedback in the two areas with the most feedback: Trento-Downtown and
Trento-Povo. As an example, Figure 2(a) (Trento-Downtown, weekday mornings) and Figure 2(b)
(Trento-Povo, weekday afternoons) show this divergence for the activities occurring in those
areas, where the probability of each activity was propagated to its child activities. In these
figures, the x-axis indexes the activities, with a different indexing in each area, and the
y-axis gives the probability associated with each activity. The figures show that the activity
probabilities from the user feedback still follow the trend of the probabilities from our model.</p>
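      <p>The accuracy measures can be computed as sketched below, assuming each feedback record is paired with the model's ranked list of predicted activities. The sketch counts hits regardless of ranking position, hits within the top-5, and hits after mapping activities to their parent categories through a hypothetical child-to-parent table. As an arithmetic check, 341 correct predictions out of 481 records give 341/481 ≈ 0.7089, i.e., the 70.89% overall accuracy reported in the conclusion.</p>
      <preformat>
```python
from typing import Dict, List, Optional, Sequence, Tuple

# (observed activity from the feedback, activities predicted by the model, best first)
Record = Tuple[str, List[str]]


def hit_rate(records: Sequence[Record], k: Optional[int] = None) -> float:
    """Share of records whose observed activity is among the predictions
    (only the first k predictions if k is given)."""
    if not records:
        return 0.0
    hits = sum(1 for obs, ranked in records
               if obs in (ranked if k is None else ranked[:k]))
    return hits / len(records)


def to_parent(records: Sequence[Record], parent_of: Dict[str, str]) -> List[Record]:
    """Map observed and predicted activities to their parent category, keeping
    activities without a parent unchanged and dropping duplicate predictions."""
    mapped = []
    for obs, ranked in records:
        parents = list(dict.fromkeys(parent_of.get(a, a) for a in ranked))
        mapped.append((parent_of.get(obs, obs), parents))
    return mapped


# Illustrative records and a hypothetical fragment of the activity hierarchy.
records = [
    ("eating_pizza", ["drinking_coffee", "eating_pizza"]),
    ("studying", ["working", "reading", "meeting", "shopping", "walking", "studying"]),
    ("eating_sushi", ["eating_pizza"]),
]
parent_of = {"eating_pizza": "eating", "eating_sushi": "eating", "drinking_coffee": "eating"}
# Roughly 0.667 (any position), 0.333 (top-5), 1.0 (parent level)
print(hit_rate(records), hit_rate(records, k=5), hit_rate(to_parent(records, parent_of)))
```
      </preformat>
      <p>Mapping to parent categories can only turn misses into hits, which is consistent with accuracy rising when the evaluation is done on high-level activities.</p>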
    </sec>
    <sec id="sec-2">
      <title>Conclusion and Future works</title>
      <p>Of the 481 user feedback records collected, 341 activities were correctly predicted,
corresponding to an overall accuracy of 70.89%. The accuracy of correct activity prediction
within the top-5 is 61.95%. When the evaluation considers the high-level (parent) activities,
the overall accuracy increases to 80.23%. We showed the divergence between the activity
probabilities of our model and the probabilities from the feedback across various locations
and times; this divergence can be studied further in order to understand the correlation
between human activities and contexts. We will extend the evaluation of the model by making
use of mobile phone surveys, crowdsourcing, and social networks (e.g., Twitter, Foursquare).
The proposed model will serve as a baseline to characterize geographical areas by the
activities of interest in the areas where CDRs occur. This will allow the identification and
prediction of human behaviors across various area profiles (e.g., business, shopping, or
leisure areas) under certain contextual conditions, as well as the detection of anomalous
behavioral events.</p>
      <p>Figure 2: (a) Trento-Downtown area on weekday mornings; (b) Trento-Povo area on weekday afternoons.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>B.</given-names>
            <surname>Furletti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gabrielli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Renso</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Rinzivillo</surname>
          </string-name>
          .
          <article-title>Identifying users profiles from mobile calls habits</article-title>
          .
          <source>In Proc. of the ACM SIGKDD Int. Workshop on Urban Computing, UrbComp '12</source>
          , pages
          <fpage>17</fpage>
          -
          <lpage>24</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>C.</given-names>
            <surname>Ratti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sobolevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Calabrese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Andris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Reades</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Claxton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.H.</given-names>
            <surname>Strogatz</surname>
          </string-name>
          .
          <article-title>Redrawing the map of Great Britain from a network of human interactions</article-title>
          .
          <source>PLoS ONE</source>
          ,
          <volume>5</volume>
          (
          <issue>12</issue>
          ):
          <elocation-id>e14248</elocation-id>
          , December
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>F.</given-names>
            <surname>Calabrese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.C.</given-names>
            <surname>Pereira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Di Lorenzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Ratti</surname>
          </string-name>
          .
          <article-title>The geography of taste: analyzing cell-phone mobility and social events</article-title>
          .
          <source>In Proc. of the 8th Intl. Conf. on Pervasive Computing, Pervasive'10</source>
          , pages
          <fpage>22</fpage>
          -
          <lpage>37</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Candia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schoenharl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Madey</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Barabási</surname>
          </string-name>
          .
          <article-title>Uncovering individual and collective human dynamics from mobile phone records</article-title>
          .
          <source>Journal of Physics A: Mathematical and Theoretical</source>
          ,
          <volume>41</volume>
          (
          <issue>22</issue>
          ):
          <fpage>224015</fpage>
          ,
          <year>June 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J.P.</given-names>
            <surname>Bagrow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Barabási</surname>
          </string-name>
          .
          <article-title>Collective response of human populations to large-scale emergencies</article-title>
          .
          <source>CoRR, abs/1106.0560</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M. C.</given-names>
            <surname>González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Hidalgo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Barabási</surname>
          </string-name>
          .
          <article-title>Understanding individual human mobility patterns</article-title>
          .
          <source>Nature</source>
          ,
          <volume>453</volume>
          (
          <issue>7196</issue>
          ):
          <fpage>779</fpage>
          -
          <lpage>782</lpage>
          ,
          <year>June 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Codescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Horsinka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mossakowski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rau</surname>
          </string-name>
          .
          <article-title>OSMonto - an ontology of OpenStreetMap tags</article-title>
          .
          <source>In State of the Map Europe (SOTM-EU)</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
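          <string-name>
            <given-names>N.</given-names>
            <surname>Eagle</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. (Sandy)</given-names>
            <surname>Pentland</surname>
          </string-name>
          .
          <article-title>Reality mining: sensing complex social systems</article-title>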
          .
          <source>Personal Ubiquitous Comput.</source>
          ,
          <volume>10</volume>
          (
          <issue>4</issue>
          ):
          <fpage>255</fpage>
          -
          <lpage>268</lpage>
          ,
          <year>March 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Onnela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Saramäki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hyvönen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Szabó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lazer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kaski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kertész</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Barabási</surname>
          </string-name>
          .
          <article-title>Structure and tie strengths in mobile communication networks</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          ,
          <volume>104</volume>
          (
          <issue>18</issue>
          ):
          <fpage>7332</fpage>
          -
          <lpage>7336</lpage>
          , May
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>S.</given-names>
            <surname>Phithakkitnukoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Horanont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Di Lorenzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shibasaki</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Ratti</surname>
          </string-name>
          .
          <article-title>Activity-aware map: identifying human daily activity pattern using mobile phone data</article-title>
          .
          <source>In Proc. of the 1st Intl. Conf. on Human Behavior Understanding</source>
          , pages
          <fpage>14</fpage>
          -
          <lpage>25</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dashdorj</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          .
          <article-title>Semantic interpretation of mobile phone records exploiting background knowledge</article-title>
          .
          <source>In Intl. Semantic Web Conference (ISWC 2013), Doctoral Consortium</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>