<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Spatio-temporal knowledge discovery from georeferenced mobile phone data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yihong Yuan</string-name>
          <email>yuan@geog.ucsb.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Raubal</string-name>
          <email>raubal@geog.ucsb.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Geography, University of California</institution>
          ,
          <addr-line>Santa Barbara, CA 93106</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>121</fpage>
      <lpage>126</lpage>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Information and communication technologies (ICTs), such as mobile phones and the
Internet, are increasingly pervasive in modern society. These technologies provide
greater flexibility regarding when, where, and how to travel. Understanding the
influence of ICTs in our current mobile information society will be essential for
updating environmental policies and maintaining sustainable mobility and
transportation
        <xref ref-type="bibr" rid="ref4">(De Souza e Silva 2007)</xref>
        . Moreover, ICTs have provided a wide range
of spatio-temporal data sources, which can be used for geographic knowledge
discovery and data mining in studies on geographic dynamics, such as human travel
behavior and mobility patterns
        <xref ref-type="bibr" rid="ref10 ref12 ref8">(Song et al. 2010; Yuan 2009; Miller 2009)</xref>
        . Several
studies have focused on extracting spatio-temporal information from
georeferenced mobile phone data. For example, Ahas’ social positioning method
(SPM) combines the location data and social attributes of mobile phone users to
study the dynamics of urban systems
        <xref ref-type="bibr" rid="ref1 ref2">(Ahas and Mark 2005; Ahas et al. 2007)</xref>
        .
        <xref ref-type="bibr" rid="ref5">Gonzalez et al. (2008)</xref>
        studied the individual trajectories of 100,000 mobile phone users
based on tracked location data, providing new insights into the basic laws
of human motion.
      </p>
      <p>
        As a generalized research frame,
        <xref ref-type="bibr" rid="ref8">Miller (2009)</xref>
        discussed five major tasks in
geographic data mining and knowledge discovery: spatial classification and capturing
spatial dependency, spatial segmentation and clustering, spatial trends, spatial
generalization, and spatial association. Traditional geographic knowledge discovery
mainly focuses on obtaining new knowledge from relatively comprehensive datasets,
such as extracting movement patterns from high-resolution trajectories. However,
several spatio-temporal datasets (e.g., georeferenced mobile phone data) provide only
incomplete data with relatively low resolution and few individual attributes. Therefore,
it is important to determine how much knowledge, and to what extent, we can extract
from such sparse data sources, and how to deal with the uncertainty in incomplete datasets.
In this paper, we provide a framework for extracting spatio-temporal knowledge
from a typical georeferenced mobile phone dataset. This framework is intended to help
update the research tasks of geographic knowledge discovery in the age of instant access.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset</title>
      <p>This research uses a dataset from Harbin, China. Harbin is a major
commercial, industrial, and transportation center in northeast China, and it was
ranked among the ten most populous cities in China in 2009. The dataset
covers over one million people in Harbin and includes their mobile phone connection
records over a span of 9 days (21 to 29 July 2007). The data include the time,
duration, and location<sup>1</sup> of each mobile phone connection, as well as the age and gender
of the users. The data also provide the phone number and city code<sup>2</sup> of the
other party of each phone call. Note that the location records in the dataset cannot
represent the exact movement trajectory of each user, since locations are recorded
only when a phone call connection occurs. However, when summarized over the 9 days of
records, the data are still useful for depicting the general characteristics of individual
travel mobility.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Extracting spatio-temporal knowledge in georeferenced mobile phone data</title>
      <p>Generally, the dataset provides three types of directly recorded information
for each mobile phone user: (1) cell phone usage, (2) social attributes (age, gender),
and (3) spatio-temporal points within a given time span.
All data mining and knowledge discovery tasks are based on the combination and
interaction of these three information categories. Moreover, since urban systems
can be considered organized aggregations of human settlements, we can also infer
knowledge about the city from the behavior of its citizens, such as
identifying the spatial clustering of traffic in different urban areas. Therefore, the
research questions are divided into two categories: individual-oriented research and
urban-oriented research.</p>
      <p><sup>1</sup> For each user, the location of the nearest cell phone tower is recorded whenever the user makes or
receives a phone call. Since the towers are located every 300-500 m in the city, the location accuracy
is about 300-500 m.</p>
      <p><sup>2</sup> The city code indicates the city in which the other party of a phone call is located.</p>
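      <p>To make the three information categories concrete, the following Python sketch groups raw connection records into one profile per user. It is a minimal illustration under stated assumptions: the class names CallRecord and UserProfile, their field names, and the use of tower IDs as locations are hypothetical and do not describe the actual schema of the Harbin dataset.</p>
<preformat>
```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CallRecord:
    # Hypothetical field names, not the dataset's actual schema.
    user_id: str
    time: datetime
    duration_s: int      # connection duration in seconds
    tower: str           # ID of the nearest cell tower (roughly 300-500 m accuracy)
    age: int
    gender: str


@dataclass
class UserProfile:
    # (1) cell phone usage
    call_count: int = 0
    total_duration_s: int = 0
    # (2) social attributes
    attributes: dict = field(default_factory=dict)
    # (3) spatio-temporal points as (time, tower) pairs
    points: list = field(default_factory=list)


def build_profiles(records):
    """Group raw connection records into one three-part profile per user."""
    profiles = defaultdict(UserProfile)
    for r in records:
        p = profiles[r.user_id]
        p.call_count += 1
        p.total_duration_s += r.duration_s
        p.attributes = {"age": r.age, "gender": r.gender}
        p.points.append((r.time, r.tower))
    for p in profiles.values():
        p.points.sort(key=lambda pt: pt[0])  # chronological order
    return dict(profiles)


records = [
    CallRecord("u1", datetime(2007, 7, 21, 9, 30), 120, "T17", 34, "F"),
    CallRecord("u1", datetime(2007, 7, 21, 8, 5), 60, "T02", 34, "F"),
    CallRecord("u2", datetime(2007, 7, 22, 20, 0), 45, "T09", 51, "M"),
]
profiles = build_profiles(records)
```
</preformat>
      <p>Keeping each user's points in chronological order is what later steps, such as trajectory interpolation or POI extraction, would rely on.</p>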
    </sec>
    <sec id="sec-4">
      <title>3.1 Individual-oriented research</title>
      <p>3.1.1 Analysis of movement patterns</p>
      <p>Although human activity may appear random and irregular, identifiable patterns
still exist in every person’s life. It is easier to predict the behavior of people
who travel little than that of people who travel more frequently
        <xref ref-type="bibr" rid="ref5">(Gonzalez et al. 2008)</xref>
        .</p>
      <p>
(1) Trajectory patterns: Based on the scattered spatio-temporal points provided in
the dataset, it is possible to reconstruct spatio-temporal paths through interpolation
methods. However, this approach may not be applicable to every user, since a
certain number of well-distributed points is needed to create a user trajectory. Based
on these trajectories, it is feasible to study the patterns (e.g., clustering, periodicity,
predictability) of individual movement. Figure 1 depicts the activity path of a
specific individual during a week. Furthermore, we can identify particular
patterns for population groups divided by social attributes.
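The paper does not state which interpolation method is used. As a minimal sketch, assuming straight-line movement at constant speed between consecutive tower locations, linear interpolation could turn one user's scattered points into a regularly sampled path. The (time, x, y) input format and the functions interpolate_position and resample are hypothetical.
<preformat>
```python
from bisect import bisect_right


def interpolate_position(points, t):
    """Linearly interpolate (x, y) at time t from time-sorted (t, x, y) points.

    Returns None outside the observed time span. Assumes straight-line
    movement at constant speed between consecutive tower locations.
    """
    if not points:
        return None
    times = [p[0] for p in points]
    if t > times[-1] or times[0] > t:
        return None
    i = bisect_right(times, t)        # first index whose time is greater than t
    if i == len(points):              # t coincides with the last observation
        return points[-1][1:]
    t0, x0, y0 = points[i - 1]
    t1, x1, y1 = points[i]
    w = (t - t0) / (t1 - t0)          # t1 > t0 is guaranteed by bisect_right
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


def resample(points, step_s):
    """Build a regular path by sampling the interpolated track every step_s seconds."""
    start, end = points[0][0], points[-1][0]
    steps = int((end - start) // step_s)
    return [(start + k * step_s, *interpolate_position(points, start + k * step_s))
            for k in range(steps + 1)]


# Three observations of one user: (seconds, x in metres, y in metres).
obs = [(0, 0.0, 0.0), (3600, 1000.0, 0.0), (7200, 1000.0, 2000.0)]
print(interpolate_position(obs, 1800))   # (500.0, 0.0)
```
</preformat>
Long gaps between calls make such interpolated segments unreliable, so in practice a user would only be interpolated when enough well-distributed points are available.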
      </p>
      <p>(2) Point patterns (points of interest, POIs): Based on the analysis of trajectory
patterns, the POIs associated with each individual can be extracted by applying
predefined discovery rules. POI discovery provides a way to enrich the user
attributes available in an incomplete dataset. For example, it is feasible to
extract the work and home locations, regular entertainment places, and other POIs
of each mobile phone user.
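The discovery rules themselves are not given in the text. The sketch below assumes two hypothetical rules: the home location is the tower used most often at night (22:00 to 05:59), and the work location is the tower used most often during weekday working hours (09:00 to 16:59). The hour windows and function names are illustrative choices, not the authors' method.
<preformat>
```python
from collections import Counter
from datetime import datetime

# Hypothetical rule windows (local time).
NIGHT_HOURS = {22, 23, 0, 1, 2, 3, 4, 5}
WORK_HOURS = set(range(9, 17))          # 09:00 to 16:59
WEEKDAYS = set(range(5))                # Monday=0 to Friday=4


def most_frequent_tower(points, keep):
    """Most common tower among (time, tower) points whose time satisfies keep."""
    counts = Counter(tower for ts, tower in points if keep(ts))
    return counts.most_common(1)[0][0] if counts else None


def extract_home_work(points):
    """Apply the two hypothetical rules to one user's (time, tower) points."""
    home = most_frequent_tower(points, lambda ts: ts.hour in NIGHT_HOURS)
    work = most_frequent_tower(
        points, lambda ts: ts.weekday() in WEEKDAYS and ts.hour in WORK_HOURS)
    return {"home": home, "work": work}


points = [
    (datetime(2007, 7, 23, 23, 10), "T1"),   # Monday night
    (datetime(2007, 7, 24, 2, 40), "T1"),
    (datetime(2007, 7, 24, 10, 5), "T2"),    # Tuesday working hours
    (datetime(2007, 7, 24, 14, 30), "T2"),
    (datetime(2007, 7, 24, 19, 0), "T3"),
]
print(extract_home_work(points))   # {'home': 'T1', 'work': 'T2'}
```
</preformat>
Ties are broken by first occurrence, because Counter.most_common keeps insertion order for equal counts; a real analysis would probably also require a minimum number of observations before accepting a POI.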
      </p>
      <p>(3) Correlation between mobile phone usage and movement patterns: Previous
studies have examined the interaction between ICT and human activity-travel
behavior. However, due to the lack of sufficient data and the complicated nature
of this interaction, there is still an ongoing debate about how it works in everyday
life. Based on the phone usage information and social attributes, we examined
how population heterogeneity affects the relationship between mobile phone
usage and individual movement radius. We argue that the heterogeneity of the
population should be taken into account when analyzing the correlation between
ICT and human activity-travel behavior. Therefore, it is important to specify
individual attributes (e.g., cultural, social, institutional, and physical aspects) when
investigating this problem. Mobile phone usage and travel behavior correlate
differently among various social groups, so a single general conclusion for the
whole population cannot represent the complicated nature of this problem.
Future research should focus on the correlation between phone usage
and trajectory patterns. This would provide more specific results on how mobile
phone usage affects an individual’s daily life, as well as offer references to
policy makers.
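A per-group analysis of this kind could be sketched as follows, assuming each user has already been summarized by a number of calls and a movement radius and assigned to a social group (for example, by gender and age band). The dictionary keys, the minimum group size, and the use of Pearson's coefficient are assumptions for illustration, not the measures used in the study.
<preformat>
```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)


def correlation_by_group(users, min_size=3):
    """Correlate call counts with movement radius separately for each group.

    users: iterable of dicts with hypothetical keys "group", "calls", "radius_km".
    Groups with fewer than min_size users are skipped.
    """
    calls = defaultdict(list)
    radii = defaultdict(list)
    for u in users:
        calls[u["group"]].append(u["calls"])
        radii[u["group"]].append(u["radius_km"])
    return {g: pearson(calls[g], radii[g])
            for g in calls if len(calls[g]) >= min_size}


users = [
    {"group": ("F", "18-30"), "calls": 10, "radius_km": 2.0},
    {"group": ("F", "18-30"), "calls": 20, "radius_km": 4.0},
    {"group": ("F", "18-30"), "calls": 30, "radius_km": 6.0},
    {"group": ("M", "31-50"), "calls": 10, "radius_km": 6.0},
    {"group": ("M", "31-50"), "calls": 20, "radius_km": 4.0},
    {"group": ("M", "31-50"), "calls": 30, "radius_km": 2.0},
]
print(correlation_by_group(users))
```
</preformat>
In this toy input the two groups show opposite correlations (about +1 and -1), while a pooled coefficient over all six users is 0, which illustrates why a single population-wide figure can hide group differences.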
      </p>
      <p>3.1.2 Analysis of social networks</p>
      <p>
        <xref ref-type="bibr" rid="ref6">Janelle (1995)</xref>
        introduced four types of communication modes based on different
spatio-temporal constraints: Synchronous Presence (SP), Asynchronous Presence
(AP), Synchronous Tele-presence (ST), and Asynchronous Tele-presence (AT).
Communication often occurs among members of a particular social network.
Therefore, geographic mobility and cell phone usage could both be considered
connections in the same social network. Traditional research questions include the
prediction and inference of network topology, the flow of information, and the
interaction among networks. For instance,
        <xref ref-type="bibr" rid="ref9">Pultar and Raubal (2009)</xref>
        studied the
integration of social, transportation, and data networks. In particular, in the case of
georeferenced mobile phone data, a potential research question would be the
combination, differentiation, and interaction of social networks associated with
different communication modes.
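One way to compare networks from different communication modes is sketched below. Phone calls are treated as synchronous tele-presence (ST) links, and two users observed at the same tower within the same hour are treated as a very coarse, hypothetical proxy for synchronous presence (SP), since sharing a tower does not imply a face-to-face meeting. The overlap of the two edge sets is measured with the Jaccard index. All function names, the hour binning, and the co-location proxy are assumptions.
<preformat>
```python
from collections import Counter, defaultdict
from itertools import combinations


def call_network(calls):
    """Undirected call graph (ST mode): edge -> number of calls."""
    return Counter(frozenset(pair) for pair in calls if pair[0] != pair[1])


def colocation_network(observations):
    """Edges between users seen at the same tower in the same hour.

    observations: (user, hour_bin, tower) tuples. This is only a coarse,
    hypothetical proxy for synchronous presence (SP).
    """
    present = defaultdict(set)
    for user, hour_bin, tower in observations:
        present[(hour_bin, tower)].add(user)
    edges = Counter()
    for users in present.values():
        for a, b in combinations(sorted(users), 2):
            edges[frozenset((a, b))] += 1
    return edges


def edge_overlap(net_a, net_b):
    """Jaccard index of the two edge sets (0.0 when both are empty)."""
    a, b = set(net_a), set(net_b)
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


calls = [("u1", "u2"), ("u2", "u1"), ("u1", "u3")]
obs = [("u1", 9, "T1"), ("u2", 9, "T1"), ("u3", 10, "T1"), ("u4", 10, "T1")]
st_net = call_network(calls)
sp_net = colocation_network(obs)
print(edge_overlap(st_net, sp_net))
```
</preformat>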
      </p>
      <p>3.2 Urban-oriented research</p>
      <p>Cities are complex systems made up of a myriad of processes and elements (Batty
2005). Since individuals are the atoms of an urban system, the spatio-temporal
characteristics of an urban system could be viewed as a conceptual generalization of
individual behavior.
      </p>
      <p>(1) Spatial division: Mobile communications might alter the traditional spatial
division of urban spaces
        <xref ref-type="bibr" rid="ref7">(Kwan et al. 2007)</xref>
        , resulting in changes to urban
planning and transportation systems. Potential research questions include the
evolution of city structure under the impact of ICT
        <xref ref-type="bibr" rid="ref11">(Torrens 2008)</xref>
        and the
comparison of “phone usage flow maps” with “travel flow maps”.
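Comparing a phone usage flow map with a travel flow map could start from origin-destination counts between urban zones, as in the sketch below. It assumes a hypothetical lookup zone_of from tower ID to zone, that the towers at both ends of a call are known, that only moves or calls crossing a zone boundary count as flows, and that cosine similarity is an adequate summary of how alike the two flow patterns are.
<preformat>
```python
from collections import Counter
from math import sqrt


def travel_flows(trajectories, zone_of):
    """Count moves between zones from consecutive records of each user.

    trajectories: {user: [tower, ...]} in time order; zone_of: tower -> zone.
    """
    flows = Counter()
    for towers in trajectories.values():
        zones = [zone_of[t] for t in towers]
        for a, b in zip(zones, zones[1:]):
            if a != b:
                flows[(a, b)] += 1
    return flows


def phone_flows(calls, zone_of):
    """Count calls between different zones; calls are (caller_tower, callee_tower)."""
    return Counter((zone_of[a], zone_of[b]) for a, b in calls
                   if zone_of[a] != zone_of[b])


def cosine_similarity(f1, f2):
    """Cosine similarity of two origin-destination count vectors."""
    keys = set(f1).union(f2)
    dot = sum(f1[k] * f2[k] for k in keys)
    n1 = sqrt(sum(v * v for v in f1.values()))
    n2 = sqrt(sum(v * v for v in f2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


zone_of = {"T1": "A", "T2": "A", "T3": "B"}
trips = {"u1": ["T1", "T3", "T2"], "u2": ["T2", "T1"]}
calls = [("T1", "T3"), ("T1", "T3"), ("T3", "T2")]
travel = travel_flows(trips, zone_of)
phone = phone_flows(calls, zone_of)
print(round(cosine_similarity(travel, phone), 4))   # 0.9487
```
</preformat>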
      </p>
      <p>(2) Spatial clustering: Clustering refers to different types of hotspots in the city, for
example, hotspots of mobile phone usage, traffic jams, or nightlife (Figure 2).
These patterns can be found on a range of timescales, from daily to yearly.
The study of hotspot clustering patterns would be helpful for constructing a
more efficient urban system.
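A simple grid-based way to flag hotspots is sketched below: activity points are binned into square cells, and a cell is flagged when its count exceeds the mean plus k standard deviations over the occupied cells. The 500 m cell size, the threshold rule, and the function names are assumptions; formal hot-spot statistics such as Getis-Ord Gi* would be a more rigorous alternative. Running the sketch separately on the points of each hour, day, or month would give the different timescales mentioned above.
<preformat>
```python
from collections import Counter
from math import floor, sqrt


def grid_counts(points, cell_m=500.0):
    """Bin (x, y) points in metres into square cells and count activity per cell."""
    return Counter((floor(x / cell_m), floor(y / cell_m)) for x, y in points)


def hotspots(counts, k=2.0):
    """Cells whose count exceeds mean + k * std, computed over occupied cells only."""
    values = list(counts.values())
    n = len(values)
    if n == 0:
        return set()
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return {cell for cell, c in counts.items() if c > mean + k * std}


points = [(100.0, 100.0)] * 10 + [(600.0, 100.0), (100.0, 600.0),
                                  (1100.0, 100.0), (100.0, 1100.0)]
counts = grid_counts(points, cell_m=500.0)
print(hotspots(counts, k=1.5))   # {(0, 0)}
```
</preformat>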
      </p>
      <p>(3) Spatial central tendency and spread: Central tendency refers to a “middle”
or typical value of a distribution, whereas spread describes how widely values are
dispersed around it. For a given city, the central tendency
of spatio-temporal behavior reflects various characteristics of the urban system.
For example, Figure 3 shows the distribution of individual travel radius in Harbin.
As can be seen, the movement radius of most people is around 3 km.
However, would the distribution be similar for another target city? It would
therefore be interesting to examine the correlation between the travel
distance distribution and the attributes (area, structure, population, average phone
usage) of a given city.
      </p>
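      <p>The text does not define how the individual travel radius is computed. One common choice, used by
        <xref ref-type="bibr" rid="ref5">Gonzalez et al. (2008)</xref>
        , is the radius of gyration, r_g = sqrt((1/n) Σ |r_i - r_cm|²), where r_cm is the center of mass of a user's n recorded positions. The sketch below assumes this measure and summarizes the resulting distribution by its median (central tendency) and interquartile range (spread); the function names are hypothetical.</p>
<preformat>
```python
from math import sqrt
from statistics import median, quantiles


def radius_of_gyration(positions):
    """Radius of gyration of one user's (x, y) positions, in the input units."""
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    return sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in positions) / n)


def summarize_radii(radii):
    """Median (central tendency) and interquartile range (spread); needs 2+ values."""
    q1, _, q3 = quantiles(radii, n=4)
    return {"median_km": median(radii), "iqr_km": q3 - q1}


print(radius_of_gyration([(0.0, 0.0), (2.0, 0.0)]))   # 1.0
print(summarize_radii([1.0, 2.0, 3.0, 4.0, 5.0]))     # {'median_km': 3.0, 'iqr_km': 3.0}
```
</preformat>
      <p>Comparing these two summary values across cities would be one concrete way to test whether travel radius distributions are similar from city to city.</p>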
    </sec>
    <sec id="sec-5">
      <title>4. Conclusions</title>
      <p>Spatio-temporal knowledge discovery has gained wide attention due to the increasing
availability of georeferenced data, and much progress has been made regarding the
theories, methodologies, and applications in this field. In this research, we focus on
extracting spatio-temporal knowledge from a restricted and sparse dataset (mobile phone
data). We propose a framework of potential research questions in two parts:
individual-oriented research and urban-oriented research. This framework also provides a
guideline for our future research on extracting spatio-temporal knowledge from
incomplete datasets.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Ahas</surname>
            <given-names>R</given-names>
          </string-name>
          and
          <string-name>
            <surname>Mark</surname>
            <given-names>Ü</given-names>
          </string-name>
          ,
          <year>2005</year>
          ,
          <article-title>Location based services - new challenges for planning and public administration</article-title>
          .
          <source>Futures</source>
          ,
          <volume>37</volume>
          (
          <issue>6</issue>
          ):
          <fpage>547</fpage>
          -
          <lpage>561</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Ahas</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aasa</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slim</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aunap</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalle</surname>
            <given-names>H</given-names>
          </string-name>
          and
          <string-name>
            <surname>Mark</surname>
            <given-names>Ü</given-names>
          </string-name>
          ,
          <year>2007</year>
          ,
          <article-title>Mobile Positioning in Space-Time Behaviour Studies: Social Positioning Method Experiments in Estonia</article-title>
          .
          <source>Cartography and Geographic Information Science</source>
          ,
          <volume>34</volume>
          (
          <issue>4</issue>
          ):
          <fpage>259</fpage>
          -
          <lpage>273</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Batty</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <year>2007</year>
          ,
          <article-title>Cities and complexity</article-title>
          . MIT Press, Cambridge, MA, USA.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>De Souza e Silva</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <year>2007</year>
          ,
          <article-title>Mobile phones and places: The use of mobile technologies in Brazil</article-title>
          . In: Miller HJ (ed),
          <source>Societies and cities in the age of instant access</source>
          , Dordrecht, The Netherlands, Springer,
          <fpage>295</fpage>
          -
          <lpage>310</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Gonzalez</surname>
            <given-names>MC</given-names>
          </string-name>
          and
          <string-name>
            <surname>Hidalgo</surname>
            <given-names>CA</given-names>
          </string-name>
          , et al.,
          <year>2008</year>
          ,
          <article-title>Understanding individual human mobility patterns</article-title>
          .
          <source>Nature</source>
          ,
          <volume>453</volume>
          (
          <issue>7196</issue>
          ):
          <fpage>779</fpage>
          -
          <lpage>782</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Janelle</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <year>1995</year>
          ,
          <article-title>Metropolitan expansion, telecommuting, and transportation</article-title>
          . In: Hanson S (ed),
          <source>The Geography of Urban Transportation</source>
          , New York, The Guilford Press,
          <fpage>407</fpage>
          -
          <lpage>434</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Kwan</surname>
            <given-names>MP</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dijst</surname>
            <given-names>M</given-names>
          </string-name>
          and
          <string-name>
            <surname>Schwanen</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <year>2007</year>
          ,
          <article-title>The interaction between ICT and human activity-travel behavior</article-title>
          .
          <source>Transportation Research Part A-Policy and Practice</source>
          ,
          <volume>41</volume>
          (
          <issue>2</issue>
          ):
          <fpage>121</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Miller</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <year>2009</year>
          ,
          <article-title>Geographic data mining and knowledge discovery: An overview</article-title>
          . In: Miller HJ and Han J (eds),
          <source>Geographic Data Mining and Knowledge Discovery (Second Edition)</source>
          , London, CRC Press,
          <fpage>3</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Pultar</surname>
            <given-names>E</given-names>
          </string-name>
          and
          <string-name>
            <surname>Raubal</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <year>2009</year>
          ,
          <article-title>Progressive Tourism: Integrating Social, Transportation, and Data Networks</article-title>
          . In: Sharda N (ed),
          <source>Tourism Informatics: Visual Travel Recommender Systems, Social Communities, and User Interface Design</source>
          , Hershey, PA, USA, IGI Global,
          <fpage>145</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Song</surname>
            <given-names>CM</given-names>
          </string-name>
          and
          <string-name>
            <surname>Qu</surname>
            <given-names>ZH</given-names>
          </string-name>
          , et al.,
          <year>2010</year>
          ,
          <article-title>Limits of Predictability in Human Mobility</article-title>
          .
          <source>Science</source>
          ,
          <volume>327</volume>
          (
          <issue>5968</issue>
          ):
          <fpage>1018</fpage>
          -
          <lpage>1021</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Torrens</surname>
            <given-names>PM</given-names>
          </string-name>
          ,
          <year>2008</year>
          ,
          <article-title>Wi-Fi geographies</article-title>
          .
          <source>Annals of the Association of American Geographers</source>
          ,
          <volume>98</volume>
          (
          <issue>1</issue>
          ):
          <fpage>59</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Yuan</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <year>2009</year>
          ,
          <article-title>Toward Knowledge Discovery about Geographic Dynamics in Spatiotemporal Databases</article-title>
          . In: Miller HJ and Han J (eds),
          <source>Geographic Data Mining and Knowledge Discovery (Second Edition)</source>
          , London, CRC Press,
          <fpage>347</fpage>
          -
          <lpage>365</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>