=Paper= {{Paper |id=None |storemode=property |title=Spatio-temporal Knowledge Discovery from Georeferenced Mobile Phone Data |pdfUrl=https://ceur-ws.org/Vol-652/MPA10-07.pdf |volume=Vol-652 }} ==Spatio-temporal Knowledge Discovery from Georeferenced Mobile Phone Data== https://ceur-ws.org/Vol-652/MPA10-07.pdf
MPA'10 in Zurich                                           121                                        September 14th, 2010




                   Spatio-temporal knowledge discovery from
                       georeferenced mobile phone data

                                        Yihong Yuan, Martin Raubal

                       Department of Geography, University of California, Santa Barbara, USA, 93106
                                       yuan@geog.ucsb.edu, raubal@geog.ucsb.edu



    1. Introduction

    Information and communication technologies (ICTs), such as mobile phones and the
    Internet, are increasingly pervasive in modern society. These technologies provide
    greater flexibility regarding when, where, and how to travel. Understanding the
    influence of ICTs in our current mobile information society will be essential for
    updating environmental policies, and maintaining sustainable mobility and
    transportation (De Souza e Silva 2007). Moreover, ICTs have provided a wide range
    of spatio-temporal data sources, which can be used for geographic knowledge
    discovery and data mining in studies on geographic dynamics, such as human travel
    behavior and mobility patterns (Song et al. 2010; Yuan 2009; Miller 2009). There
    have been several studies focusing on extracting spatio-temporal data from
    georeferenced mobile phone data. For example, Ahas’ social positioning method
    (SPM) combines both location data and social attributes of mobile phone users to
    study the dynamics of urban systems (Ahas and Mark 2005, Ahas et al. 2007).
    Gonzalez et al. (2008) studied the individual trajectory of 100,000 mobile phone users
    based on tracked location data, providing new insights to understanding the basic law
    of human motion.
       As a generalized research frame, Miller (2009) discussed five major tasks in
    geographic data mining and knowledge discovery: spatial classification and capturing
    spatial dependency, spatial segmentation and clustering, spatial trends, spatial
    generalization, and spatial association. Traditional geographic knowledge discovery
    mainly focuses on obtaining new knowledge from a relatively comprehensive dataset,
    such as extracting movement patterns based on high resolution trajectories. However,
    several spatio-temporal datasets (e.g., georeferenced mobile phone data), only provide
    incomplete data with relatively low resolution and few individual attributes. Therefore,
    it is important to determine how much and to what extent we can extract knowledge
    from sparse data sources, as well as dealing with uncertainty in incomplete datasets.
    In this paper, we will provide a framework of extracting spatio-temporal knowledge
    in a typical georeferenced mobile phone dataset. This will be helpful in updating the
    research tasks of geographic knowledge discovery in the age of instant access.
MPA'10 in Zurich                                          122                                     September 14th, 2010




    2. Dataset

    This research utilizes a dataset from Harbin City, China. Harbin city is a major
    commercial, industrial, and transportation center situated in northeast China. It was
    ranked as one of the top ten populated cities in China in the year 2009. The dataset
    covers over one million people from Harbin city, including mobile phone connection
    records for a time span of 9 days (07/21/07-07/29/07). The data include the time,
    duration, and location1 of mobile phone connections, as well as the age and gender
    attributes of the users. Moreover, it provides the phone number and city code2 of the
    other end of each phone call. Note that the location records in the dataset cannot
    represent the accurate moving trajectory of each user, since the locations are recorded
    only when there is a phone call connection. However, based on a summary of 9 days’
    records, the data are still useful for depicting the general characteristics of individual
    travel mobility.


    3. Extracting spatio-temporal knowledge in georeferenced

    mobile phone data

    Generally, the dataset provides us with three types of directly recorded information
    for each mobile phone user:
    1) Cell phone usage
    2) Social attributes (age, gender)
    3) Spatio-temporal points within a given time span

    All the data mining and knowledge discovery tasks are based on the combination and
    interaction of the above three information categories. Moreover, since urban systems
    are considered organized aggregations of human settlements, we can also obtain
    inferential knowledge for the city based on the behavior of associated citizens, such as
    indentifying spatial cluttering of traffic in different urban areas. Therefore, the
    research questions are divided into two categories: individual-oriented research and
    urban-oriented research.




    1
         For each user, the location of the nearest cell phone tower is recorded both when the user makes and
    receives a phone call. Since the towers are located every 300m-500m in the city, the location accuracy
    is about 300m-500m.
    2
        The city code indicates in which city the other side of a particular phone call is located.
MPA'10 in Zurich                                 123                              September 14th, 2010




    3.1 Individual-oriented research

    3.1.1 Analysis of movement patterns


    Although human activity is potentially random and irregular, there still exist
    identifiable patterns in every person’s life. It is easier to predict the behavior of people
    who travel little than those who travel more frequently (Gonzalez et al. 2008).

    (1) Trajectory patterns: Based on the scattered spatio-temporal points provided in
        the dataset, it is possible to identify spatio-temporal paths through interpolation
        methods. However, this method may not be applicable for every user, since we
        need a certain number of well-distributed points to create a user trajectory. Based
        on trajectories, it is feasible to study the patterns (e.g., clustering, periodicity,
        predictivity) of individual trajectories. Figure 1 depicts the activity path for a
        specific individual during a week. Furthermore, we can identify particular
        patterns for population groups divided by social attributes.




                        Figure 1: A week-long individual travel-activity path.

    (2) Point patterns (points of interest - POIs): Based on the analysis of trajectory
        patterns, the POIs associated with each individual can be extracted by applying
        predefined discovery rules. POI discovery provides a method to obtain more
        sufficient user attributes in an incomplete dataset. For example, it is feasible to
        extract the work and home locations, regular entertainment places, and other POIs
        for each mobile phone user.
    (3) Correlation between mobile phone usage and movement patterns: Previous
        studies have focused on the interaction between ICT and human activity-travel
        behavior. However, due to the lack of sufficient data and the complicated nature
        of the interaction, there is still a continuing debate on how it works in everyday
        life. Based on the phone usage information and social attributes, we examined
        how population heterogeneity impacts the relationship between mobile phone
        usage and individual movement radius. We argue that the heterogeneity of the
        population should be taken into account when analyzing the correlation between
        ICT and human activity-travel behavior. Therefore it is important to specify
MPA'10 in Zurich                                124                             September 14th, 2010




          individual attributes (e.g., cultural, social, institutional, physical aspects) when
          investigating this problem. The mobile phone usage and travel behavior correlate
          differently among various social groups. Therefore, a general conclusion for the
          population is insufficient to represent the complicated nature of this problem.
          Future research should focus on studying the correlation between phone usage
          and trajectory patterns. This would provide more specific results on how mobile
          phone usage impacts an individual’s daily life, as well as offering references to
          policy makers.

    3.1.2 Analysis of social networks


    Janelle (1995) introduced four types of communication modes based on different
    spatio-temporal constraints: Synchronous Presence (SP), Asynchronous Presence
    (AP), Synchronous Tele-presence (ST), and Asynchronous Tele-presence (AT).
    Communication often occurs within members of a particular social network.
    Therefore, geographic mobility and cell phone usage could both be considered as
    connections in the same social network. Traditional research questions include the
    prediction and inference of network topology, the flow of information, and the
    interaction among networks. For instance, Pultar and Raubal (2009) studied the
    integration of social, transportation, and data networks. Particularly, in the case of
    georeferenced mobile phone data, a potential research question would be the
    combination, differentiation, and interaction of social networks associated with
    different communication modes.

    3.2 Urban-oriented research

    Cities are complex systems constituted by a myriad of processes and elements (Batty
    2005). Since individuals are atoms in an urban system, the spatio-temporal
    characteristics of an urban system could be viewed as a conceptual generalization of
    individual behavior.

    (1) Spatial division: Mobile communications might alter the traditional spatial
        division of urban spaces (Kwan et al. 2007), resulting in a change in urban
        planning and transportation systems. Potential research questions include the
        involvement of city structure under the impact of ICT (Torrens 2008), the
        comparison of “phone usage flow maps” and “travel flow maps”, etc.

    (2) Spatial clustering: Clustering refers to different types of hotspots in the city, for
        example, the hotspots of mobile phone usage, traffic jams, or night life (Figure 2).
        These patterns can be found on a range of timescales: from daily scale to yearly
        scale. The study of hotspot clustering patterns would be helpful for constructing a
        more efficient urban system.
MPA'10 in Zurich                               125                              September 14th, 2010




      Figure 2: The changing clustering of mobile phone usage in Harbin city at different
                                         time points.

    (3) Spatial central tendency and spread: Central tendency refers to a “middle”
        value or a typical value of the distribution. For a given city, the central tendency
        of spatio-temporal behavior reflects various characteristics of the urban system.
        For example, Figure 3 shows the distribution of individual travel radius in Harbin
        city. As can be seen, the movement radius of most people is around 3km.
        However, if we switch to another target city, will the distribution be similar to this
        one? Therefore, it would be interesting to find the correlation between travel
        distance distribution and the attributes (area, structure, population, average phone
        usage) of a given city.




                   Figure 3: The distribution of individual movement radius.



    4. Conclusions

    Spatio-temporal knowledge discovery has gained wide attention due to the increasing
    availability of georeferenced data. Much progress has been made regarding the
    theories, methodologies, and applications in this field. In this research, we focus on
MPA'10 in Zurich                             126                             September 14th, 2010




    extracting spatio-temporal knowledge in a restricted and sparse dataset (mobile phone
    data). A framework of potential research questions is provided as two parts:
    individual-oriented research and urban-oriented research. This also provides a
    guideline for our future research on extracting spatio-temporal knowledge in
    incomplete datasets.


    References

    Ahas R and Mark Ü, 2005, Location based services - new challenges for planning and
          public administration. Futures, 37(6): 547-561.
    Ahas R, Aasa A, Slim S, Aunap R, Kalle H and Mark Ü, 2007, Mobile Positioning in
          Space – Time Behaviour Studies: Social Positioning Method Experiments in
          Estonia. Cartography and Geographic Information Science, 34(4): 259-273.
    Batty M, 2007, Cities and complexity. MIT Press, Cambridge, MA, USA.
    De Souza e Silva A, 2007, Mobile phones and places: The use of mobile technologies
          in Brazil. In: Miller HJ (ed), Societies and cities in the age of instant access,
          Dortdrecht, The Netherlands, Springer, 295-310.
    Gonzalez MC and Hidalgo CA, et al., 2008, Understanding individual human
          mobility patterns. Nature, 453(7196):779-782.
    Janelle D, 1995, Metropolitan expansion, telecommuting, and transportation. In:
          Hanson S (ed), The Geography of Urban Transportation, New York, The
          Guilford Press, 407-434.
    Kwan MP, Dijst M and Schwanen T, 2007, The interaction between ICT and human
          activity-travel behavior. Transportation Research Part A-Policy and Practice,
          41(2):121-124.
    Miller H, 2009, Geographic data mining and knowledge discovery: An overview. In:
          Miller HJ and Han J (eds), Geographic Data Mining and Knowledge Discovery
          (Second Edition), London, CRC Press, 3-32.
    Pultar E and Raubal M, 2009, Progressive Tourism: Integrating Social, Transportation,
          and Data Networks. In: Sharda N (ed), Tourism Informatics: Visual Travel
          Recommender Systems, Social Communities, and User Interface Design,
          Hershey, PA, USA, IGI Global, 145 -159.
    Song CM and Qu ZH, et al., 2010, Limits of Predictability in Human Mobility.
          Science, 327(5968):1018-1021.
    Torrens PM, 2008, Wi-Fi geographies. Annals of the Association of American
          Geographers, 98(1): 59-84.
    Yuan M, 2009, Toward Knowledge Discovery about Geographic Dynamics in
          Spatiotemporal Databases. In: Miller HJ and Han J (eds), Geographic Data
          Mining and Knowledge Discovery (Second Edition), London, CRC Press,
          347-365.