<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Large-scale Online Mobility Monitoring with Exponential Histograms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christine Kopp Fraunhofer IAIS St. Augustin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Germany</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Mock Fraunhofer IAIS St. Augustin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Germany</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael May Fraunhofer IAIS St. Augustin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Germany</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Odysseas Papapetrou Technical University of Crete Chania</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The spread of digital signage and its instantaneous adaptability of content challenges out-of-home advertising to conduct performance evaluations in an online fashion. This implies a tremendous increase in the granularity of evaluations as well as a complete new way of data collection, storage and analysis. In this paper we propose a distributed system for the large-scale online monitoring of poster performance indicators based on the evaluation of mobility data collected by smartphones. In order to enable scalability in the order of millions of users and locations, we use a local data processing paradigm and apply exponential histograms for an e cient storage of visit statistics over sliding windows. In addition to an immediate event centralization we also explore a hierarchical architecture based on a merging technique for exponential histograms. We provide an evaluation on the basis of a real-world data set containing more than 300 million GPS points corresponding to the movement activity of nearly 3,000 persons. The experiments show the accuracy and e ciency of our system.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Advertising media are under the obligation to provide
reliable performance indicators for the pricing of advertising
campaigns. For the German out-of-home (OOH)
advertising industry, generating yearly net sales of about 760 million
Euro [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], this has meant to establish a system of
geographically di erentiating performance indicators over the past
years1. However, with the spread of digital signage also a
1http://www.agma-mmc.de/media-analyse/plakat.html
      </p>
      <p>ne-grained temporal di erentiation will be required in
future. While current performance indicators inform about
the poster contacts of seven or ten average days (the two
standard durations of poster campaigns in Germany),
digital out-of-home (DOOH) advertising spots have a duration
of only a few seconds. Assuming an evaluation period of
10 seconds, the granularity of the performance indicators
(and consequently of the required input data) increases by
four orders of magnitude. DOOH therefore has to face the
challenge of collecting and analyzing big data. In addition,
digital content has the advantage that it can be instantly
adapted to a changing audience. This adaptation, however,
requires online performance information, which forms the
second challenge of DOOH performance evaluation.</p>
      <p>In this paper we propose a distributed system for the
large-scale online monitoring of poster performance
indicators based on the evaluation of mobility data collected by
smartphones. We hereby consider two use cases which the
system shall cover. First, we want to be able to perform
online queries which obtain performance measures for the
recent past in a sliding window style. Second, we want to
analyze historic data for various time intervals. The rst
type of query allows the online monitoring of poster
performance and thus the targeted placement of advertisement
spots. The second type of query can be used for billing
purposes or to analyze previously collected data sets (e.g. to
nd interesting visit patterns that can then be monitored in
the online system). Although our use cases di er with
respect to their system requirements (distributed online
processing vs. analysis of massive amounts of centralized data),
we want to keep the maintenance e ort of the system as low
as possible. Our goal is therefore to set up a scalable system
architecture that allows an e cient re-use of code from the
online scenario for historic data analysis.</p>
      <p>
        The key component of our approach to handle massive
streams of data is to use exponential histograms for data
compression. This data structure has the advantage that
it o ers sliding window query capabilities with a
guaranteed maximum relative error. In addition, exponential
histograms can be applied in a distributed setting [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] thus
allowing for scalability when the number of users increases.
      </p>
      <p>
        Our online system relies on an Android implementation
that we have used in previous work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to detect visit
patterns on mobile phones. For the analysis of historic data we
have set up a Storm environment. In combination with the
Kafka messaging system we are able to perform historic data
analysis in a distributed streaming fashion. In this way we
can apply the same system architecture for online and
historic data analysis. We use the Storm/Kafka environment
to perform the experiments in this paper.
      </p>
      <p>We analyze the performance of our system using a
realworld GPS data set containing trajectories of 2,967 persons
containing more than 300 million GPS points over a
period of one week. We extract visit events from this data
set using 400,988 points of interest (POI) in Germany from
OpenStreetMap (OSM). Our experiments show that the
usage of exponential histograms results in an average error of
less than 1/10 of the maximum acceptable error while
reducing the storage space to an amount as small as 9.7% of
the baseline storage space.</p>
      <p>The remainder of our paper is organized as follows.
Section 2 discusses related work. Section 3 shows our system
architecture and Section 4 provides the experiments. We
conclude our paper in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK 2. 2.1</title>
    </sec>
    <sec id="sec-3">
      <title>Exponential Histograms</title>
      <p>
        Exponential histograms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are a deterministic structure,
proposed to address the basic counting problem, i.e., for
counting the number of true bits in the last N stream
arrivals. They belong to a family of methods that break the
sliding window range into smaller windows, called buckets or
basic windows, to enable e cient maintenance of the
statistics. Each bucket contains the aggregate statistics, i.e., the
number of arrivals and bucket bounds, for the
corresponding sub-range. Buckets that no longer overlap with the
sliding window are expired and discarded from the structure.
To compute an aggregate over the whole (or a part of the)
sliding window, the statistics from all buckets overlapping
with the query range are aggregated. For example, for basic
counting, aggregation is a summation of the number of true
bits in the buckets. A possible estimation error can be
introduced due to the oldest bucket inside the query range, which
usually has only a partial overlap with the query. Therefore,
the maximum possible estimation error is bounded by the
size of the last bucket.
      </p>
      <p>To reduce the space requirements, exponential histograms
maintain buckets of exponentially increasing sizes. Bucket
boundaries are chosen such that the ratio of the size of each
bucket b with the sum of the sizes of all buckets more recent
than b is upper bounded. In particular, the following
invariant is maintained for all buckets j: Cj =(2(1 + Pij=11 Ci)) "
where " denotes the maximum acceptable relative error and
Cj denotes the size of bucket j (number of true bits
arrived in the bucket range), with bucket 1 being the most
recent bucket. Queries are answered by summing the sizes
of all buckets that fully overlap the query range, and half
of the size of the oldest bucket, if it partially overlaps the
query. The estimation error is solely contained in the oldest
bucket, and is therefore bounded by this invariant, resulting
in a maximum relative error of ".</p>
      <p>
        Recently, Papapetrou et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] showed how an arbitrary
number of exponential histograms EH1; EH2; :::; EHn (each
one corresponding to an individual stream) can be
aggregated/merged, in order to produce a single exponential
histogram EH that corresponds to the order-preserving union
of the streams. More precisely, let " denote the maximum
error parameter of the original exponential histograms, and "0
the parameter of the merging algorithm. The algorithm
supports the creation of an aggregated exponential histogram
with a maximum relative error of (" + "0 + " "0). In this
work we use this merging algorithm to reduce the memory
required for storing the exponential histograms of the visit
events coming from various input sources.
2.2
      </p>
    </sec>
    <sec id="sec-4">
      <title>Distributed Evaluation of Visit Events</title>
      <p>
        In previous work we have provided a set of visit quantities
that can be used to de ne performance measures in OOH
advertisement [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In this paper we concentrate on the
evaluation of gross visits which state the number of total visits
to a certain location and which can be used to estimate the
total contacts to a poster site. In addition, we have
provided a methodology for the privacy-preserving, distributed
collection of visit quantities in previous work [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        The basic idea of the approach is to decentralize the data
collection and evaluation process of movement data. Instead
of constantly submitting location information of a user to
some central server, the evaluation of visits (or visit
patterns) is performed locally on a mobile device (e.g.
smartphone). The device submits only aggregated and
anonymized statistics to a central coordinator. In addition, web
anonymization techniques such as onion routing [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] can be
used to prevent that the coordinator reconstructs visit
histories from several messages of a person based on the
communication protocol. A similar, however analytically less
powerful framework has previously been proposed by Hoh
et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for the distributed, privacy-preserving
monitoring of tra c. However, both papers do not consider the
practical aspect of scaling the proposed method to
thousands and potentially millions of users. In fact, considering
movement statistics from our GPS data set, every person
traverses more than 200 street segments per day. If we
assume further that each person visits 10 di erent locations
(e.g. work location, shops, bus stops) per day and 20
million persons participate in data collection, about 4.2 billion
events occur every day. In order to cope with this number of
events, sophisticated analysis and storage algorithms as well
as a sophisticated system architecture have to be devised.
The design and performance analysis of such a system is the
scope of our paper.
      </p>
    </sec>
    <sec id="sec-5">
      <title>3. SYSTEM ARCHITECTURE</title>
      <p>
        Our architecture consists of two or alternatively three
layers (see Figure 1). The lowest layer holds the user nodes,
which collect the users' GPS data and extract visit events.
The visit events are forwarded either directly to the central
coordinator ( at setting) or to a layer of intermediate nodes
(hierarchical setting). In the at setting, the coordinator
aggregates the visit information of each POI in an
exponential histogram. I.e., for each POI an exponential histogram
is maintained that records the visit events for this POI. As
the exponential histogram stores a time aggregate with the
event, queries over time windows can be answered. In the
hierarchical setting, the exponential histograms reside already
at the intermediate nodes. In regular time intervals the
intermediate nodes submit the exponential histograms to the
coordinator, which merges them and answers user queries.
The exponential histograms cannot be applied at the user
level because the number of visit events per user is too small
to make the data structure e cient. The layer of
intermediate nodes was introduced for horizontal scalability and to
avoid an overload of the coordinator. However, it also serves
a privacy purpose given that the intermediate nodes do not
collude (see [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). As a user can freely select an intermediate
node when submitting a visit event, no intermediate node
will obtain the whole event history of a single user. The
intermediate nodes submit their data structure in regular
time intervals to the coordinator, which nally merges the
data structures and answers user queries.
      </p>
      <p>
        As motivated by our use case, our system shall be able
to perform analyses online as well as on historic data. The
above architecture describes the online use case. For
historic data analysis we have to substitute the layer of local
nodes. This substitution should still allow to process data in
parallel in order to scale to large amounts of data. In
addition, a streaming environment would be preferable in order
to re-use existing code. Both aspects can be met by using
a distributed streaming processing system as, for example,
Storm2 or S4 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. We have ported the Android code of event
detection to run as Storm bolts. The input is streamed into
the system via the Kafka messaging system [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which allows
to handle each GPS point of the recorded trajectories as
individual message. Thus, with this mechanisms we can scale
the parallel simulation of event detection horizontally in the
cluster. In our experiments described in the next section we
used this technique to emulate the event detection on GPS
traces of 2,967 test persons in an experimental cluster.
Detected events are sent to the intermediate nodes similar to
the online setting.
      </p>
    </sec>
    <sec id="sec-6">
      <title>EXPERIMENTS</title>
    </sec>
    <sec id="sec-7">
      <title>Data Set</title>
      <p>
        For our experiments we use a subset of a large-scale GPS
survey [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] commissioned by the Arbeitsgemeinschaft
MediaAnalyse e.V.3, a joint industry committee of German
advertising vendors and customers. The GPS data has been
collected in the year 2011 and contains 2,967 persons with
valid GPS data. The persons are recruited from 31 major
2http://storm-project.net
3http://www.agma-mmc.de
cities in Germany and are asked to carry the GPS devices
for one week.
      </p>
      <p>
        After clean-up the data set contains 304 million GPS points.
In addition, we extracted 400,988 points of interest (POI)
from OpenStreetMap4 (OSM) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] marked with the keys shop,
amenity, leisure, tourism, historic, sport, public transport,
railway. We grouped the POI into the following categories:
shop, restaurant, leisure, education, parking and public
transport stops. We limited our experiments to those POI
because digital posters are still very expensive and therefore
placed mostly at attractive places as train stations or
shopping locations. For each POI category we de ned a
minimum stay time and a 50x50 meter spatial bu er in order to
extract visit events. Table 1 shows the number of POI
aggregated to the six types along with the assumed minimum
stay times. Figure 2 left shows a one-day trajectory of one
test person along with the extracted POI in its surrounding.
      </p>
      <p>POI type
shop
restaurant
leisure
education
parking
public transport stop
total</p>
      <p>
        The extraction of visit events is performed by the local
nodes (see Section 3). A visit results from the spatial
intersection of a trajectory and a geographic location and has to
last a given minimum period of time. For a formal de
nition of a visit see [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Figure 2 right shows exemplary the
extraction of visit events. The POI are colored according
to their minimum required stay time (green = 5 minutes,
orange = 10 minutes, red = 15 minutes). In the top right
picture one visit occurs in the orange colored POI (where
a dense cluster of GPS points exists). In the bottom right
picture the user passes the POI merely on his way. As the
duration of spatial intersection lies below the minimum stay
time, no visit events are generated. For the extraction of
visits we apply an algorithm from previous work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which
(a) one-day trajectory of a test person
(b) trajectory excerpts showing one POI
visit on top (dense cluster of points) and
two POI passages on bottom
was designed to extract visit patterns from a stream of GPS
positions online on mobile phones.
      </p>
      <p>In total we extracted 23,508 visit events to 11,809 di
erent POI for all test persons. This number has been
considerably below our expectations. Most likely it results from
two reasons. First, the number of OSM POI are incomplete.
From the online source http://www.haltestellen-suche.de we
know to expect at least 217,000 stations of public transport
in Germany, and also the number of shops in Germany is
considerably above the extracted number of POI. Second,
GPS signals are typically blocked inside of buildings. As we
applied a light-weight event extraction algorithm (that can
run on a mobile device), we may have lost a number of visit
events.</p>
      <p>Table 2 shows an overview of the number of visit events
per POI. Most often, only a single visit occurred. This
number is quite reasonable given our low number of visits and
the independent movement behavior of the test persons.</p>
      <p>In order to perform experiments also on a large-scale data
set resembling more closely the real-world situation, we
replicated the original visit data by a factor of 1,000. We set the
time of each such visit by adding Gaussian noise to the
current time with = 0 and = 10,000 seconds.
4.2</p>
    </sec>
    <sec id="sec-8">
      <title>Experimental Set-Up</title>
      <p>In our experiments we conducted point queries in a
sliding window fashion. I.e., we queried the number of events
per POI in the past t seconds. The selected query
windows were of length 30, 600, 1800, 3600 or 86400 seconds.
We performed those queries every 10 minutes (in the
hierarchical setting this coincides with the time interval of the
force action). For our observation period of one week this
resulted in nt = 1,008 queries per query window for each of
the np = 11,809 visited POI. In accordance with our
maximum query interval, we set the sliding window parameter
of the exponential histogram to 86,400 seconds in all
experiments. Further, we varied the maximum acceptable relative
error " to take the values 0.01, 0.02, 0.04, 0.08 and 0.16.
In the hierarchical setting we used 10 intermediate nodes
which submitted their data structures every 600 seconds to
the coordinator.</p>
      <p>We measured the error for each experiment using the
mean absolute percentage error (MAPE), which is de ned
as follows:</p>
      <p>M AP E =</p>
      <p>Pnp Pnt xij xbij
i=1 j xij
np nt
where xij denotes the true number of visit events at POI
i in query window j and xbij denotes the number of events
returned from the exponential histogram. In the case of
xij = 0 we added a relative error of zero if our estimate
was correct (xbij = 0) and a relative error of 1 if xbij 6= 0.
This latter case, however, did not occur. We performed all
experiments for the at and hierarchical setting as well as
for the original and multiplied data set.
4.3</p>
    </sec>
    <sec id="sec-9">
      <title>Results</title>
      <p>Figure 3 shows the results for the at and hierarchical
setting of the multiplied data set. The respective numbers
are provided in Tables 3 and 4. Note that we display only
the results for the multiplied data set because due to the
few visit events in the original data set the error was nearly
always zero there.</p>
      <p>In general, the MAPE is very low, lying with one
exception below 1%. For both the at and hierarchical setting
two trends can be observed. First, the MAPE decreases
with smaller ". Second, the MAPE decreases with
decreasing size of the query window. The rst e ect is nearly linear
for all query windows and can be expected from the
characteristics of exponential histograms. The second is also
expected because the error guarantees are given on the size
of the sliding window, which was xed to 86,400 seconds.
Accordingly, the error for smaller time intervals has to be
lower. However, the e ect is linear to the logarithm of the
query window sizes, i.e. when increasing the query window,
the MAPE increases sublinearly.</p>
      <p>When comparing the error between the at and
hierarchical setting, the merge operations result in only a small
increase in error.</p>
      <p>"=0.01 "=0.02 "=0.04 "=0.08 "=0.16
mem. 14.5 MB
8.8 MB
5.5 MB
3.4 MB
2.5 MB</p>
      <p>In order to set the MAPE in perspective to the number
of visit events, Table 5 shows the average and maximum
number of visits per POI and query interval. The average is
hereby calculated once for all POI and time slots and once
only for those containing at least one event.</p>
      <p>
        The memory usage of the exponential histogram at the
end of the observation period is depicted in the last line in
Tables 3 and 4. Assuming xed 32-bit counters, it depends
only on " and the maximum possible count N in the sliding
window of each POI, requiring O( 1" log N ) space [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As
we maintain an exponential histogram for each POI, the
required memory depends also linearly on the number of
(distinct) visited POI which is, however, constant in our
experiments.
query
wind.
      </p>
      <p>30 s
600 s
1,800 s
3,600 s
86,400 s
avg. events
max. events
avg. events</p>
      <p>&gt; 0</p>
      <p>Our experiments show that the resultant error is very low.
For all settings of " the mean error (MAPE) is less than
1/10 of the maximum acceptable error. This is a very good
result. Especially we can be sure for small total number of
visits that the query results are always correct. For example,
setting " = 0:01 will result in no errors if less than 100 events
occur per POI. This is an important characteristic because
the visit frequency of POI is right-tailed, containing only
few POI with very high frequencies.</p>
      <p>Further the experiments show that our setting scales
horizontally. By introducing a layer of 10 intermediate nodes,
the MAPE was on average 7.5% higher and at most 23%
higher than in the at setting. Both numbers are
considerably below the maximum acceptable error as well as the
maximum relative error guaranteed for the join of
exponential histograms.</p>
      <p>Finally, to evaluate the memory usage, we can compare
the numbers to the following baseline scenario. Whenever
a visit event occurs, the POI identi er and timestamp are
stored at the coordinator using two 4 Byte integers. As our
sliding window covers only one day, we will assume that we
have to store 1/7 of the total visit events. For the
original 23,509 events this results in 0.026 MB. For the
multiplied data set it results in 25.6 MB. The storage amount
for the original events using exponential histograms varied
between 0.54-4.7 MB. In this case we did not save on
memory. However, using the more realistic multiplied data set
with exponential histogram sizes between 2.5-14.5 MB, our
experiments require only 9.7-56.6% of the baseline storage
space depending on the selected ".</p>
      <p>When extrapolating to the envisioned setting of
monitoring 20 million persons generating each 210 events per day
on about 6,500,000 distinct POIs in Germany (including
the 6,000,000 distinct street segments), just storing the raw
event data would result in 31.3 GB memory consumption.
This is considerably above the 1.3-7.8 GB required by the
exponential histograms (by just taking into account that our
memory consumption increases linearly with the number of
distinct POIs).</p>
      <p>Considering our entire approach including exponential
histograms and local evaluation, the storage reduction is even
much higher compared to a naive centralized setting where
the users submit a GPS position every second to some
central coordinator.</p>
      <p>Also note that inserting into and querying an exponential
histogram almost takes constant time far below a
microsecond, which is much faster than searching an event database
of raw events.</p>
    </sec>
    <sec id="sec-10">
      <title>CONCLUSIONS</title>
      <p>In this paper we propose a distributed system for the
large-scale online monitoring of poster performance
indicators based on the evaluation of mobility data. Our system
relies on the collection and local processing of mobility data
via smartphones and uses exponential histograms for the
e cient storage and querying of visit statistics in a sliding
window fashion. Our experiments on a multiplied real-world
data set with nearly 3,000 persons show that the usage of
exponential histograms results in an average error of less than
1/10 of the maximum acceptable error while reducing the
storage space to an amount as small as 9.7% of the baseline
storage space.</p>
    </sec>
    <sec id="sec-11">
      <title>ACKNOWLEDGMENTS</title>
      <p>We thank our colleague Sebastian Bothe for supporting
us to run the cluster-based version of the experiments and
the Arbeitsgemeinschaft Media-Analyse e.V. for granting
the use of the GPS data set. The research leading to these
results has received funding from the European Union's
Seventh Framework Programme (FP7/2007-2013) under grant
agreement no. 255951 (LIFT).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Datar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gionis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Indyk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Motwani</surname>
          </string-name>
          .
          <article-title>Maintaining stream statistics over sliding windows</article-title>
          .
          <source>SIAM J. Comput.</source>
          ,
          <volume>31</volume>
          (
          <issue>6</issue>
          ):1794{
          <year>1813</year>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Fachverband</given-names>
            <surname>Au</surname>
          </string-name>
          enwerbung e.V.
          <article-title>Netto-Werbeeinnahmen erfassbarer Werbetrager in Deutschland, 2002-2010 (Net turnover of con rmable advertising media in Gemany,</article-title>
          <year>2000</year>
          -
          <fpage>2010</fpage>
          ),
          <year>2011</year>
          . http://www.faw-ev.de/media/download/ marktdaten/4\_Nettoumsaetze\_aller_ Werbemedien\_ab\_
          <year>2002</year>
          .pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Florescu</surname>
          </string-name>
          , C. Korner, M. Mock, and
          <string-name>
            <given-names>M.</given-names>
            <surname>May</surname>
          </string-name>
          .
          <article-title>E cient mobility pattern stream matching on mobile devices</article-title>
          .
          <source>In Proc. of the Ubiquitous Data Mining Workshop (UDM</source>
          <year>2012</year>
          ), pages
          <fpage>23</fpage>
          {
          <fpage>27</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Goldschlag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Reed</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Syverson</surname>
          </string-name>
          .
          <article-title>Onion routing for anonymous and private internet connections</article-title>
          .
          <source>Comm. of the ACM</source>
          ,
          <volume>42</volume>
          :
          <fpage>39</fpage>
          {
          <fpage>41</fpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Haklay</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Weber</surname>
          </string-name>
          .
          <article-title>OpenStreetMap: User-Generated Street Maps</article-title>
          .
          <source>IEEE Pervasive Computing</source>
          ,
          <volume>7</volume>
          (
          <issue>4</issue>
          ):
          <volume>12</volume>
          {
          <fpage>18</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hoh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gruteser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Herring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ban</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Work</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Bayen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Annavaram</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Jacobson</surname>
          </string-name>
          .
          <article-title>Virtual trip lines for distributed privacy-preserving tra c monitoring</article-title>
          .
          <source>In Proc. of the 6th Int. Conf. on Mobile Systems</source>
          , Applications, and
          <string-name>
            <surname>Services</surname>
          </string-name>
          (
          <source>MobiSys'08)</source>
          , pages
          <fpage>15</fpage>
          {
          <fpage>28</fpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Kopp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mock</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>May</surname>
          </string-name>
          .
          <article-title>Privacy-preserving distributed monitoring of visit quantities</article-title>
          .
          <source>In SIGSPATIAL 2012 Int. Conf. on Advances in Geographic Information Systems(SIGSPATIAL/GIS)</source>
          , pages
          <fpage>438</fpage>
          {
          <fpage>441</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Ko</surname>
          </string-name>
          <article-title>rner. Modeling Visit Potential of Geographic Locations Based on Mobility Data</article-title>
          .
          <source>PhD thesis</source>
          , University of Bonn,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kreps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Narkhede</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Rao</surname>
          </string-name>
          .
          <article-title>Kafka: A distributed messaging system for log processing</article-title>
          .
          <source>In Proceedings of 6th International Workshop on Networking Meets Databases (NetDB)</source>
          , Greece,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Media-Micro-Census GmbH</surname>
          </string-name>
          . ma 2012 Plakat - Methoden-Steckbrief zur Berichterstattung,
          <year>2012</year>
          . http://www.agma-mmc.de/publikationen/ methodische-berichte/methoden-steckbriefe. html?eID=dam\_frontend\_push\&amp;docID=
          <fpage>179Z</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Neumeyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Robbins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nair</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Kesari</surname>
          </string-name>
          . S4:
          <article-title>Distributed stream computing platform</article-title>
          .
          <source>In Proceedings of the 2010 IEEE Int. Conf. on Data Mining Workshops, ICDMW '10</source>
          , pages
          <fpage>170</fpage>
          {
          <fpage>177</fpage>
          , Washington, DC, USA,
          <year>2010</year>
          . IEEE Computer Society.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>O.</given-names>
            <surname>Papapetrou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Garofalakis</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Deligiannakis</surname>
          </string-name>
          .
          <article-title>Sketch-based querying of distributed sliding-window data streams</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>5</volume>
          (
          <issue>10</issue>
          ):
          <volume>992</volume>
          {
          <fpage>1003</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>