=Paper= {{Paper |id=Vol-2129/paper12 |storemode=property |title=Predict Cellular Network Traffic with Markov Logic |pdfUrl=https://ceur-ws.org/Vol-2129/paper12.pdf |volume=Vol-2129 |authors=Marco Lippi,Marco Mamei,Franco Zambonelli |dblpUrl=https://dblp.org/rec/conf/ijcai/0001MZ18 }} ==Predict Cellular Network Traffic with Markov Logic== https://ceur-ws.org/Vol-2129/paper12.pdf
                 Predict Cellular Network Traffic with Markov Logic
                            Marco Lippi, Marco Mamei, Franco Zambonellli

                             Dipartimento di Scienze e Metodi dell’Ingegneria
                              University of Modena and Reggio Emilia, Italy
                          {marco.lippi,marco.mamei,franco.zambonelli}@unimore.it

                        Abstract
                                                                      Table 1: Example of aggregated CDR data.
     Forecasting spatio-temporal data is a challeng-                   Timestamp       Cell     Type    Count
     ing task in transportation scenarios involving                  1/3/2015 13:00 3943       SMS In    4.34
     agents. In this paper, we propose a statisti-                   1/3/2015 13:00 3943       Call In   2.34
     cal relational learning approach to cellular net-               2/3/2015 13:15 3943 Network         9.34
     work traffic forecasting, that exploits spatial re-                   ...         ...       ...      ...
     lationships between close cells in the network                  30/4/2015 20:45 3943 Call Out       3.45
     grid. The approach is based on Markov logic
     networks, a powerful framework that combines
     first-order logic and graphical models into a
     hybrid model capable of handling both uncer-
     tainty in data, and background knowledge of                without any reference to the IDs of the people being in-
     the problem. Experimental results conducted                volved). This kind of data presents a number of ready-
     on a real-world data set show the potential of             to-market applications as privacy concerns are much re-
     using such information. The proposed method-               duced (in contrast with CDRs with anonymized IDs).
     ology can have a strong impact in mobility de-             In fact, all major telecommunication companies already
     mand forecasting and in transportation appli-              have services for the analysis and commercial exploita-
     cations.                                                   tion of this data. While there are several works analyzing
                                                                properties of aggregated CDR data and predicting user
                                                                movements from individual CDRs [16], the task of pre-
1    Introduction                                               dicting future density of CDRs from aggregated data is
                                                                relatively under-explored.
The widespread diffusion of mobile phones and cell net-
works provides a practical way to collect geo-located in-          Accordingly, the main contribution of this paper is to
formation from a large user population. The analysis of         present a novel approach based on Markov Logic (ML)
such collected data can be a fundamental asset in the           [15] to predict cellular network traffic across the city (as
development of traffic management, urban planning and           cellular network traffic is a widely used proxy for people
transport applications [3; 5; 6; 1; 17]. In this work, we ex-   presence, our approach can be applied to predict crowd
plore the use of anonymized Call Detail Records (CDRs)          distribution). The main advantage of ML with respect
from a cellular network to estimate and predict the dis-        to other time-series forecasting methods is that ML can
tribution of people across the city. CDRs are generated         easily model the relationship between different areas of
every time a mobile phone interacts with the cellular           the city imposing (probabilistic) constraints on how traf-
network (e.g., to send/receive calls and text messages,         fic in an area can influence traffic in another one.
or to connect to the Internet). Each CDR contains in-
formation about the identity of the mobile phone (typ-             The rest of the paper is structured as follows: Sec-
ically anonymized with an hashed id), its approximate           tion 2 describes CDR data that are employed in our ex-
location (i.e., the network cell where the phone is con-        periments; Section 3 presents Markov Logic Networks
nected) and a time stamp Accordingly, CDRs can serve            (MLN), the main approach we used for our prediction
as sporadic samples of the approximate locations of the         task; Section 4 describes experiments applying MLN
phone’s owner [11]. On the basis of such location sam-          to a set of CDR data and discusses results; Section 5
ples, we estimate the distribution of people across the         overviews related work in the area of CDR data analysis
city and try to predict how they will move.                     and time-series forecasting; finally, Section 6 concludes
   We focus on aggregated CDR data (i.e., data mea-             the paper and indicates some interesting directions for
suring the number of CDRs generated from a region,              future research.
                                                                           100% more – as many people than usual), see Figure 2
                                                                           (bottom). Finally, we discretized x̂t into a set of classes.
                                                                           Working with discrete values notably simplify computa-
                                                                           tions, without compromising the actual significance and
                                                                           interpretability of the results.
                                                                              To predict cellular network traffic, we apply Markov
                                                                           logic [15], a statistical relational learning method, to per-
                                                                           form collective classification on a grid of cells. While
                                                                           traditional machine learning classifiers typically treat
                                                                           the examples as independent and identically distributed,
                                                                           statistical reltional learning approaches are capable of
                                                                           taking into account relations and inter-dependencies be-
      Figure 1: Irregular grid tessellating the area                       tween the examples to be classified, so that a joint clas-
                                                                           sification spanning multiple examples can be performed.
                                                                           In particular, we aim to exploit the spatial relationships
            1000
                                                                           between cells, as the nature of CDR data is inherently
                                                                           relational along this dimension: at time t, the traffic at
      cdr




             500
                                                                           two cells c1 and c2 spatially close in the network will
                                                                           typically be strongly inter-related.
                 0
                     mar 30    apr 06    apr 13          apr 20   apr 27
                                                  time

             4                                                             3    Methodology
             2
                                                                           Inspired by the work in [10] for road traffic flow fore-
                                                                           casting, we modeled our domain with a set of logic pred-
      z




             0                                                             icates that describe the dynamics of CDR traffic data
                                                                           during time and across different cells. Supposing to dis-
            −2
                 mar 30       apr 06    apr 13
                                                  time
                                                         apr 20   apr 27   cretize the amount of traffic in C classes, predicates
                                                                           Class0(cell,time), . . . , ClassC(cell,time) can be
                                                                           used to indicate the fact that, at a certain cell and at
Figure 2: Example of data in a grid cell: (top) Origi-
                                                                           a given time, the traffic quantity falls in one of such
nal behaviour extracted from mobile phone data. (bot-
                                                                           classes. A simple predicate Neighbors(cell,cell) in-
tom) “standardized” score showing the deviation from
                                                                           dicates that two cells are spatially close in the grid.
the mean of that cell at that time, in mean units.
                                                                           Time is modeled with predicate Next(time,time). Ad-
                                                                           ditional information about the day of the week and the
2   Dataset                                                                part of the day can be also easily modeled with logic
                                                                           predicates. Given a domain described in terms of logic
We focus the analysis on aggregated CDR data that
                                                                           facts, a Markov Logic Network (MLN) consists in a set
count SMSs, calls and Internet traffic over specific ar-
                                                                           of weighted rules that model relationships among the ex-
eas of the city and at time intervals. Specifically, the
                                                                           isting predicates. For example, a simple predictor that
geographic area under analysis is tessellated with an ir-
                                                                           forecasts, for cell c and time t, the same traffic class ob-
regularly shaped grid, similar to a Voronoi tessellation.
                                                                           served at cell c at time t − 1, is obtained with the rules:
Thus, the more cell network antennas present, the denser
the grid (see Figure 1.                                                         Class0(c,t1) ∧ Next(t1,t2) => Class0(c,t2)
  The resulting dataset (illustrated in Table 1 is a set of                                         ...
counters that estimate, for each cell of the grid and each                      ClassC(c,t1) ∧ Next(t1,t2) => ClassC(c,t2)
15-minutes-interval, the number of SMSs, calls and In-                     Clearly, such rules are not always true, but they are
ternet traffic. Counters can also be fractional to take into               true with a certain probability. Given a collection of ob-
account CDR interactions originating in a cell and end-                    servations of past events, the Markov logic framework
ing in another one. The original data comprises about                      allows to learn the weights of such rules directly from
12 million records like the ones depicted. For each cell                   data. The higher the weight, the higher is a probability
and CDR type, the typical plot resembles the one in                        that the rule will be true. Once the weights of the MLN
Figure 2 (top) where it is possible to observe daily and                   have been learned, one can use the model to compute
weekly patterns in city dynamics.                                          the truth value of some query predicates. In the case of
  To better highlight variability in our data, for each                    this work, the aim is to forecast the dynamics of CDR
cell, we computed the mean (µt ) of the time series in                     traffic in the future. Given the current state of the CDR
that 15 minutes interval (i.e., the mean among all days                    traffic network, by performing inference over the MLN
at that time) and obtain a “standardized” score by com-                    it is possible to retrieve the cell configuration that maxi-
puting x̂t = (xt −µt )/µt . The resulting time series shows                mizes the probability of the rules in the model. In order
the deviation from the mean of that cell at that time, in                  to exploit spatial relationships, rules like the following
mean units (e.g., x̂t = 1 means that there is twice –                      one can be used:
   ClassC(c1,t1) ∧ Next(t1,t2) ∧ Neighbors(c1,c2)
                 => ¬ Class0(c2,t2)
Such rule means that, if there is a high traffic (class C)
in a certain cell c1 at time t1, then at the next time step




                                                                       0.90
t2 it is unlikely that a neighbor cell c2 will have very low
traffic (class 0). Complex relationships and dependencies
can by modeled with such rules.
   Formally, a Markov logic network (MLN) is a set of                                                      ●
first-order logic formulae F = {F1 , . . . , Fn }, each with an




                                                                       0.80
attached real-valued weight w = {w1 , . . . , wn }. Given a
finite set of constants C = {c1 , . . . ck } (the objects in the                    ●
domain – in our case described above, cells and times-
tamps), an MLN induces a Markov network (or Markov                              Π   unc
                                                                                               Πt        Πs          Πst
random field) where nodes are ground atoms,1 and edges
are present between nodes that appear together in at
least one grounding of some formula. An MLN defines                Figure 3: Predictability according to Fano’s inequal-
a probability distribution over possible variable configu-         ity associated with uncorrelated, time, spatial and spa-
rations (i.e., worlds):                                            tiotemporal correlated entropies.
                                                  
                                     |F |
                          1         X
            P (X = x) = exp              wj nj (x)         (1)   Table 2: Comparison of the classifiers employed in the
                          Z            j                           experimental study. We report accuracy and average F1
                                                                   over the five classes.
where nj (x) is the number of groundings that satisfy for-                      Predictor Accuracy Avg F1
mula Fj in x. As stated above, the higher is the weight                         Random      65.7        19.8
of a rule, the higher the probability that formula is sat-                      Majority    80.3        16.1
isfied.                                                                            RW       74.3        38.8
   In most of the application scenarios, some of the pred-                        MLN       77.4        40.6
icates (i.e., of random variables) are always observed,
which means they are given as evidence, whereas oth-
                                                                                                                   ∗
ers are to be guessed at prediction time, thus being ob-           solution is to use the counts in the MAP state yw : in
served only during the training phase. This is called a            this way, MAP inference is called as a subroutine for
discriminative setting. The truth value of query vari-             each step of the learning algorithm.
ables is obtained from the evidence variables, and from
the weighted MLN, through a process of inference, that             4     Experiments
computes the maximum a posteriori (i.e., the most prob-            We focused the analysis on the province of Milan and
able) configuration of such variables.                             we used aggregated CDR data collected over a period
   The set of parameters wj associated to the MLN rules            of two months: from March 1, 2015 to April 30, 2015
can be learned with several different algorithms. In               and including calls and SMS sent and received and net-
a discriminative setting, MLN weights are learned by               work traffic. In our experiments we sum together all
maximizing the conditional log-likelihood (CLL) of query           SMS and calls sent/received, while we do not consider
atoms Y , given the evidence X. The conditional proba-             network traffic (as it is expressed in a different format in
bility P (Y = y|X = x) is defined as:                              our data). For each cell, at a given time t this sum is our
                                                   !
                          1        X                               xt . The area under analysis is tessellated in 1,419 cells.
     P (Y = y|X = x) =      exp        wi ni (x, y)    (2)         Cell areas can range from 0.04 Km2 in the city center to
                         Zx
                                     i∈FY                          40 Km2 in the suburbs. Temporal resolution is 15 min-
where ni (x, y) is the number of groundings of formula i           utes. For each cell, we compute the standardized score
in the configuration (x, y) and FY is the set of first-order       x̂t = (xt − µt )/µt and we discretized those values in 5
clauses containing query predicates Y . The gradient of            classes associated to intervals: [−∞, 0.25], [0, 25, 0.50],
the CLL can be computed as:                                        [0.50, 0.75], [0.75, 1], [1, ∞] (i.e., the class [0.75, 1] con-
   ∂                                                               tains those values having from 75% to 100% more traffic
      log Pw (y|x) = ni (x, y) − y0 Pw (y 0 |x)ni (x, y 0 )(3)
                                     P
                                                                   than the mean at that time). Overall, 79% of data fall
 ∂wi                                                               in the first class, 11% in the second one, 4% both in the
                    =         ni (x, y) − Ew [ni (x, y)]   (4)     third and fifth one, 2% in the fourth one.
Exactly computing the expected number of true ground-                 Following the work in [16], we try to establish upper
ings Ew [ni (x, y)] is an intractable problem, thus approx-        bounds for the predictability of aggregate (discretized)
imate algorithms are typically employed. One possible              CDR behavior across cells. We compute different en-
   1
                                                                   tropy measures for each cell: (i) The random entropy
     In a ground atom all variables have been substituted by
constants.                                                                                srand = log2 N = 2.3
                    0              1           2         3          4                      performance of a classifier that randomly predicts one of
           0     48,015          3,738        931       437        976                     the four classes, by drawing from a probability distribu-
           1      4,257          1,894        381       110        122                     tion that knows the true proportions between the classes
           2       951            418         568       103        145                     (called Random in Table 2). As a second baseline, we
           3       470            134         164       111        191                     employ a classifier that simply always predicts the most
           4      1,163           163         194       208       1,546                    frequent class (named Majority in Table 2), that is class
                                                                                           0 in our case, corresponding to low traffic. As a third
Table 3: Confusion matrix of the MLN predictor. Rows:                                      predictor, we use a Random Walk (RW in Table 2) that
true labels, columns: predictions.                                                         produces as a forecast at time t the same class that was
                                                                                           observed at time t − 1 (for each cell independently). Fi-
N is the number of values exhibited by the cell – all cells                                nally, we employ an MLN exploiting spatial relationships
have 5 values.                                                                             between the cells. The task is to predict the status of
  (ii) The uncorrelated entropy                                                            the grid 15 minutes ahead in the future. We report both
                                                                                           the accuracy and average F1 over the five classes (be-
                                       N
                                       X                                                   ing F1 of a single class the harmonic mean between its
                        sunc = −             pj ∗ log2 pj                                  precision and recall). As already introduced when dis-
                                       j=1                                                 cussing Πunc predictability, the accuracy is clearly dom-
  (iii) The time (Markov) correlated entropy                                               inated by the most frequent class, that is present 80.3%
                                                                                           of the times in the test set (which is, in fact, the ac-
                                                                                           curacy of the Majority predictor), while the average F1
                XX
         st = −       p(x̂t , x̂t−1 )log2 p(x̂t |x̂t−1 )
                                                                                           gives the same importance to each of the five classes,
                        x̂t x̂t−1
                                                                                           and it is thus more significant in this setting. For exam-
  (iv) The spatial correlated entropy                                                      ple, the Majority predictor achieves the best accuracy,
               X                                                                           but actually it is a completely useless system, as it never
        ss = −     p(x̂t , x̂1t ...x̂kt )log2 p(x̂t |x̂1t ...x̂kt )                        predicts something different from the low-level traffic.
                        x̂t                                                                It is interesting to see that the RW predictor already
  Where x̂1t ...x̂kt are values in neighbor cells.                                         achieves a significant improvement over Random, thus
  (v) Finally, the spatio-temporal correlated entropy                                      proving to be a very strong competitor. This behavior
                                                                                           suggests that CDR traffic has a dynamic which changes
           XX                                                                              smoothly through time, and 15 minutes ahead in the
sst = −                 p(x̂t , x̂t−1 , x̂1t ...x̂kt )log2 p(x̂t |x̂t−1 , x̂1t ...x̂kt )   future is a short horizon to observe big changes in the
            x̂t x̂t−1                                                                      network configuration. Nevertheless, the MLN approach
                                                                                           achieves better results than RW, both in terms of accu-
    Naturally, for each cell, we will have srand ≥ sunc ≥                                  racy, and of average F1 . Table 3 reports the confusion
st , ss ≥ sst . We then compute the predictability Π as-                                   matrix for the MLN model: rows/columns represent the
sociated to each entropy according to Fano’s inequality                                    true/predicted values, respectively (position i, j in the
[16]. This is an upper bound for any algorithm predict-
                                                                                           matrix indicates the number of examples of class i that
ing x̂t . Πrand = 20% for all cells. Results for other                                     are predicted to belong to class j).
predictabilities are in Figure 3. For example, from these
                                                                                              Figure 4 shows a case study in which spatial relation-
results, we can infer that the upper bound for a classifier
                                                                                           ships help to improve the accuracy of the predictions.
using only temporal information (1 step – 15 minutes
                                                                                           The traffic classes for the cells in the network are rep-
back) is about 85% (median value). Despite these en-
                                                                                           resented for some timestamp t (left), and for the sub-
couraging predictability results, it is worth noting that
                                                                                           sequent timestamp t + 1 (right). In this scenario, the
the class distribution is highly skewed (e.g., 79% of all
                                                                                           traffic in cells A and B increases (from green to yellow),
the x̂t are in one class), and thus a simple majority clas-
                                                                                           which is a case where a Random Walk predictor would
sifier would get very good results in terms of accuracy,
                                                                                           fail. The MLN model, on the other hand, correctly fore-
according to Πunc . Therefore, more careful analysis is
                                                                                           casts the traffic classes for cells A and B by exploiting
needed.
                                                                                           spatial relationships, as most of the neighbors at time t
    We run experiments to test the Markov Logic predic-
                                                                                           belong to a high (yellow, orange or red) traffic class.
tor focusing on a subset of 23 cells2 , and we employed
the first half of the data (March) for training our system,
while the remaining part (April) was used as test set. For                                 5   Related Work
Markov logic, we used the Alchemy software,3 training
our model for 1,000 epochs with the voted perceptron al-                                   Fueled by the “recent” availability of telecoms’ CDR
gorithm. All the other software parameters were left to                                    data, a number of researchers try to estimating and pre-
their default values. We compared four different predic-                                   dicting the distribution of people across the city on this
tors (see Table 2). As a first baseline, we measured the                                   basis. The works in [7; 2; 14; 12] present approaches
                                                                                           to estimate the attractiveness of areas in the city from
   2
       We chose the cells whose id contains the prefix 3943.                               the combination of cellular network activity and other
   3
       http://alchemy.cs.washington.edu                                                    information sources. They try to estimate the location
                                                               vidual people mobility. Individual predictions are aggre-
                                                               gated to predict density. [8] deals with individuals CDR
                                                               data and build people profile in terms of home and work
                                                     High
                                                               locations. [4] deal with finer-grained GPS data and pre-
                                                               cisely model trajectories on a shorter time frame. We
                                                               think that both these aspects could take advantage of
                                                               the explicit spatial (inter-personal) relations that can be
                                                               expressed via Markov Logic.

                                                               6   Conclusion
                                                     Low
                                                               We presented a novel approach based on Markov Logic
                                                               [15] to predict cellular network traffic across the city. As
Figure 4: CDR traffic at time t (left) and t + 1 (right).      this traffic is a widely used proxy for people presence, our
Traffic at cells A and B increases, following the trend in     approach can be applied to predict crowd distribution.
the neighboring cells.                                         The distribution of people across the city and the predic-
                                                               tion of their aggregated mobility can find application in
                                                               a number of scenarios from transport and mobility to ur-
of cellular network traffic and to use it as a proxy of the    ban planning. Result show the effectiveness of the MLN
number of people in that area.                                 framework in this prediction task. MLN is able to take
   A similar approach is also adopted by Telefonica’s          advantage of spatial relationships among cells to perform
Smart Steps, Telecom Italia (TIM) City Forecast, and           a collective forecasting of people distribution across the
Vodafone mAnalytics platforms. All these approaches            city. While in the present work we modeled only rela-
estimate the present people distribution, and could fruit-     tions between neighboring cells, in our future work we
fully be enriched by our forecasting module.                   will take into account also relations between distant cells
   The work in [10] presents a comparison of multiple          (e.g., a large crowd in a stadium can impact the num-
algorithms for forecasting vehicular traffic. The data         ber of people of a train station in the future event if
used is based on ad-hoc road traffic sensors, but provides     the station is far away form the staduim). Moreover, we
a representation of vehicle counts that is roughly similar     will conduct a comprehensive analysis trying to compare
to our CDR counters. The algorithm presented in this           different algorithms for this task.
work extends the algorithms discussed in [10] to the case
of multi-class prediction.                                     Acknowledgment
   The work in [13] predicts people density on the basis of    Work supported by the POR-FESR 2014-2020 Project:
individual CDR data. To predict a user’s position, they        LUME PlannER - Tools per la pianificazione di viaggi
use a simple model based on previous most frequent lo-         sostenibili presso LUoghi storici, Musei, Eventi artistici
cations (in that day at that time). Since humans tend to       e culturali dell’Emilia Romagna
have very predictable mobility patterns [1, 4, 5], this sim-   Source of the Dataset: TIM Big Data Challenge 2015,
ple model turns out to give a good predictability base-        www.telecomitalia.com/bigdatachallenge
line. Our work deals with aggregated data that can be
processed in a much more privacy-compliant way. Never-         References
theless, we think that our Markov Logic approach could
be fruitfully extended to this other setting and could         [1] R. Becker, R. Caceres, K. Hanson, S. Isaacman,
improve performance by modeling relations between in-              J. M. Loh, M. Martonosi, J. Rowland, S. Urbanek,
dividuals.                                                         A. Varshavsky, and C. Volinsky. Human mobility
   The work in [9] and [18] deal with the problem of               characterization from cellular network data. Com-
people density prediction in urban areas on the basis              munications of the ACM, 56(1):74–82, 2013.
of CDR and GPS traces respectively. Both approaches            [2] F. Calabrese, F. Pereira, G. Lorenzo, L. Liang, and
are based on a recurrent (deep) neural network. Neu-               C. Ratti. The geography of taste: Analyzing cell-
ral networks for different regions are trained separately,         phone mobility and social events. In International
but both approaches introduce mechanisms (e.g., using              Conference on Pervasive Computing, Helsinki, Fin-
a shared layer between networks of neighbor areas) to              land, 2010.
take into account spatial dependencies. In our future
work we plan to better investigate and compare Markov          [3] F. Calabrese, C. Ratti, M. Colonna, P. Lovisolo, and
Logic and deep neural networks. A first comment we can             D. Parata. Real-time urban monitoring using cell
make is that Markov Logic encoding relations via first-            phones: a case study in rome. IEEE Transactions
order predicate is typically much more interpretable than          on Intelligent Transportation Systems, 12(1):141–
neural networks.                                                   151, 2011.
   [8] and [4] presents a set of mechanisms to compute         [4] Z. Fan, X. Song, R. Shibasaki, and R. Adachi. City-
people density on the basis of advanced models of indi-            momentum: An online approach for crowd behavior
    prediction at a citywide level. In Ubicomp, Osaka,     [18] J. Zhang, Y. Zheng, D. Qi, R. Li, X. Yi, and T. Li.
    Japan, 2015.                                                Predicting citywide crowd flows using deep spatio-
                                                                temporal residual networks. Artificial Intelligence,
[5] L. Ferrari and M. Mamei. Discovering city dynamics
                                                                259:147 – 166, 2018.
    through sports tracking applications. IEEE Com-
    puter, 44(12):61–66, 2011.
[6] L. Ferrari, M. Mamei, and M. Colonna. Discov-
    ering events in the city via mobile network analy-
    sis. Journal of Ambient Intelligence and Humanized
    Computing, 5(3):265–277, 2014.
[7] F. Girardin, A. Vaccari, A. Gerber, A. Biderman,
    and C. Ratti. Quantifying urban attractiveness
    from the distribution and density of digital foot-
    prints. International Journal of Spatial Data In-
    frastructure Research, 4:175–200, 2009.
[8] S. Isaacman, R. Becker, R. Cceres, M. Martonosi,
    J. Rowland, A. Varshavsky, and W. Willinger. Hu-
    man mobility modeling at metropolitan scales. In
    ACM International Conference on Mobile Systems,
    Applications, and Services (MobiSys), Low Wood
    Bay, Lake District, UK, 2013.
[9] V. Liang, R. Ma, W. S. Ng, L. Wang, M. Winslett,
    H. Wu, S. Ying, and Z. Zhang. Mercury: Metro
    density prediction with recurrent neural network on
    streaming cdr data. In IEEE International Confer-
    ence on Data Engineering, Helsinki, Finland, 2016.
[10] M. Lippi, M. Bertini, and P. Frasconi. Collective
     traffic forecasting. In ECML/PKDD Proceedings,
     Barcelona, Spain, 2010.
[11] M. Mamei and M. Colonna. Estimating attendance
     from cellular network data. Internaional Journal of
     Geographic Information Science, 30(7):1281–1301,
     2016.
[12] J. Neumann, M. Zao, A. Karatzoglou, and
     N. Oliver. Event detection in communication and
     transportation data. Pattern Recognition and Im-
     age Analysis, 7887:827–838, 2013.
[13] N. Ponieman, A. Salles, and C. Sarraute. Human
     mobility and predictability enriched by social phe-
     nomena information. In IEEE/ACM International
     Conference on Advances in Social Networks Analy-
     sis and Mining, Niagara, Ontario, Canada, 2013.
[14] D. Quercia, G. D. Lorenzo, F. Calabrese, and
     C. Ratti. Mobile phones and outdoor advertising:
     Measurable advertising. IEEE Pervasive Comput-
     ing, 10(2):28–36, 2011.
[15] M. Richardson and P. Domingos. Markov logic net-
     works. Machine Learning, 62(1):107–136, 2006.
[16] C. Song, Z. Qu, N. Blumm, and A. Barabsi. Lim-
     its of predictability in human mobility. Science,
     327(5968), 2010.
[17] F. Zambonelli. Toward sociotechnical urban super-
     organisms. IEEE Computer, 45(8):76–78, 2012.