=Paper=
{{Paper
|id=Vol-2129/paper12
|storemode=property
|title=Predict Cellular Network Traffic with Markov Logic
|pdfUrl=https://ceur-ws.org/Vol-2129/paper12.pdf
|volume=Vol-2129
|authors=Marco Lippi,Marco Mamei,Franco Zambonelli
|dblpUrl=https://dblp.org/rec/conf/ijcai/0001MZ18
}}
==Predict Cellular Network Traffic with Markov Logic==
Predict Cellular Network Traffic with Markov Logic Marco Lippi, Marco Mamei, Franco Zambonellli Dipartimento di Scienze e Metodi dell’Ingegneria University of Modena and Reggio Emilia, Italy {marco.lippi,marco.mamei,franco.zambonelli}@unimore.it Abstract Table 1: Example of aggregated CDR data. Forecasting spatio-temporal data is a challeng- Timestamp Cell Type Count ing task in transportation scenarios involving 1/3/2015 13:00 3943 SMS In 4.34 agents. In this paper, we propose a statisti- 1/3/2015 13:00 3943 Call In 2.34 cal relational learning approach to cellular net- 2/3/2015 13:15 3943 Network 9.34 work traffic forecasting, that exploits spatial re- ... ... ... ... lationships between close cells in the network 30/4/2015 20:45 3943 Call Out 3.45 grid. The approach is based on Markov logic networks, a powerful framework that combines first-order logic and graphical models into a hybrid model capable of handling both uncer- tainty in data, and background knowledge of without any reference to the IDs of the people being in- the problem. Experimental results conducted volved). This kind of data presents a number of ready- on a real-world data set show the potential of to-market applications as privacy concerns are much re- using such information. The proposed method- duced (in contrast with CDRs with anonymized IDs). ology can have a strong impact in mobility de- In fact, all major telecommunication companies already mand forecasting and in transportation appli- have services for the analysis and commercial exploita- cations. tion of this data. While there are several works analyzing properties of aggregated CDR data and predicting user movements from individual CDRs [16], the task of pre- 1 Introduction dicting future density of CDRs from aggregated data is relatively under-explored. The widespread diffusion of mobile phones and cell net- works provides a practical way to collect geo-located in- Accordingly, the main contribution of this paper is to formation from a large user population. The analysis of present a novel approach based on Markov Logic (ML) such collected data can be a fundamental asset in the [15] to predict cellular network traffic across the city (as development of traffic management, urban planning and cellular network traffic is a widely used proxy for people transport applications [3; 5; 6; 1; 17]. In this work, we ex- presence, our approach can be applied to predict crowd plore the use of anonymized Call Detail Records (CDRs) distribution). The main advantage of ML with respect from a cellular network to estimate and predict the dis- to other time-series forecasting methods is that ML can tribution of people across the city. CDRs are generated easily model the relationship between different areas of every time a mobile phone interacts with the cellular the city imposing (probabilistic) constraints on how traf- network (e.g., to send/receive calls and text messages, fic in an area can influence traffic in another one. or to connect to the Internet). Each CDR contains in- formation about the identity of the mobile phone (typ- The rest of the paper is structured as follows: Sec- ically anonymized with an hashed id), its approximate tion 2 describes CDR data that are employed in our ex- location (i.e., the network cell where the phone is con- periments; Section 3 presents Markov Logic Networks nected) and a time stamp Accordingly, CDRs can serve (MLN), the main approach we used for our prediction as sporadic samples of the approximate locations of the task; Section 4 describes experiments applying MLN phone’s owner [11]. On the basis of such location sam- to a set of CDR data and discusses results; Section 5 ples, we estimate the distribution of people across the overviews related work in the area of CDR data analysis city and try to predict how they will move. and time-series forecasting; finally, Section 6 concludes We focus on aggregated CDR data (i.e., data mea- the paper and indicates some interesting directions for suring the number of CDRs generated from a region, future research. 100% more – as many people than usual), see Figure 2 (bottom). Finally, we discretized x̂t into a set of classes. Working with discrete values notably simplify computa- tions, without compromising the actual significance and interpretability of the results. To predict cellular network traffic, we apply Markov logic [15], a statistical relational learning method, to per- form collective classification on a grid of cells. While traditional machine learning classifiers typically treat the examples as independent and identically distributed, statistical reltional learning approaches are capable of taking into account relations and inter-dependencies be- Figure 1: Irregular grid tessellating the area tween the examples to be classified, so that a joint clas- sification spanning multiple examples can be performed. In particular, we aim to exploit the spatial relationships 1000 between cells, as the nature of CDR data is inherently relational along this dimension: at time t, the traffic at cdr 500 two cells c1 and c2 spatially close in the network will typically be strongly inter-related. 0 mar 30 apr 06 apr 13 apr 20 apr 27 time 4 3 Methodology 2 Inspired by the work in [10] for road traffic flow fore- casting, we modeled our domain with a set of logic pred- z 0 icates that describe the dynamics of CDR traffic data during time and across different cells. Supposing to dis- −2 mar 30 apr 06 apr 13 time apr 20 apr 27 cretize the amount of traffic in C classes, predicates Class0(cell,time), . . . , ClassC(cell,time) can be used to indicate the fact that, at a certain cell and at Figure 2: Example of data in a grid cell: (top) Origi- a given time, the traffic quantity falls in one of such nal behaviour extracted from mobile phone data. (bot- classes. A simple predicate Neighbors(cell,cell) in- tom) “standardized” score showing the deviation from dicates that two cells are spatially close in the grid. the mean of that cell at that time, in mean units. Time is modeled with predicate Next(time,time). Ad- ditional information about the day of the week and the 2 Dataset part of the day can be also easily modeled with logic predicates. Given a domain described in terms of logic We focus the analysis on aggregated CDR data that facts, a Markov Logic Network (MLN) consists in a set count SMSs, calls and Internet traffic over specific ar- of weighted rules that model relationships among the ex- eas of the city and at time intervals. Specifically, the isting predicates. For example, a simple predictor that geographic area under analysis is tessellated with an ir- forecasts, for cell c and time t, the same traffic class ob- regularly shaped grid, similar to a Voronoi tessellation. served at cell c at time t − 1, is obtained with the rules: Thus, the more cell network antennas present, the denser the grid (see Figure 1. Class0(c,t1) ∧ Next(t1,t2) => Class0(c,t2) The resulting dataset (illustrated in Table 1 is a set of ... counters that estimate, for each cell of the grid and each ClassC(c,t1) ∧ Next(t1,t2) => ClassC(c,t2) 15-minutes-interval, the number of SMSs, calls and In- Clearly, such rules are not always true, but they are ternet traffic. Counters can also be fractional to take into true with a certain probability. Given a collection of ob- account CDR interactions originating in a cell and end- servations of past events, the Markov logic framework ing in another one. The original data comprises about allows to learn the weights of such rules directly from 12 million records like the ones depicted. For each cell data. The higher the weight, the higher is a probability and CDR type, the typical plot resembles the one in that the rule will be true. Once the weights of the MLN Figure 2 (top) where it is possible to observe daily and have been learned, one can use the model to compute weekly patterns in city dynamics. the truth value of some query predicates. In the case of To better highlight variability in our data, for each this work, the aim is to forecast the dynamics of CDR cell, we computed the mean (µt ) of the time series in traffic in the future. Given the current state of the CDR that 15 minutes interval (i.e., the mean among all days traffic network, by performing inference over the MLN at that time) and obtain a “standardized” score by com- it is possible to retrieve the cell configuration that maxi- puting x̂t = (xt −µt )/µt . The resulting time series shows mizes the probability of the rules in the model. In order the deviation from the mean of that cell at that time, in to exploit spatial relationships, rules like the following mean units (e.g., x̂t = 1 means that there is twice – one can be used: ClassC(c1,t1) ∧ Next(t1,t2) ∧ Neighbors(c1,c2) => ¬ Class0(c2,t2) Such rule means that, if there is a high traffic (class C) in a certain cell c1 at time t1, then at the next time step 0.90 t2 it is unlikely that a neighbor cell c2 will have very low traffic (class 0). Complex relationships and dependencies can by modeled with such rules. Formally, a Markov logic network (MLN) is a set of ● first-order logic formulae F = {F1 , . . . , Fn }, each with an 0.80 attached real-valued weight w = {w1 , . . . , wn }. Given a finite set of constants C = {c1 , . . . ck } (the objects in the ● domain – in our case described above, cells and times- tamps), an MLN induces a Markov network (or Markov Π unc Πt Πs Πst random field) where nodes are ground atoms,1 and edges are present between nodes that appear together in at least one grounding of some formula. An MLN defines Figure 3: Predictability according to Fano’s inequal- a probability distribution over possible variable configu- ity associated with uncorrelated, time, spatial and spa- rations (i.e., worlds): tiotemporal correlated entropies. |F | 1 X P (X = x) = exp wj nj (x) (1) Table 2: Comparison of the classifiers employed in the Z j experimental study. We report accuracy and average F1 over the five classes. where nj (x) is the number of groundings that satisfy for- Predictor Accuracy Avg F1 mula Fj in x. As stated above, the higher is the weight Random 65.7 19.8 of a rule, the higher the probability that formula is sat- Majority 80.3 16.1 isfied. RW 74.3 38.8 In most of the application scenarios, some of the pred- MLN 77.4 40.6 icates (i.e., of random variables) are always observed, which means they are given as evidence, whereas oth- ∗ ers are to be guessed at prediction time, thus being ob- solution is to use the counts in the MAP state yw : in served only during the training phase. This is called a this way, MAP inference is called as a subroutine for discriminative setting. The truth value of query vari- each step of the learning algorithm. ables is obtained from the evidence variables, and from the weighted MLN, through a process of inference, that 4 Experiments computes the maximum a posteriori (i.e., the most prob- We focused the analysis on the province of Milan and able) configuration of such variables. we used aggregated CDR data collected over a period The set of parameters wj associated to the MLN rules of two months: from March 1, 2015 to April 30, 2015 can be learned with several different algorithms. In and including calls and SMS sent and received and net- a discriminative setting, MLN weights are learned by work traffic. In our experiments we sum together all maximizing the conditional log-likelihood (CLL) of query SMS and calls sent/received, while we do not consider atoms Y , given the evidence X. The conditional proba- network traffic (as it is expressed in a different format in bility P (Y = y|X = x) is defined as: our data). For each cell, at a given time t this sum is our ! 1 X xt . The area under analysis is tessellated in 1,419 cells. P (Y = y|X = x) = exp wi ni (x, y) (2) Cell areas can range from 0.04 Km2 in the city center to Zx i∈FY 40 Km2 in the suburbs. Temporal resolution is 15 min- where ni (x, y) is the number of groundings of formula i utes. For each cell, we compute the standardized score in the configuration (x, y) and FY is the set of first-order x̂t = (xt − µt )/µt and we discretized those values in 5 clauses containing query predicates Y . The gradient of classes associated to intervals: [−∞, 0.25], [0, 25, 0.50], the CLL can be computed as: [0.50, 0.75], [0.75, 1], [1, ∞] (i.e., the class [0.75, 1] con- ∂ tains those values having from 75% to 100% more traffic log Pw (y|x) = ni (x, y) − y0 Pw (y 0 |x)ni (x, y 0 )(3) P than the mean at that time). Overall, 79% of data fall ∂wi in the first class, 11% in the second one, 4% both in the = ni (x, y) − Ew [ni (x, y)] (4) third and fifth one, 2% in the fourth one. Exactly computing the expected number of true ground- Following the work in [16], we try to establish upper ings Ew [ni (x, y)] is an intractable problem, thus approx- bounds for the predictability of aggregate (discretized) imate algorithms are typically employed. One possible CDR behavior across cells. We compute different en- 1 tropy measures for each cell: (i) The random entropy In a ground atom all variables have been substituted by constants. srand = log2 N = 2.3 0 1 2 3 4 performance of a classifier that randomly predicts one of 0 48,015 3,738 931 437 976 the four classes, by drawing from a probability distribu- 1 4,257 1,894 381 110 122 tion that knows the true proportions between the classes 2 951 418 568 103 145 (called Random in Table 2). As a second baseline, we 3 470 134 164 111 191 employ a classifier that simply always predicts the most 4 1,163 163 194 208 1,546 frequent class (named Majority in Table 2), that is class 0 in our case, corresponding to low traffic. As a third Table 3: Confusion matrix of the MLN predictor. Rows: predictor, we use a Random Walk (RW in Table 2) that true labels, columns: predictions. produces as a forecast at time t the same class that was observed at time t − 1 (for each cell independently). Fi- N is the number of values exhibited by the cell – all cells nally, we employ an MLN exploiting spatial relationships have 5 values. between the cells. The task is to predict the status of (ii) The uncorrelated entropy the grid 15 minutes ahead in the future. We report both the accuracy and average F1 over the five classes (be- N X ing F1 of a single class the harmonic mean between its sunc = − pj ∗ log2 pj precision and recall). As already introduced when dis- j=1 cussing Πunc predictability, the accuracy is clearly dom- (iii) The time (Markov) correlated entropy inated by the most frequent class, that is present 80.3% of the times in the test set (which is, in fact, the ac- curacy of the Majority predictor), while the average F1 XX st = − p(x̂t , x̂t−1 )log2 p(x̂t |x̂t−1 ) gives the same importance to each of the five classes, x̂t x̂t−1 and it is thus more significant in this setting. For exam- (iv) The spatial correlated entropy ple, the Majority predictor achieves the best accuracy, X but actually it is a completely useless system, as it never ss = − p(x̂t , x̂1t ...x̂kt )log2 p(x̂t |x̂1t ...x̂kt ) predicts something different from the low-level traffic. x̂t It is interesting to see that the RW predictor already Where x̂1t ...x̂kt are values in neighbor cells. achieves a significant improvement over Random, thus (v) Finally, the spatio-temporal correlated entropy proving to be a very strong competitor. This behavior suggests that CDR traffic has a dynamic which changes XX smoothly through time, and 15 minutes ahead in the sst = − p(x̂t , x̂t−1 , x̂1t ...x̂kt )log2 p(x̂t |x̂t−1 , x̂1t ...x̂kt ) future is a short horizon to observe big changes in the x̂t x̂t−1 network configuration. Nevertheless, the MLN approach achieves better results than RW, both in terms of accu- Naturally, for each cell, we will have srand ≥ sunc ≥ racy, and of average F1 . Table 3 reports the confusion st , ss ≥ sst . We then compute the predictability Π as- matrix for the MLN model: rows/columns represent the sociated to each entropy according to Fano’s inequality true/predicted values, respectively (position i, j in the [16]. This is an upper bound for any algorithm predict- matrix indicates the number of examples of class i that ing x̂t . Πrand = 20% for all cells. Results for other are predicted to belong to class j). predictabilities are in Figure 3. For example, from these Figure 4 shows a case study in which spatial relation- results, we can infer that the upper bound for a classifier ships help to improve the accuracy of the predictions. using only temporal information (1 step – 15 minutes The traffic classes for the cells in the network are rep- back) is about 85% (median value). Despite these en- resented for some timestamp t (left), and for the sub- couraging predictability results, it is worth noting that sequent timestamp t + 1 (right). In this scenario, the the class distribution is highly skewed (e.g., 79% of all traffic in cells A and B increases (from green to yellow), the x̂t are in one class), and thus a simple majority clas- which is a case where a Random Walk predictor would sifier would get very good results in terms of accuracy, fail. The MLN model, on the other hand, correctly fore- according to Πunc . Therefore, more careful analysis is casts the traffic classes for cells A and B by exploiting needed. spatial relationships, as most of the neighbors at time t We run experiments to test the Markov Logic predic- belong to a high (yellow, orange or red) traffic class. tor focusing on a subset of 23 cells2 , and we employed the first half of the data (March) for training our system, while the remaining part (April) was used as test set. For 5 Related Work Markov logic, we used the Alchemy software,3 training our model for 1,000 epochs with the voted perceptron al- Fueled by the “recent” availability of telecoms’ CDR gorithm. All the other software parameters were left to data, a number of researchers try to estimating and pre- their default values. We compared four different predic- dicting the distribution of people across the city on this tors (see Table 2). As a first baseline, we measured the basis. The works in [7; 2; 14; 12] present approaches to estimate the attractiveness of areas in the city from 2 We chose the cells whose id contains the prefix 3943. the combination of cellular network activity and other 3 http://alchemy.cs.washington.edu information sources. They try to estimate the location vidual people mobility. Individual predictions are aggre- gated to predict density. [8] deals with individuals CDR data and build people profile in terms of home and work High locations. [4] deal with finer-grained GPS data and pre- cisely model trajectories on a shorter time frame. We think that both these aspects could take advantage of the explicit spatial (inter-personal) relations that can be expressed via Markov Logic. 6 Conclusion Low We presented a novel approach based on Markov Logic [15] to predict cellular network traffic across the city. As Figure 4: CDR traffic at time t (left) and t + 1 (right). this traffic is a widely used proxy for people presence, our Traffic at cells A and B increases, following the trend in approach can be applied to predict crowd distribution. the neighboring cells. The distribution of people across the city and the predic- tion of their aggregated mobility can find application in a number of scenarios from transport and mobility to ur- of cellular network traffic and to use it as a proxy of the ban planning. Result show the effectiveness of the MLN number of people in that area. framework in this prediction task. MLN is able to take A similar approach is also adopted by Telefonica’s advantage of spatial relationships among cells to perform Smart Steps, Telecom Italia (TIM) City Forecast, and a collective forecasting of people distribution across the Vodafone mAnalytics platforms. All these approaches city. While in the present work we modeled only rela- estimate the present people distribution, and could fruit- tions between neighboring cells, in our future work we fully be enriched by our forecasting module. will take into account also relations between distant cells The work in [10] presents a comparison of multiple (e.g., a large crowd in a stadium can impact the num- algorithms for forecasting vehicular traffic. The data ber of people of a train station in the future event if used is based on ad-hoc road traffic sensors, but provides the station is far away form the staduim). Moreover, we a representation of vehicle counts that is roughly similar will conduct a comprehensive analysis trying to compare to our CDR counters. The algorithm presented in this different algorithms for this task. work extends the algorithms discussed in [10] to the case of multi-class prediction. Acknowledgment The work in [13] predicts people density on the basis of Work supported by the POR-FESR 2014-2020 Project: individual CDR data. To predict a user’s position, they LUME PlannER - Tools per la pianificazione di viaggi use a simple model based on previous most frequent lo- sostenibili presso LUoghi storici, Musei, Eventi artistici cations (in that day at that time). Since humans tend to e culturali dell’Emilia Romagna have very predictable mobility patterns [1, 4, 5], this sim- Source of the Dataset: TIM Big Data Challenge 2015, ple model turns out to give a good predictability base- www.telecomitalia.com/bigdatachallenge line. Our work deals with aggregated data that can be processed in a much more privacy-compliant way. Never- References theless, we think that our Markov Logic approach could be fruitfully extended to this other setting and could [1] R. Becker, R. Caceres, K. Hanson, S. Isaacman, improve performance by modeling relations between in- J. M. Loh, M. Martonosi, J. Rowland, S. Urbanek, dividuals. A. Varshavsky, and C. Volinsky. Human mobility The work in [9] and [18] deal with the problem of characterization from cellular network data. Com- people density prediction in urban areas on the basis munications of the ACM, 56(1):74–82, 2013. of CDR and GPS traces respectively. Both approaches [2] F. Calabrese, F. Pereira, G. Lorenzo, L. Liang, and are based on a recurrent (deep) neural network. Neu- C. Ratti. The geography of taste: Analyzing cell- ral networks for different regions are trained separately, phone mobility and social events. In International but both approaches introduce mechanisms (e.g., using Conference on Pervasive Computing, Helsinki, Fin- a shared layer between networks of neighbor areas) to land, 2010. take into account spatial dependencies. In our future work we plan to better investigate and compare Markov [3] F. Calabrese, C. Ratti, M. Colonna, P. Lovisolo, and Logic and deep neural networks. A first comment we can D. Parata. Real-time urban monitoring using cell make is that Markov Logic encoding relations via first- phones: a case study in rome. IEEE Transactions order predicate is typically much more interpretable than on Intelligent Transportation Systems, 12(1):141– neural networks. 151, 2011. [8] and [4] presents a set of mechanisms to compute [4] Z. Fan, X. Song, R. Shibasaki, and R. Adachi. City- people density on the basis of advanced models of indi- momentum: An online approach for crowd behavior prediction at a citywide level. In Ubicomp, Osaka, [18] J. Zhang, Y. Zheng, D. Qi, R. Li, X. Yi, and T. Li. Japan, 2015. Predicting citywide crowd flows using deep spatio- temporal residual networks. Artificial Intelligence, [5] L. Ferrari and M. Mamei. Discovering city dynamics 259:147 – 166, 2018. through sports tracking applications. IEEE Com- puter, 44(12):61–66, 2011. [6] L. Ferrari, M. Mamei, and M. Colonna. Discov- ering events in the city via mobile network analy- sis. Journal of Ambient Intelligence and Humanized Computing, 5(3):265–277, 2014. [7] F. Girardin, A. Vaccari, A. Gerber, A. Biderman, and C. Ratti. Quantifying urban attractiveness from the distribution and density of digital foot- prints. International Journal of Spatial Data In- frastructure Research, 4:175–200, 2009. [8] S. Isaacman, R. Becker, R. Cceres, M. Martonosi, J. Rowland, A. Varshavsky, and W. Willinger. Hu- man mobility modeling at metropolitan scales. In ACM International Conference on Mobile Systems, Applications, and Services (MobiSys), Low Wood Bay, Lake District, UK, 2013. [9] V. Liang, R. Ma, W. S. Ng, L. Wang, M. Winslett, H. Wu, S. Ying, and Z. Zhang. Mercury: Metro density prediction with recurrent neural network on streaming cdr data. In IEEE International Confer- ence on Data Engineering, Helsinki, Finland, 2016. [10] M. Lippi, M. Bertini, and P. Frasconi. Collective traffic forecasting. In ECML/PKDD Proceedings, Barcelona, Spain, 2010. [11] M. Mamei and M. Colonna. Estimating attendance from cellular network data. Internaional Journal of Geographic Information Science, 30(7):1281–1301, 2016. [12] J. Neumann, M. Zao, A. Karatzoglou, and N. Oliver. Event detection in communication and transportation data. Pattern Recognition and Im- age Analysis, 7887:827–838, 2013. [13] N. Ponieman, A. Salles, and C. Sarraute. Human mobility and predictability enriched by social phe- nomena information. In IEEE/ACM International Conference on Advances in Social Networks Analy- sis and Mining, Niagara, Ontario, Canada, 2013. [14] D. Quercia, G. D. Lorenzo, F. Calabrese, and C. Ratti. Mobile phones and outdoor advertising: Measurable advertising. IEEE Pervasive Comput- ing, 10(2):28–36, 2011. [15] M. Richardson and P. Domingos. Markov logic net- works. Machine Learning, 62(1):107–136, 2006. [16] C. Song, Z. Qu, N. Blumm, and A. Barabsi. Lim- its of predictability in human mobility. Science, 327(5968), 2010. [17] F. Zambonelli. Toward sociotechnical urban super- organisms. IEEE Computer, 45(8):76–78, 2012.