Suggesting Just Enough (Un)Crowded Routes and Destinations
Claudia Cavallaro, Gabriella Verga, Emiliano Tramontana and Orazio Muscato
Department of Mathematics and Computer Science, University of Catania, Italy


Abstract
Though people like to visit popular places, gatherings should be avoided for health-related concerns and due to the recent restrictions adopted around the world. When planning a trip, one has to consider both the attractiveness of the destinations, in terms of general interest, and the density of people gathering there. In this work, we propose a recommendation system that offers users suggestions on useful routes and destinations balancing liveliness against overcrowding. Firstly, we use datasets storing GPS positions as a basis for the statistics on routes and destinations. Then, we use a probabilistic algorithm that estimates the number of people moving from one place to another in the city, and accordingly we show a list of destinations to users. The destinations are filtered based on the user’s preference on the density of people. A multi-agent system is used to handle user requests to find a route for a trip, statistics on possible destinations, and suggestions to users. Thanks to our solution, we can inform users on suitable routes and destinations, as well as alert them when a preferred destination is overcrowded.

Keywords
GPS trajectory, Recommendation systems, Movement predictions, Multi-agent system


1. Introduction
Currently, organising a trip should take into account the number of people that will gather at the chosen destinations, since visiting a place that will become overcrowded must be avoided to comply with the restrictions due to the Covid-19 pandemic. Hence, an estimate of the number of people that will be in some place at a future time can be valuable for people on the move, especially when they could choose to visit some other place.
   In previous works, the statistics accumulated over time are used to estimate a measure of traffic or gatherings [1, 2, 3, 4, 5]. Moreover, both popular online services and other apps just count the number of people currently present in some place [6, 7, 8, 9]. However, statistics gathered in the past cannot be a reliable indication of the current situation, which has to cope with e.g. restrictions on gatherings, the reduced capacity of public transport, etc. due to the pandemic. Additionally, real-time measures of gatherings do not let people plan their trip, i.e. understand whether the destination will still be (un)crowded when they arrive there, e.g. one hour later. A better estimate is therefore needed which takes into account: (i) the current number of people in some place, and (ii) the statistics on the number of people who, being in some origin place, typically move to another place to visit later on. Moreover, an app behaving as an assistant agent is needed to inform interested people in a timely manner.

WOA 2020: Workshop “From Objects to Agents”, September 14–16, 2020, Bologna, Italy
claudia.cavallaro@unict.it (C. Cavallaro); gabriella.verga@unict.it (G. Verga); tramontana@dmi.unict.it
(E. Tramontana); muscato@dmi.unict.it (O. Muscato)
ORCID 0000-0002-7169-659X (E. Tramontana)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Claudia Cavallaro et al.                                                                   237–251
   This paper proposes an approach to determine the probability that users are moving along some routes. Given the recordings of several user positions, we compute the probability that a user being in some place 𝐴 will move to another place 𝐵 (i.e. a possible destination); when arriving in place 𝐵, he will contribute to the number of people gathering there. By computing beforehand the probability that he will go to place 𝐵 at a future time, we can estimate whether a place will become overcrowded. Our proposed estimation of people destinations is based on the analysis of the co-occurrence of places visited by a number of people greater than a threshold. Moreover, we propose to give users an app that lets them know the number of people that will gather in the areas that can be reached from the point where the user is. The app supports us in suggesting to the user possible destinations that are viable in terms of distance, usefulness, and gathering. The app also provides the means to collect data on the current number of people in some place and then on their trajectories. When collecting user data by means of the app, we make sure that user privacy is preserved by providing only an approximate location to a central server.
   Our approach can be useful in many contexts where estimating the number of people beforehand can be a crucial factor for a better service, e.g. when organising public transport, or for retailers. Moreover, it could be enriched with data, coming from the proper authorities, revealing places where a Covid-19-positive person has been found. Then, by using our computed trajectories, we could estimate for other places the probability of infection spreading there.
   The paper is organised as follows. The next section describes the related work. Section 3 explains our proposed solution. Section 4 illustrates the experiments and shows the viability of our approach. Finally, conclusions are drawn in Section 5.


2. Related work
This section offers an overview of the studies on multi-agent systems and on the analysis of the movement of people.
   Multi-agent systems: a multi-agent system is a system with a significant number of independent agents that interact with each other [10]. In recent years, multi-agent systems have been widely used, as they are regarded as suitable for systems with a modular architecture thanks to the independence of their agents [10]. Generally, agents interact in three ways [11]: (i) each agent can communicate directly with any other agent (“autonomous agents”); (ii) agents communicate indirectly with each other via an intermediary (“facilitator”); (iii) all agents communicate via an intermediary, but they can communicate with each other after the communication has been set up by the intermediary (“mediator”). In the second case, robustness can be poor and the overhead is relatively high, but the intermediary acts as a protective wall for user privacy, because agents do not communicate directly, and it processes the information received from the users, decreasing their workload [12]. On the other hand, the use of an intermediary has several advantages in terms of synchronization, reusability, scalability and modularity [13, 14].
   Mobility monitoring: an accurate monitoring of user mobility provides support for efficient resource usage. For example, it could help avoid traffic congestion [1, 2, 3], give warnings or make targeted advertising by discovering the next place to visit [15], or simply study user behaviour [16]. The approach described in [4] identifies the routine behaviour of two samples of people by using a probabilistic approach (Latent Dirichlet Allocation) to extract 10 different positions shared by multiple users. Other approaches identify people movements. In [5], the proposed system observes GPS data in the urban area of Milan and creates an origin-destination matrix, then computes the similarity between two trajectories, leading to the discovery of mobility behaviours.
   Moreover, to find typical trajectories of a group of people, the approach in [17] identifies flows using a grid with the Apriori algorithm [18]; however, it detects only flows, not the frequently visited points of interest, nor the probability of some people moving to another place. In [19], flows are identified by associating them with spatio-temporal trajectories shared by multiple users heading in the same direction. Finally, other approaches use machine learning techniques, e.g. the approach in [20] is based on a clustering algorithm and uses an unsupervised learning solution for automatic lane detection in multidirectional pedestrian flows.
   Our proposed work uses a multi-agent system communicating with a server that acts as an intermediary and predicts movements by means of a probabilistic solution based on association rules. Unlike the work presented in [4], which determines the probability that a group of users is moving together, our goal is to detect the foreseeable routes by computing their probability. The work presented in [20] uses an unsupervised learning approach and aggregates instantaneous information on the position and speed of pedestrians to form clusters, calculated on short time windows. We use a fixed grid to group people positions into cells, and then compute the probabilities using the data arriving from the agents in real time. The paper [21] presents personalised recommendations for guiding tourists through the city of Melbourne by observing their actions. This system is modelled as a Markov decision process that recommends to the user, in sequence, the next place to visit. However, unlike the StayPoint analysis presented here, it does not consider how long visitors remain in a place, which is a key element in avoiding overcrowding. In [17] the frequent corridors, i.e. routes, were found on a grid through the Apriori algorithm [18], while in this work we first find the areas having highly visited points of interest, hence giving great importance to the user stay time. Moreover, we use the Confidence and Lift metrics of Market Basket Analysis to estimate the probability of moving from one area to another and therefore to predict potential contagion areas, which are continuously updated.


3. Proposed Approach
We aim at determining the probability that users are moving from one point to another. Such a probability is then used to provide recommendations accordingly. Recommendations, alerts, and user requests are communicated by means of a smartphone app. Therefore, our proposed solution comprises two parts: (i) an algorithm that determines the probability of people movements, and (ii) an app on the user’s device to track movements and suggest destinations.


3.1. Determining Probability of Movement
To determine the probability of movement we perform two steps. The first step consists in determining shared people flows from the points recorded during the previous movements of each user. The second step consists in obtaining statistics on the number of people who, being in point 𝐴, subsequently go to point 𝐵.
   By analysing every GPS trajectory, i.e. the temporally ordered set of recorded GPS points, we extracted its StayPoints (SPs). They are the centres of the areas within which a user stays for more than a certain time, i.e. areas that are of interest to the user for some reason. Then, the geographical area where the SPs of the whole dataset are located has been discretised by means of a grid made up of equal square cells. Each SP has been associated with the single square cell containing it. A cell may contain multiple SPs if these are close enough, depending on the width of the cell. Cells that did not contain any SP have not been considered.
   We then determined the subset of frequently visited cells, consisting of all the cells that contain at least one SP and have been visited by at least 10% of the people. For the sake of reliability, we compute statistics only between the frequently visited cells, and we consider the Confidence as used by Market Basket Analysis. Confidence(𝐴 ⇒ 𝐵) denotes the percentage of trajectories visiting cell 𝐴 that also visit cell 𝐵. I.e. for a value of Confidence higher than a threshold (set to 60% in our experiments), we can assert that a large group of people having visited cell 𝐴 moves together to cell 𝐵. Confidence is an estimation of conditional probability. Two or more cells for which the Confidence is higher than 60% and that have been visited by a large group of people are dubbed co-visited cells.
   Then, we check the reliability of the obtained association rules (𝐴 ⇒ 𝐵) through Lift, which confirms whether the transition of a user from the SP in 𝐴 to the SP in 𝐵 is positively correlated.
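The selection of frequently visited cells and of co-visited cells described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each trajectory is assumed to be already reduced to the grid cells in which it has at least one SP, cell ids are hypothetical integers, and the helper names (`cell_support`, `co_visited_rules`) are ours. The thresholds default to the 10% Minimum Support and 60% Minimum Confidence used in the experiments.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Cell = int  # hypothetical cell id
Rule = Tuple[Cell, Cell, float, float, float]  # (A, B, Support(A=>B), Confidence, Lift)


def cell_support(trajectories: Sequence[Sequence[Cell]]) -> Dict[Cell, float]:
    """Fraction of trajectories having at least one SP in each cell."""
    n = len(trajectories)
    counts: Dict[Cell, int] = {}
    for cells in trajectories:
        for c in set(cells):  # count each cell once per trajectory
            counts[c] = counts.get(c, 0) + 1
    return {c: k / n for c, k in counts.items()}


def co_visited_rules(trajectories: Sequence[Sequence[Cell]],
                     min_support: float = 0.1,
                     min_confidence: float = 0.6) -> List[Rule]:
    """Rules A => B between frequently visited cells whose Confidence passes the threshold."""
    n = len(trajectories)
    support = cell_support(trajectories)
    frequent = {c for c, s in support.items() if s >= min_support}
    visited = [set(cells) & frequent for cells in trajectories]
    rules: List[Rule] = []
    for a, b in permutations(sorted(frequent), 2):
        sup_ab = sum(1 for v in visited if a in v and b in v) / n
        conf = sup_ab / support[a]  # estimate of p(B | A)
        if conf >= min_confidence:
            lift = conf / support[b]  # > 1 means positive correlation
            rules.append((a, b, sup_ab, conf, lift))
    return rules
```

For instance, with the toy input `[[1, 2], [1, 2], [1, 2], [3], [3], [2, 3]]` both 1 ⇒ 2 and 2 ⇒ 1 pass the 60% threshold, each with a Lift of 1.5, while no rule involving cell 3 does.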

3.2. A Multi-Agent Recommendation System
In general, an agent, according to Wooldridge [22], is merely "a software (or hardware) entity that is situated in an environment and is able to autonomously react to changes in that environment". Each agent has the basis to learn and communicate. In our case, learning takes place by capturing the user GPS positions, and communication is realised by connecting to a centralised server, which alerts all agents when needed and stores the geographical coordinates of the points visited by users. Figure 1 shows the main components of the proposed multi-agent system.
   An agent runs on a smartphone as an app in order to receive suggestions on possible destinations. The agent offers recommendations highlighting any ‘warm’, i.e. very crowded, or ‘cold’, i.e. uncrowded, place, using the statistics gathered as described in the previous section. For this, the agent periodically reads the user position and checks whether a known StayPoint (SP) is nearby. Then, the agent communicates to the server whether it is close to an SP. This lets the agents contribute to determining the number of people close to an SP without giving their actual position, hence preserving the user’s privacy. In this context, privacy protection is intended to prevent the disclosure of information relating to the exact location of the user. Figure 2 shows the app providing information to the user.
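The agent-side privacy mechanism described above, where only the identity of a nearby SP is reported and never the raw coordinates, could look like the following sketch. The 100 m radius (borrowed from the DistThr used later for StayPoint detection), the message fields `stay_point` and `prefers_crowded`, and the function names are assumptions for illustration; the paper does not specify the agent's protocol.

```python
import math
from typing import Dict, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed value)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def nearest_stay_point(lat: float, lon: float,
                       stay_points: Dict[str, Tuple[float, float]],
                       radius_m: float = 100.0) -> Optional[str]:
    """Id of the closest known SP within radius_m of the user, or None."""
    best_id: Optional[str] = None
    best_d = radius_m
    for sp_id, (sp_lat, sp_lon) in stay_points.items():
        d = haversine_m(lat, lon, sp_lat, sp_lon)
        if d <= best_d:
            best_id, best_d = sp_id, d
    return best_id


def build_report(lat: float, lon: float,
                 stay_points: Dict[str, Tuple[float, float]],
                 prefers_crowded: bool) -> dict:
    """Message for the server: the nearby SP id (or None) and the preference, no coordinates."""
    return {"stay_point": nearest_stay_point(lat, lon, stay_points),
            "prefers_crowded": prefers_crowded}
```

The exact position stays on the device: the server only learns which known SP, if any, the user is close to.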


Figure 1: System architecture showing the interaction between agents and server: each agent sends his preference for crowded places and where he is; the server gathers data and creates recommendations.


Figure 2: The user communicates with agents via the application GUI. The left panel shows the list of destinations suggested by the multi-agent system, and the right panel is the administration view where the user gives his preferences on (un)crowded places. The colour of the icons represents the intensity of crowding, i.e. more (less) red means more (less) crowded.


   The server, having received from the agent the nearest SP, returns the list of other SPs that could be visited, according to the estimated probability of passing through each of them (0 meaning low probability, 1 high probability). In this way, we create a recommendation system based on Collaborative Filtering [23], as it relies on the choices of other users.
   Finally, through an administration panel, the user can set a flag stating whether he prefers ‘warm’ or ‘cold’ places. Thus, based on the user’s choice and on the list received from the server, the agent determines which information to show and which destinations to suggest. Destinations are displayed as a map or a list. All agents are independent of each other and, since they obtain data directly from the device, they are reliable, making the architecture stable and trustworthy.
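A minimal sketch of how an agent might combine the server's list with the user's ‘warm’/‘cold’ flag follows. It assumes, hypothetically, that each suggested SP carries the estimated probability in [0, 1] returned by the server, and that a probability at or above a threshold (0.5 here, an arbitrary choice) marks a ‘warm’ place; the class and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Suggestion:
    sp_id: str
    probability: float  # estimated probability of passing through the SP, in [0, 1]


def filter_suggestions(suggestions: List[Suggestion], prefers_warm: bool,
                       warm_threshold: float = 0.5) -> List[Suggestion]:
    """Keep 'warm' (likely crowded) or 'cold' (likely uncrowded) SPs per the user's flag."""
    if prefers_warm:
        warm = [s for s in suggestions if s.probability >= warm_threshold]
        return sorted(warm, key=lambda s: s.probability, reverse=True)  # most crowded first
    cold = [s for s in suggestions if s.probability < warm_threshold]
    return sorted(cold, key=lambda s: s.probability)  # least crowded first
```

Keeping the filtering on the agent means the server can send the same list to every user, while each agent applies its own user's preference locally.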


4. Experiments
This section describes the experiments carried out using a dataset that collects real movements of taxis and/or people from one part of the city to another. Positions have been gathered by periodically reading geo-coordinates from tablets or smartphones. The experiments focus on data analysis for determining the probabilities of moving from one StayPoint to another, as described in Section 3. This approach is used in our centralised server to select the list of suggestions to send to the agents. The dataset allows us to simulate the behaviour of a reasonable number of users, showing the usefulness of the app. Over time, data are updated as provided by agents. Below we describe the dataset used in our experiments, the tests carried out, and the results that have been found.

4.1. Dataset
The dataset used to perform our tests is Cabspottingdata [24]; it includes the trajectories collected in May 2008 by 536 taxis, for a total of 11,219,424 GPS points. Cab mobility traces are provided by the Exploratorium, the museum of science, art and human perception, through the cabspotting project¹. To gather data, each vehicle was outfitted with a GPS tracking device that was used by dispatchers to efficiently reach customers. Data were sent from each cab to a central receiving station, and then delivered in real time to dispatch computers via a central server. Each mobility trace file, associated with a taxi ID, contains in each line: latitude, longitude, occupancy, timestamp. Latitude and longitude are in decimal degrees, the occupancy indicates whether a taxi has a fare (1 = busy, 0 = free), and the timestamp is in UNIX epoch format. The area covered by these routes corresponds to the county of San Francisco, USA, and its surroundings in California, with longitude ranging from −127.08143 to −115.56218 and latitude from 32.8697 to 50.30546. The trajectories recorded while a customer was on board the taxi consist of 5,017,659 points in total.
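Reading a Cabspotting trace file can be sketched as below. The sketch assumes whitespace-separated fields in the order listed above (latitude, longitude, occupancy, timestamp); the sample lines in the test are illustrative values in that format, not records taken from the dataset. Points are sorted by timestamp so that the result is chronological, and the hypothetical helper `occupied_points` keeps only the points recorded with a customer on board.

```python
from typing import Iterable, List, NamedTuple


class GpsPoint(NamedTuple):
    lat: float      # decimal degrees
    lon: float      # decimal degrees
    occupied: bool  # True when the taxi has a fare (occupancy flag 1)
    timestamp: int  # UNIX epoch seconds


def parse_trace(lines: Iterable[str]) -> List[GpsPoint]:
    """Parse one taxi's trace file into chronologically ordered points."""
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        lat, lon, occ, ts = fields
        points.append(GpsPoint(float(lat), float(lon), occ == "1", int(ts)))
    points.sort(key=lambda p: p.timestamp)  # chronological order
    return points


def occupied_points(points: Iterable[GpsPoint]) -> List[GpsPoint]:
    """Keep only the points recorded with a customer on board."""
    return [p for p in points if p.occupied]
```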

4.2. Tests carried out to find flows and StayPoints
A trajectory 𝑇 is an ordered sequence of GPS points, in which the positions occupied (for example by a vehicle) and the associated timestamps are recorded chronologically. Each position is represented by the latitude and longitude of the geographical point:
$$T = \{p_0 = (x_0, y_0, t_0),\ p_1 = (x_1, y_1, t_1),\ \ldots,\ p_n = (x_n, y_n, t_n)\},$$
where $p_i = (x_i, y_i, t_i)$ for all $i \in [0, n]$, with $t_i < t_{i+1}$ for all $i \in [0, n-1]$, and $x_i$, $y_i$ and $t_i$ represent longitude, latitude, and timestamp, respectively.
   The first step was data cleaning, in order to eliminate noise due, for example, to GPS errors. It was performed by computing the instantaneous speed at each point of the rides recorded on the taxi; the maximum acceptable speed threshold has been set to 150 km/h. Then, the trajectories have been summarised for the distance comparison by considering, for each path, two successive points in temporal order only if they were at least 140 m apart. This was done to decrease the size of the dataset and therefore reduce the execution time of the algorithm.

¹ http://cabspotting.org
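The two cleaning steps, dropping points that imply a speed above 150 km/h and keeping only points at least 140 m apart, can be sketched as follows. This is one plausible reading of the procedure, not the authors' code: speeds and distances are measured from the last point kept, points are `(lat, lon, t)` tuples with `t` in seconds, and the helper names are hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed value)
Point = Tuple[float, float, float]  # (lat, lon, t in seconds), chronological


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def remove_speed_outliers(points: List[Point], max_kmh: float = 150.0) -> List[Point]:
    """Drop points whose speed from the last kept point exceeds max_kmh (likely GPS errors)."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:]:
        prev = kept[-1]
        dt = p[2] - prev[2]
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        speed_kmh = haversine_m(prev[0], prev[1], p[0], p[1]) / dt * 3.6
        if speed_kmh <= max_kmh:
            kept.append(p)
    return kept


def downsample(points: List[Point], min_dist_m: float = 140.0) -> List[Point]:
    """Keep a point only if it is at least min_dist_m from the last kept point."""
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:]:
        if haversine_m(kept[-1][0], kept[-1][1], p[0], p[1]) >= min_dist_m:
            kept.append(p)
    return kept
```

Measuring from the last kept point, rather than from the immediate predecessor, prevents a single spurious jump from also discarding the valid point that follows it.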
   To carry out a statistical analysis, 90% of the trajectories were randomly selected, and this set was the Train set for the flow detection algorithm. The complementary set, i.e. the remaining 10% of the trajectories, forms the Test set, i.e. the verification set. We considered 6 time slots of 4 hours each, to visualise the movement of the vehicles at different times, and the trajectories were therefore split according to the 6 time slots. To identify the sub-trajectories common to different users in the same time slot, a maximum tolerance distance of 280 m was set between two points of different users. The distance between two points was computed by using the Haversine distance which, given two points $P_i(lt_i, lg_i)$ and $P_j(lt_j, lg_j)$ characterised by latitude and longitude in decimal degrees, returns their distance in meters considering the curvature of the Earth:
$$d(P_i, P_j) = 2R \arcsin\left(\sqrt{\sin^2\left(\frac{lt_i - lt_j}{2}\right) + \cos lt_i \cos lt_j \sin^2\left(\frac{lg_i - lg_j}{2}\right)}\right)$$
where $R$ is the mean radius of the Earth, and latitudes and longitudes are converted from degrees to radians before the trigonometric functions are applied.
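The Haversine distance above translates directly into Python. The sketch assumes R = 6371 km (the commonly used mean Earth radius; the paper does not state the value) and also shows the 280 m tolerance check used to match points of different users; `within_tolerance` is an illustrative helper name.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed value)


def haversine_m(lt_i: float, lg_i: float, lt_j: float, lg_j: float) -> float:
    """Distance in metres between P_i(lt_i, lg_i) and P_j(lt_j, lg_j), in decimal degrees."""
    phi_i, phi_j = math.radians(lt_i), math.radians(lt_j)
    d_phi = phi_i - phi_j
    d_lambda = math.radians(lg_i - lg_j)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi_i) * math.cos(phi_j) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))  # clamp rounding error


def within_tolerance(p, q, tol_m: float = 280.0) -> bool:
    """True if two (lat, lon) points of different users are at most tol_m apart."""
    return haversine_m(p[0], p[1], q[0], q[1]) <= tol_m


print(round(haversine_m(37.0, -122.0, 37.001, -122.0), 2))  # 111.19
```

Two points 0.001° of latitude apart are therefore about 111 m from each other and fall within the 280 m tolerance, whereas points 0.003° apart (about 334 m) do not.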
   We define flows as sub-trajectories belonging to different users that are spatially close and similar, and recorded in the same time slot. The density of a flow is the number of users that pass through it. For detecting flows in this dataset, the minimum density threshold was set to 25. With these parameters, 12 flows were identified, ranging from 1 to 2 km in length. The minimum density of the flows found is 26 taxis, while the maximum is 192 taxis. Then, using the complement of the trajectory sample (10% of the taxis, as the test set), we checked where their GPS points lay with respect to the flows found on the train set. We found that the points of the test set intersect the 12 paths identified on the train set. Another check was carried out by confirming the correspondence of the points of the flows on a map. It consists of matching the coordinates of the obtained flows with the road segments, and assessing that there are no points outside the road segments (see Figure 3).
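The flow definition above leaves several details open, so the following is only a simplified sketch of the density check, not the flow detection algorithm actually used. A candidate sub-trajectory is given; a user is counted when each candidate point has one of the user's points within the 280 m tolerance in the same 4-hour time slot; the candidate is a flow when this count reaches the density threshold (25 by default). Time slots are computed in UTC here, which is an assumption, and the function names are hypothetical.

```python
import math
from datetime import datetime, timezone
from typing import Dict, List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed value)
TOLERANCE_M = 280.0           # maximum distance between points of different users
MIN_DENSITY = 25              # minimum number of users for a flow

LatLon = Tuple[float, float]
Point = Tuple[float, float, int]  # (lat, lon, UNIX timestamp)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def time_slot(ts: int) -> int:
    """Index (0-5) of the 4-hour slot containing the timestamp (UTC assumed)."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).hour // 4


def follows(candidate: Sequence[LatLon], points: Sequence[LatLon],
            tol_m: float = TOLERANCE_M) -> bool:
    """True if every candidate point has some user point within tol_m."""
    return all(
        any(haversine_m(c_lat, c_lon, p_lat, p_lon) <= tol_m for p_lat, p_lon in points)
        for c_lat, c_lon in candidate
    )


def flow_density(candidate: Sequence[LatLon],
                 trajectories_by_user: Dict[str, List[Point]], slot: int) -> int:
    """Number of distinct users whose points in the given slot follow the candidate."""
    density = 0
    for points in trajectories_by_user.values():
        in_slot = [(lat, lon) for lat, lon, ts in points if time_slot(ts) == slot]
        if in_slot and follows(candidate, in_slot):
            density += 1
    return density


def is_flow(candidate, trajectories_by_user, slot, min_density: int = MIN_DENSITY) -> bool:
    return flow_density(candidate, trajectories_by_user, slot) >= min_density
```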
   We apply the StayPoint detection algorithm to each trajectory (more details can be found in [25]) with a time threshold, TimeThr, of 10 minutes and a distance threshold, DistThr, of 100 meters. Such thresholds should suffice to select the positions in which a user dwells (several SPs) because he finds the place interesting, and to discard the locations where a user stops only briefly, e.g. when he is held up at a traffic light.
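For reference, a common formulation of StayPoint detection is sketched below with the thresholds used here (DistThr = 100 m, TimeThr = 10 min). It groups consecutive points that stay within DistThr of an anchor point and emits their mean position when the group spans at least TimeThr; the exact variant described in [25] may differ in its details, and the function name is ours.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed value)
Point = Tuple[float, float, float]               # (lat, lon, t in seconds), chronological
StayPoint = Tuple[float, float, float, float]    # (mean lat, mean lon, arrival, leaving)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def detect_stay_points(points: List[Point], dist_thr_m: float = 100.0,
                       time_thr_s: float = 600.0) -> List[StayPoint]:
    stay_points: List[StayPoint] = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Grow the window while points remain within dist_thr_m of the anchor points[i].
        while j < n and haversine_m(points[i][0], points[i][1],
                                    points[j][0], points[j][1]) <= dist_thr_m:
            j += 1
        arrival, leaving = points[i][2], points[j - 1][2]
        if leaving - arrival >= time_thr_s:
            window = points[i:j]
            lat = sum(p[0] for p in window) / len(window)
            lon = sum(p[1] for p in window) / len(window)
            stay_points.append((lat, lon, arrival, leaving))
            i = j  # resume after the stay
        else:
            i += 1  # too short (e.g. a traffic light): try the next anchor
    return stay_points
```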
   The execution time of the StayPoint detection algorithm on the whole Cabspotting dataset (536 taxis and more than 11M points) was 36 min 54 s. We obtained a total of 4261 SPs, i.e. an average of about 8 SPs per vehicle. The results show that 98% of users have at least one StayPoint associated with their trip (523 users out of 536). The StayPoint detection algorithm was implemented in Python, and the experiments were executed on a host with an Intel Xeon E5-2620 v3 CPU at 2.40 GHz and 32 GB of RAM.
   Figure 4 shows the recorded trace for each trip in blue, and the detected SPs in yellow. Figure 5
shows the detected flows in magenta and the SPs in that area in green.


Figure 3: Flows detected for the Cabspotting dataset.


4.3. Movement prediction
For predicting the movements of people, a grid covering the map was first built, made up of square cells with a side of 1 km. Such a grid lets us discretise the data and estimate the probability of movement from a cell having some SPs inside it to another cell also having at least one SP. Two distinct geographical areas comprising some SPs are represented as two square cells without intersection; therefore a space partition is formed. Figure 6 shows such a grid, having size 80 × 46 cells (latitude by longitude); the obtained SPs are mapped onto the grid and shown as red dots. Some areas consisting of nearby cells have many more SPs than others, hence red dots are denser in some areas than in others, as shown in the said figure.

Figure 4: Trajectory points in blue and the obtained StayPoints in yellow.
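Mapping SPs onto the 1 km grid can be sketched as follows. The sketch assumes an equirectangular approximation (a cell is 1 km divided by about 111.19 km per degree of latitude tall, and its width in degrees of longitude is scaled by the cosine of the grid's central latitude) and a row-major numbering of cells. The paper does not state how cell ids such as 2588 in Table 1 are assigned; the bounding box in the test and the class and function names are hypothetical.

```python
import math
from typing import Iterable, Optional, Set, Tuple

KM_PER_DEG_LAT = 6371.0 * math.pi / 180.0  # about 111.19 km per degree of latitude


class SquareGrid:
    """Square cells of side cell_km over a bounding box, numbered row by row."""

    def __init__(self, min_lat: float, min_lon: float, n_rows: int, n_cols: int,
                 cell_km: float = 1.0):
        self.min_lat, self.min_lon = min_lat, min_lon
        self.n_rows, self.n_cols = n_rows, n_cols
        self.dlat = cell_km / KM_PER_DEG_LAT
        # Degrees of longitude shrink with latitude: scale at the grid's central latitude.
        centre_lat = min_lat + n_rows * self.dlat / 2
        self.dlon = cell_km / (KM_PER_DEG_LAT * math.cos(math.radians(centre_lat)))

    def cell_of(self, lat: float, lon: float) -> Optional[int]:
        """Id of the cell containing (lat, lon), or None if outside the grid."""
        row = math.floor((lat - self.min_lat) / self.dlat)
        col = math.floor((lon - self.min_lon) / self.dlon)
        if not (0 <= row < self.n_rows and 0 <= col < self.n_cols):
            return None
        return row * self.n_cols + col


def cells_with_stay_points(grid: SquareGrid,
                           stay_points: Iterable[Tuple[float, float]]) -> Set[int]:
    """Cells containing at least one SP; empty cells are never produced."""
    ids = (grid.cell_of(lat, lon) for lat, lon in stay_points)
    return {c for c in ids if c is not None}
```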
   In order to determine whether a cell 𝐴 is a frequent destination, the Support of each cell was calculated. The Support is the ratio between the number of trajectories that contain the cell and the total number of trajectories. If this ratio exceeds a certain threshold, i.e. if cell 𝐴 is crossed by a sufficient number of different trajectories (a Minimum Support of 10%, i.e. 0.1, was chosen), then the cell (containing one or more SPs) is a frequently visited cell.

Figure 5: A zoomed-in map of an area in Figure 4, showing flows in magenta and the nearby StayPoints in green.
   Our experiments on the above-mentioned taxi dataset have shown that there are 43 cells visited by 52 or more users, i.e. the dataset contains 43 frequently visited cells with SPs. This means that a meeting probably took place in each of those cells, as users remained stationary in the same cell in the same time slot. Since data are updated in real time through the agents running on smartphones as an app, the Minimum Support stays fixed, whereas the number and position of the frequently visited cells will vary over time.
   By lowering the Minimum Support, i.e. the threshold on the minimum number of people sharing the same cell, the number of cells considered as having a sufficient number of people will increase, and so will the number of cells considered overcrowded. In order to compute association rules only between the frequently visited cells in the dataset, we used the Confidence of Market Basket Analysis for our approach.

Figure 6: Grid formed by square cells of 1 km per side; each red dot represents a cell having at least one SP.
   Given two cells 𝐴 and 𝐵, the Support of the association rule (𝐴 ⇒ 𝐵),
$$\mathrm{Support}(A \Rightarrow B) = \frac{\mathrm{Frequency}(A, B)}{N},$$
denotes the fraction of all trajectories that contain both 𝐴 and 𝐵, where Frequency(𝐴, 𝐵) is the number of trajectories containing both cells and 𝑁 is the total number of trajectories. The Confidence of the rule is
$$\mathrm{Confidence}(A \Rightarrow B) = \frac{\mathrm{Support}(A \Rightarrow B)}{\mathrm{Support}(A)}.$$
Hence, Confidence is an estimation of conditional probability, which can be expressed as follows:
$$\mathrm{Confidence}(A \Rightarrow B) = \frac{p(A \cap B)}{p(A)} = p(B \mid A).$$
   In Market Basket Analysis, Confidence is the probability of purchasing item 𝐵, called the consequent, given the purchase of item 𝐴, called the antecedent, within the same transaction. The higher the Confidence, the greater the reliability of the (𝐴 ⇒ 𝐵) rule (more details can be found in [26]). In our context, Confidence(𝐴 ⇒ 𝐵) gives the probability that a user who has already been in cell 𝐴, dwelling in one of its SPs, will also be in an SP in cell 𝐵, moving there together with at least 10% of the total number of users.
   Proceeding in the same way, we compute Confidence(𝐴, 𝐵 ⇒ 𝐶) and then Confidence(𝐴, 𝐵, 𝐶 ⇒ 𝐷), in order to determine a common path that crosses several cells having highly visited SPs. We compute
$$\mathrm{Confidence}(A, B \Rightarrow C) = \frac{\mathrm{Frequency}(A, B, C)}{\mathrm{Frequency}(A, B)}$$
and so on.
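The chained Confidence can be computed by counting trajectories, as in the formula above. The sketch below also includes a hypothetical greedy helper, `best_next_cell`, that extends a path with the cell of highest Confidence above the 60% threshold; it illustrates how Confidence(𝐴, 𝐵 ⇒ 𝐶), Confidence(𝐴, 𝐵, 𝐶 ⇒ 𝐷), ... can be chained, and is not necessarily the procedure used by the authors. Trajectories are assumed to be lists of visited cell ids.

```python
from typing import Iterable, List, Optional, Sequence

Trajectory = Sequence[int]  # ids of the cells visited by one trajectory


def frequency(trajectories: Iterable[Trajectory], cells: Iterable[int]) -> int:
    """Number of trajectories that visit all the given cells."""
    target = set(cells)
    return sum(1 for t in trajectories if target <= set(t))


def confidence(trajectories: List[Trajectory], antecedent: Sequence[int],
               consequent: int) -> float:
    """Frequency(antecedent + consequent) / Frequency(antecedent)."""
    denominator = frequency(trajectories, antecedent)
    if denominator == 0:
        return 0.0
    return frequency(trajectories, [*antecedent, consequent]) / denominator


def best_next_cell(trajectories: List[Trajectory], path: Sequence[int],
                   candidates: Iterable[int], min_confidence: float = 0.6) -> Optional[int]:
    """Hypothetical greedy step: candidate with the highest Confidence(path => c), if any."""
    best, best_conf = None, min_confidence
    for c in sorted(candidates):
        if c in path:
            continue
        conf = confidence(trajectories, path, c)
        if conf >= best_conf:
            best, best_conf = c, conf
    return best
```

With the toy trajectories `[[1, 2, 3], [1, 2, 3], [1, 2], [1, 4], [2, 3]]`, Confidence(1, 2 ⇒ 3) = 2/3, so the path 1 → 2 is extended with cell 3, after which no candidate passes the threshold.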
   Therefore, the results obtained are useful to predict the number of people gathering in some place. Moreover, given knowledge about an infected person in some area, our results can be used to predict whether a user could potentially be infected (as his trajectory is estimated), and who else he could infect (i.e. people whose trajectories are expected to pass through the same areas).
   The limitation of Confidence is that it does not consider the Support of the item on the right side of the rule: if 𝐵 is very frequent, Confidence(𝐴 ⇒ 𝐵) can be high even when 𝐴 and 𝐵 are stochastically independent, so Confidence alone does not provide a correct evaluation of the association.
   A measure that takes this eventuality into account is Lift(𝐴 ⇒ 𝐵), defined as:
$$\mathrm{Lift}(A \Rightarrow B) = \frac{\mathrm{Confidence}(A \Rightarrow B)}{\mathrm{Support}(B)} = \frac{p(A \cap B)}{p(A)\, p(B)}$$
Lift(𝐴 ⇒ 𝐵) takes into account the importance (the Frequency) of 𝐵. Using this quantity, we can say that:

    • if Lift > 1, the events are positively correlated;
    • if Lift = 1, the events are independent;
    • if Lift < 1, the events are negatively correlated.

   Therefore, Lift indicates how much the occurrence of one event raises the probability of the other. At this point, we set a Minimum Confidence of 0.6 (i.e. 60%) to filter the results and keep only the association rules having a higher Confidence and also a Support higher than the Minimum Support (0.1); such rules are named Strong rules. Finally, these rules were checked with the Lift, shown in the last column of Table 1. For the association rule ([2587] ⇒ [2588]) in row 6, Lift = 0.999888 < 1, hence the events of movement from cell 𝐴 to cell 𝐵 are negatively correlated, although, with a Lift so close to 1, they are practically independent.
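As a worked check, Confidence and Lift can be recomputed from the Support values reported in Table 1 for a few of its rows (cell B = 2588, Support(B) = 0.671128). Because the Support values are rounded to six decimals, the recomputed figures match the table only to about 1e-5. The names below are illustrative.

```python
from typing import Dict, NamedTuple

SUPPORT_B = 0.671128  # Support of cell 2588 (the consequent B), from Table 1
MIN_CONFIDENCE = 0.6


class RuleRow(NamedTuple):
    antecedent: str
    sup_a: float   # Support(A)
    sup_ab: float  # Support(A => B)


# A subset of the rows of Table 1, keyed by row number.
TABLE_1: Dict[int, RuleRow] = {
    1: RuleRow("[1715]", 0.395793, 0.284895),
    6: RuleRow("[2587]", 0.435946, 0.292543),
    9: RuleRow("[2635]", 0.235182, 0.177820),
    12: RuleRow("[1715, 1716]", 0.149140, 0.112811),
}


def confidence(row: RuleRow) -> float:
    """Confidence(A => B) = Support(A => B) / Support(A)."""
    return row.sup_ab / row.sup_a


def lift(row: RuleRow, sup_b: float = SUPPORT_B) -> float:
    """Lift(A => B) = Confidence(A => B) / Support(B)."""
    return confidence(row) / sup_b


def correlation(value: float) -> str:
    if value > 1:
        return "positive"
    if value < 1:
        return "negative"
    return "independent"


for n, row in TABLE_1.items():
    print(f"row {n:2d} {row.antecedent:>13}  conf={confidence(row):.6f}  "
          f"lift={lift(row):.6f}  {correlation(lift(row))}")
```

Row 6 ([2587] ⇒ [2588]) yields a Lift of about 0.99989, just below 1, while the other rows shown exceed 1; all of them pass the 60% Minimum Confidence.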
   The left panel in Figure 7 shows every Strong Rule obtained as a point, plotting its Confidence against its Support. For association rules with higher Support, the Confidence, i.e. the probability of moving to the frequently visited cell 𝐵, decreases. The Lift and the Confidence of the Strong Rules obtained are directly proportional, as we can see in the right panel of Figure 7. The Pearson correlation coefficient between them is 0.9999999999999999, i.e. 1 up to floating-point rounding, which indicates an exact linear relationship. This is expected for rules sharing the same consequent 𝐵, since Lift(𝐴 ⇒ 𝐵) = Confidence(𝐴 ⇒ 𝐵)/Support(𝐵) and Support(𝐵) is constant.
   Moreover, the results tell us that the probability of transitioning from one cell with SPs to another is high also along the detected flows, and that different highly visited cells having SPs belong to different flows. For each cell, we checked which taxis passed there and which of them later passed along other flows crossing other frequently visited cells.


5. Conclusions
Having an educated guess on the number of people who will gather in some place before
planning a trip can be very useful to avoid overcrowded places and to comply with current
regulations. We have proposed an approach for predicting the probability of people moving
to some destinations when it is known that a certain number of people are in some other place.
We use an app that senses the position of people and sends such data to a server. Then, this
number, together with previous statistics, is used to estimate the number of people in another
place at a later time. The experiments that we have performed on previously gathered
geographical locations have shown the viability and reliability of our approach.


Claudia Cavallaro et al.                                                                237–251


Table 1
A set of cells 𝐴 and the related Support (Sup), Confidence (Conf), and Lift of the Strong
Association Rules 𝐴 ⇒ 𝐵, where cell 𝐵 is [2588] and 𝑆𝑢𝑝𝑝𝑜𝑟𝑡(𝐵) = 0.671128 (Minimum
Confidence 60%, Minimum Support 10%)


       Rule              𝐴     𝑆𝑢𝑝(𝐴)     𝑆𝑢𝑝(𝐴 ⇒ 𝐵)   𝐶𝑜𝑛𝑓(𝐴 ⇒ 𝐵)      𝐿𝑖𝑓𝑡(𝐴 ⇒ 𝐵)

          1          [1715]      0.395793    0.284895       0.719807         1.072533
          2          [1716]      0.313576    0.223709       0.713415         1.063008
          3          [2405]      0.151052    0.108987       0.721519         1.075084
          4          [2541]      0.137667    0.099426       0.722222         1.076132
          5          [2543]      0.242830    0.175908       0.724409         1.079391
          6          [2587]      0.435946    0.292543       0.671053         0.999888
          7          [2633]      0.145315    0.103250       0.710526         1.058704
          8          [2634]      0.281071    0.206501       0.734694         1.094715
          9          [2635]      0.235182    0.177820       0.756098         1.126607
         10          [2679]      0.202677    0.147228       0.726415         1.082379
         11          [2680]      0.242830    0.175908       0.724409         1.079391
         12       [1715, 1716]   0.149140    0.112811       0.756410         1.127073
         13       [2587, 1715]   0.156788    0.112811       0.719512         1.072094
         14       [2634, 2587]   0.141491    0.101338       0.716216         1.067183

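One simple way to turn a sensed count into an estimate for another place, in the spirit of the approach summarised in these conclusions, is to treat Conf(𝐴 ⇒ 𝐵) as the probability that each person currently in cell 𝐴 later moves to cell 𝐵. The sketch below assumes independent movements (a binomial model), which is our simplification and not necessarily the authors' estimator; the names estimate_arrivals and is_overcrowded, the capacity threshold, the factor k and the count of 200 people are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    expected: float  # expected number of people reaching the destination
    std: float       # binomial standard deviation of that number


def estimate_arrivals(n_in_a: int, conf_ab: float) -> Estimate:
    """Binomial estimate of how many of n_in_a people later move from cell A to cell B.

    Assumes each person moves independently with probability Conf(A => B).
    """
    if not 0.0 <= conf_ab <= 1.0:
        raise ValueError("confidence must be a probability in [0, 1]")
    expected = n_in_a * conf_ab
    std = math.sqrt(n_in_a * conf_ab * (1.0 - conf_ab))
    return Estimate(expected, std)


def is_overcrowded(est: Estimate, capacity: float, k: float = 1.0) -> bool:
    """Hypothetical alert rule: flag B when expected + k * std exceeds its capacity."""
    return est.expected + k * est.std > capacity


if __name__ == "__main__":
    # Hypothetical: 200 people sensed in [2635]; Conf([2635] => [2588]) from Table 1.
    est = estimate_arrivals(200, 0.756098)
    print(f"expected={est.expected:.1f}, std={est.std:.1f}")
    print("alert" if is_overcrowded(est, capacity=150) else "ok")
```

With these numbers about 151 people are expected to reach [2588] (standard deviation ≈ 6.1), so the hypothetical capacity of 150 triggers an alert; raising k makes the alert more cautious.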
   The more people use the app, the more accurate the estimates given by the approach would be.
To make the approach more robust, it could be extended to include data available online from
other services that provide indications on queues, road traffic, and gatherings.
   Future work will consider the geometry of some popular destinations, such as stations and
museums, to compute the average distance between people given the estimated size of the crowds.
Moreover, the way alerts are spread will take into account both the people already in a place
and the people moving towards it.


Acknowledgments
The authors acknowledge the support provided by means of a PO FSE 2014-2020 grant funded
by Regione Siciliana, and by the project TEAMS (TEchniques to support the Analysis of big data in
Medicine, energy and Structures), Piano di incentivi per la ricerca di Ateneo 2020/2022 (University
research incentive plan).




Figure 7: The left panel shows Support vs Confidence for the Strong Rules obtained by our analysis
and listed as rows in Table 1; each point corresponds to a pair of cells (𝐴 and 𝐵) or, for the last
three rows of Table 1, a triple of cells. The right panel shows Lift vs Confidence for the same rules
shown in the left panel.


References
 [1] P. Castro, D. Zhang, S. Li, Urban traffic modelling and prediction using large scale taxi gps
     traces, in: Proceedings of of International Conference on Pervasive Computing, 2012, pp.
     57–72. doi:10.1007/978-3-642-31205-2_4.
 [2] J. Lee, J. Han, X. Li, A unifying framework of mining trajectory patterns of various
     temporal tightness, IEEE Transactions on Knowledge and Data Engineering 27 (2015)
     1478–1490.
 [3] Z. Wang, M. Lu, X. Yuan, J. Zhang, H. van de Wetering, Visual traffic jam analysis based
     on trajectory data, IEEE Transactions on Visualization and Computer Graphics 19 (2013)
     2159–2168.
 [4] N. Bicocchi, M. Mamei, Investigating ride sharing opportunities through mobility data
     analysis, Pervasive Mobile Computing 14 (2014) 83–94.
 [5] R. Trasarti, et al., Exploring real mobility data with m-atlas, in: Proceedings of Machine
     Learning and Knowledge Discovery in Databases, Springer, 2010, pp. 624–627.
 [6] J. Cranshaw, R. Schwartz, J. Hong, N. Sadeh, The livehoods project: Utilizing social media
     to understand the dynamics of a city, in: Proceedings of AAAI Conference on Weblogs
     and Social Media, 2012.
 [7] C. Berzi, A. Gorrini, G. Vizzari, Mining the social media data for a bottom-up evaluation
     of walkability, in: Proceedings of International Conference on Traffic and Granular Flow,
     Springer, 2017, pp. 167–175.
 [8] B. P. L. Lau, M. S. Hasala, V. S. Kadaba, B. Thirunavukarasu, C. Yuen, B. Yuen, R. Nayak,
     Extracting point of interest and classifying environment for low sampling crowd sensing
     smartphone sensor data, in: Proceedings of IEEE Pervasive Computing and Communica-
     tions, 2017.
 [9] C. Cavallaro, G. Verga, E. Tramontana, O. Muscato, Eliciting cities points of interest from
     people movements and suggesting effective itineraries, Intelligenza Artificiale (2020).
     doi:10.3233/IA-190040.


                                                   250
Claudia Cavallaro et al.                                                                 237–251


[10] K. P. Sycara, Multiagent systems, AI Magazine 19 (1998) 79. URL: https://www.aaai.org/
     ojs/index.php/aimagazine/article/view/1370. doi:10.1609/aimag.v19i2.1370.
[11] W. Shen, D. Norrie, Facilitators, mediators or autonomous agents, in: Proceedings of
     International Workshop on CSCW in Design, 1997, pp. 119–124.
[12] Q. Liu, L. Gao, P. Lou, Resource management based on multi-agent technology for cloud
     manufacturing, in: Proceedings of Electronics, Communications and Control (ICECC),
     IEEE, 2011, pp. 2821–2824.
[13] J. Z. Hernández, S. Ossowski, A. Garcıa-Serrano, Multiagent architectures for intelligent
     traffic management systems, Transportation Research Part C: Emerging Tech. 10 (2002)
     473–506.
[14] E. Tramontana, Minimising changes when refactoring applications to run multiple threads,
     in: Proceedings of IEEE Asia-Pacific Software Engineering Conference (APSEC), Nara,
     Japan, 2018, pp. 713–714.
[15] G. Verga, A. Fornaia, S. Calcagno, E. Tramontana, Yet another way to unknowingly gather
     people coordinates and its countermeasures, in: Proceedings of Internet and Distributed
     Computing Systems (IDCS), Springer LNCS 11874, Naples, Italy, 2019, pp. 130–139.
[16] A. Noulas, S. Scellato, N. Lathia, C. Mascolo, Mining user mobility features for next place
     prediction in location-based services, in: Proceedings of IEEE International Conference
     on Data Mining, 2012, pp. 1038–1043.
[17] C. Cavallaro, J. Vitrià, Corridor detection from large gps trajectories datasets, Applied
     Sciences 10 (2020) 5003. doi:10.3390/app10145003.
[18] R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, in:
     Proceedings of International Conference on Very Large Data Bases, Morgan Kaufmann,
     Los Altos, CA, 1994, pp. 478–499.
[19] E. Tramontana, G. Verga, Demo: Get spatio-temporal flows from gps data, in: Proceedings
     of IEEE International Conference on Smart Computing (SMARTCOMP), Taormina, Italy,
     2018, pp. 282–284.
[20] L. Crociani, G. Vizzari, A. Gorrini, S. Bandini, Identification and Characterization of Lanes
     in Pedestrian Flows Through a Clustering Approach, volume 11298, Springer Verlag, 2018,
     pp. 71–82. doi:10.1007/978-3-030-03840-3_6.
[21] F. de Nijs, G. Theocharous, N. Vlassis, M. M. de Weerdt, M. T. J. Spaan, Capacity-aware
     sequential recommendations, in: Proceedings of International Conference on Autonomous
     Agents and MultiAgent Systems (AAMAS), 2018, p. 416–424.
[22] M. Wooldridge, N. R. Jennings, Intelligent agents and multi-agent systems I, Applied
     artificial intelligence 9 (1995).
[23] P. Melville, V. Sindhwani, Recommender systems., Encyclopedia of machine learning 1
     (2010) 829–838.
[24] M. Piorkowski, N. Sarafijanovic-Djukic, M. Grossglauser, CRAWDAD dataset epfl/mobility
     (v. 2009-02-24), 2009. doi:10.15783/C7J010.
[25] C. Cavallaro, G. Verga, E. Tramontana, O. Muscato, Multi-agent architecture for point of
     interest detection and recommendation., in: Proceedings of Workshop from Objects to
     Agents (WOA), Parma, Italy, 2019, pp. 98–104.
[26] J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques, 3rd ed., Morgan Kauf-
     mann Publishers Inc., San Francisco, CA, USA, 2011.


                                               251