<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">On Recommending Urban Hotspots to Find Our Next Passenger</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Luis</forename><surname>Moreira-Matias</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Instituto de Telecomunicac ¸ões</orgName>
								<orgName type="institution">University of Porto</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">LIAAD/INESC TEC</orgName>
								<address>
									<settlement>Porto</settlement>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="department" key="dep1">Department of Informatics Engineering</orgName>
								<orgName type="department" key="dep2">Faculty of Engineering</orgName>
								<orgName type="institution">University of Porto</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ricardo</forename><surname>Fernandes</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Instituto de Telecomunicac ¸ões</orgName>
								<orgName type="institution">University of Porto</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">João</forename><surname>Gama</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">LIAAD/INESC TEC</orgName>
								<address>
									<settlement>Porto</settlement>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
							<affiliation key="aff4">
								<orgName type="department">Faculty of Economics</orgName>
								<orgName type="institution">University of Porto</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Michel</forename><surname>Ferreira</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Instituto de Telecomunicac ¸ões</orgName>
								<orgName type="institution">University of Porto</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
							<affiliation key="aff3">
								<orgName type="department" key="dep1">Department of Computer Sciences</orgName>
								<orgName type="department" key="dep2">Faculty of Sciences</orgName>
								<orgName type="institution">University of Porto</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">João</forename><surname>Mendes-Moreira</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">LIAAD/INESC TEC</orgName>
								<address>
									<settlement>Porto</settlement>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="department" key="dep1">Department of Informatics Engineering</orgName>
								<orgName type="department" key="dep2">Faculty of Engineering</orgName>
								<orgName type="institution">University of Porto</orgName>
								<address>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Luis</forename><surname>Damas</surname></persName>
							<affiliation key="aff5">
								<orgName type="institution">GEOLINK</orgName>
								<address>
									<settlement>Porto</settlement>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">On Recommending Urban Hotspots to Find Our Next Passenger</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F5D153410EC0AB4D8EBDCB2DC1A536EB</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T07:17+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rising fuel costs is disallowing random cruising strategies for passenger finding. Hereby, a recommendation model to suggest the most passengerprofitable urban area/stand is presented. This framework is able to combine the 1) underlying historical patterns on passenger demand and the 2) current network status to decide which is the best zone to head to in each moment. The major contribution of this work is on how to combine well-known methods for learning from data streams (such as the historical GPS traces) as an approach to solve this particular problem. The results were promising: 395.361/506.873 of the services dispatched were correctly predicted. The experiments also highlighted that a fleet equipped with such framework surpassed a fleet that is not: they experienced an average waiting time to pick-up a passenger 5% lower than its competitor.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The taxis became crucial for human mobility in medium/large-sized urban areas. They provide a direct, comfortable and speedy way to move in and out of big town centers -as complement to other transportation means or as a main solution. In the past years, the city councils tried to guarantee that the running vacant taxis will always meet the demand in their urban areas by emitting more taxi licenses than the necessary. As result, the cities' cores are commonly crowded by a huge number of vacant taxis -which take desperate measures to find new passengers such as random cruise' strategies. These strategies have undesirable side effects like large wastes of fuel, an inefficient traffic handling, an increase of the air pollution.</p><p>The taxi driver mobility intelligence is one of the keys to mitigate this problems. The knowledge about where the services (i.e. the transport of a passenger from a pick-up to a drop-off location) will actually emerge can truly be useful to the driver -especially where there are more than one competitor operating. Recently, the major taxi fleets are equipped with GPS sensors and wireless communication devices. Typically, these vehicles will transmit information to a data center about their location and the events undergoing like the passenger pick-up and drop-off. These historical traces can reveal the underlying running mobility patterns. Multiple works in the literature have already explored this kind of data successfully with distinct applications like smart driving <ref type="bibr" target="#b5">[Yuan et al., 2010]</ref>, modeling the spatiotemporal structure of taxi services <ref type="bibr" target="#b0">[Deng and Ji, 2011;</ref><ref type="bibr" target="#b2">Liu et al., 2009;</ref><ref type="bibr" target="#b5">Yue et al., 2009]</ref>, building passenger-finding strategies <ref type="bibr" target="#b2">[Li et al., 2011;</ref><ref type="bibr" target="#b2">Lee et al., 2008]</ref> or even predicting the taxi location in a passenger-perspective <ref type="bibr">[Phithakkitnukoon et al., 2010]</ref>. Despite their useful insights, the majority of the techniques reported are offline, discarding the main advantages of this signal (i.e. a streaming one).</p><p>In our work, we focus on the online choice problem about which is the best taxi stand to go to after a passenger dropoff (i.e. the stand where we will pick-up another passenger quicker). Our goal is to use the vehicular network communicational framework to improve their reliability by combining all drivers' experience. In other words, the idea is to forecast how many services will arise in each taxi stand based on the network past behavior to feed a recommendation model to calculate the best stand to head to. An illustration about our problem is presented in Fig. <ref type="figure" target="#fig_0">1</ref> (the five blue dots represent possible stands to head to after a passenger drop-off; our recommendation system outputs one of them as the best choice at the moment).</p><p>Such recommendation model can present a true advantage for a fleet when facing other competitors, which will work with less information than you do. This tool can improve the informed driving experience by transmitting to the driver which is the stand where 1) he will wait less time to get a passenger in; or where 2) he will get the service with the greatest revenue.</p><p>The smart stand-choice problem is based on four key decision variables: the expected price for a service over time, the distance/cost relation with each stand, how many taxis are already waiting at each stand and the passenger demand for each stand over time. The taxi vehicular network can be a ubiquitous sensor of taxi-passenger demand from where we can continuously mine the reported variables. However, the work described here will just address the decision process based on the last three variables.</p><p>In our previous work <ref type="bibr" target="#b3">[Moreira-Matias et al., 2012]</ref>, we already proposed a model to predict the spatiotemporal distribution of the taxi passenger demand (i.e. the number of services that will emerge along the taxi stand network). This study departed from this initial work to extend it along three different dimensions:</p><p>1. The Recommendation System: we use these predictions as input to a Recommendation System that also accounts the number of taxis already in a stand and the distance to it. Such framework will improve the taxi driver mobility intelligence in real time, helping him to decide which is the most profitable stand in each moment. It will be based not only in his own past decisions and outcomes, but on a combination of everyone experience, taking full advantage of the ubiquitous characteristics of the vehicular communicational networks. 2. Test-bed: Our experiments took advantage of the vehicular network online information to feed the predictive framework. Moreover, the recommendation performance was evaluated in real-time, demonstrating its robustness and its ability to learn, decide and evolve without a high computational effort; 3. Dataset: 506.873 services were dispatched to our 441 vehicle fleet during our experiments. This large scale test was carried out along 9 months. There are some works in the literature related with this problem, namely: 1) mining the best passenger-finding strategies <ref type="bibr" target="#b2">[Li et al., 2011;</ref><ref type="bibr" target="#b2">Lee et al., 2008]</ref>, 2) dividing the urban area into attractive clusters based on the historical passenger demand (i.e.: city zones with distinct demand patterns) <ref type="bibr" target="#b0">[Deng and Ji, 2011;</ref><ref type="bibr" target="#b2">Liu et al., 2009;</ref><ref type="bibr" target="#b5">Yue et al., 2009]</ref> and even 3) predicting the passenger demand at certain urban hotspots <ref type="bibr" target="#b2">[Li et al., 2012;</ref><ref type="bibr" target="#b2">Kaltenbrunner et al., 2010;</ref><ref type="bibr" target="#b0">Chang et al., 2010]</ref>. The major contribution of this work facing this state-of-the-art is to build smart recommendations about the taxi stand to head to in an online streaming environment (i.e. real-time; while the taxis are operating) based not only on their historical trace but also on the current network status. In fact, the reported works present offline frameworks and/or test-beds or just account a low number of decision variables.</p><p>The results were obtained using two distinct test-beds: firstly, (1) we let the stream run continuously between August 2011 and April 2012. The predictive model was trained during the first five months and it was stream-tested in the last four. Secondly, (2) we used a traffic simulator to test if our Recommendation System could beat the drivers' expected behavior. We simulated a competitive scenario -with two fleets -using the services historical log and on the existing road network system. The obtained results validated that our method can effectively help the drivers to decide where they can achieve more profit.</p><p>The remainder of the paper is structured as follows. Section 2 formally presents our predictive model while Section 3 details our recommendation one. The fourth section describes our case study, how we acquired and preprocessed the data used as well as some statistics about it. The fifth section describes how we tested the methodology in a concrete scenario: firstly, we introduce the two experimental setups and the metrics used to evaluate both models. Then, the obtained results are detailed, followed by some important remarks. Finally, conclusions are drawn.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">The Predictive Model</head><p>In this section we present some relevant definitions and a brief description of the predictive model on taxi passenger demand. The reader should consult the section II in <ref type="bibr" target="#b3">[Moreira-Matias et al., 2012]</ref> for further details. Let S = {s 1 , s 2 , ..., s N } be the set of N taxi stands of interest and D = {d 1 , d 2 , ..., d j } a set of j possible passenger destinations. Our problem is to choose the best taxi stand at the instant t according with our forecast about passenger demand distribution over the time stands for the period [t, t + P ].</p><p>Consider X k = {X k,0 , X k,1 , ..., X k,t } to be a discrete time series (aggregation period of P-minutes) for the number of demanded services at a taxi stand k. The goal is to build a model which determines the set of service counts X k,t+1 for instant t + 1 and per taxi stand k ∈ {1, ..., N }. To do so, three distinct short-term prediction models are proposed, as well as a well-known data stream ensemble framework to use all models. We briefly describe these models along this section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Time Varying Poisson Model</head><p>Consider the probability for n taxi assignments to emerge in a certain time period -P (n) -following a Poisson Distribution.It is possible to define it using the following equation</p><formula xml:id="formula_0">P (n; λ) = e −λ λ n n! (1)</formula><p>where λ represents the rate (average demand for taxi services) in a fixed time interval. However, in this specific problem, the rate λ is not constant but time-variant. Therefore, it was adapted as a function of time, i.e. λ(t), transforming the Poisson distribution into a non homogeneous one. Let λ 0 be the average (i.e. expected) rate of the Poisson process over a full week. Consider λ(t) to be defined as follows</p><formula xml:id="formula_1">λ(t) = λ 0 δ d(t) η d(t),h(t)<label>(2)</label></formula><p>where δ d(t) is the relative change for the weekday d(t) (e.g.: Saturdays have lower day rates than Tuesdays); η d(t),h(t) is the relative change for the period h(t) in the day d(t) (e.g. the peak hours); d(t) represents the weekday 1=Sunday, 2=Monday, ...; and h(t) represents the period when time t falls (e.g.</p><p>the time 00:31 is contained in period 2 if we consider 30minutes periods).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Weighted Time Varying Poisson Model</head><p>The model previously presented can be faced as a timedependent average which produces predictions based on the long-term historical data. However, it is not guaranteed that every taxi stand will have a highly regular passenger demand: actually, the demand in many stands can often be seasonal.</p><p>The sunny beaches are a good example on the demand seasonality: the taxi demand around them will be higher on summer weekends rather than other seasons along the year.</p><p>To face this specific issue, a weighted average model is proposed based on the one presented before: the goal is to increase the relevance of the demand pattern observed in the recent week (e.g. what happened on the previous Tuesday is more relevant than what happened two or three Tuesdays ago). The weight set ω is calculated using a well-known time series approach to these type of problems: the Exponential Smoothing <ref type="bibr" target="#b1">[Holt, 2004]</ref>. This model will enhance the importance of the mid-term historical data rather than the long-term one already proposed in the above section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Autoregressive Integrated Moving Average Model</head><p>The two previous models assume the existence of a regular (seasonal or not) periodicity in taxi service passenger demand (i.e. the demand at one taxi stand on a regular Tuesday during a certain period will be highly similar to the demand verified during the same period on other Tuesdays). However, the demand can present distinct periodicities for different stands.</p><p>The ubiquitous features of this network force us to rapidly decide if and how the model is evolving so that it is possible to adapt to these changes instantly. The AutoRegressive Integrated Moving Average Model (ARIMA) <ref type="bibr" target="#b0">[Box et al., 1976</ref>] is a well-known methodology to both model and forecast univariate time series data such as traffic flow data <ref type="bibr" target="#b2">[Min and Wynter, 2011]</ref>, electricity price <ref type="bibr" target="#b0">[Contreras et al., 2003]</ref> and other short-term prediction problems such as the one presented here. There are two main advantages to using ARIMA when compared to other algorithms. Firstly, 1) it is versatile to represent very different types of time series: the autoregressive (AR) ones, the moving average ones (MA) and a combination of those two (ARMA); Secondly, 2) it combines the most recent samples from the series to produce a forecast and to update itself to changes in the model. A brief presentation of one of the simplest ARIMA models (for non-seasonal stationary time series) is presented below following the existing description in <ref type="bibr" target="#b5">[Zhang, 2003]</ref> (however, our framework can also detect both seasonal and non-stationary series). For a more detailed discussion, the reader should consult a comprehensive time series forecasting text such as the one presented in Chapters 4 and 5 in <ref type="bibr">[Cryer and Chan, 2008]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Sliding Window Ensemble Framework</head><p>Three distinct predictive models have been proposed which focus on learning from the long, medium and short-term historical data. However, a question remains open: Is it pos-sible to combine them all to improve our prediction? Over the last decade, regression and classification tasks on streams attracted the community attention due to their drifting characteristics. The ensembles of such models were specifically addressed due to the challenge related to this type of data. One of the most popular models is the weighted ensemble <ref type="bibr" target="#b4">[Wang et al., 2003]</ref>. This error-based model was employed in this framework. The Averaged Weighted Error(AVE) metric was used to measure such error.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Recommendation Model</head><p>Let X k,t+1 be the number of services to be demanded in the taxi stand k during the 30-minutes period next to the time instant t. Then, a passenger is dropped-off somewhere by a vehicle of interest w minutes after the last forecast on the instantt. The problem is to choice one of the possible taxi stands to head to. This choice is related with four key variables: the expected price for a service over time, the distance to each stand, how many taxis are already waiting at each stand and the predicted passenger demand. However, here we solve this issue like a minimization problem: we want to rank the stands according the minimum waiting time (target variable) to pick-up a passenger, whenever it is directly picked-up or dispatched by the central.</p><p>Let C k,t+1 be the number of taxis already parked in the stand k in the drop-off moment and L k,w be the number of services departed from the same stand between this moment and the moment of the last forecast (i.e.: t).We can define the service deficit -SD k,t+w on the taxi stand k i.e.: a prediction on the number of services that still will be demanded in the stand discounting the vehicles already waiting in the line) as</p><formula xml:id="formula_2">SD k,t+w = (X k,t+1 − C k,t+1 − L k,w ) * ρ H</formula><p>(3) where ρ H is the similarity (i.e.: 1 -error) obtained by our forecasting model in this specific stand during the sliding training window H. In fact, ρ H works as a certainty about our prediction (i.e.: if two stands have the same SD but our model is experiencing a bigger error in one of them, the other stand should be picked instead).</p><p>Let υ k be the distance (in kilometres) between the drop-off location and the taxi stand k. We can define the normalized distance to the stand -U k -as follows</p><formula xml:id="formula_3">U k = 1 − υ k ξ (<label>4</label></formula><formula xml:id="formula_4">)</formula><p>where ξ is the distance to the farthest stand. We can calculate the Recommendation Score of the taxi stand k as</p><formula xml:id="formula_5">RS k = U k * SD k,t+w<label>(5)</label></formula><p>Then, we calculate the Recommendation Score of every stands and we recommend to the driver the stand with the highest one.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Data Acquisition and Preprocessing</head><p>The stream events data of a taxi company operating in the city of Porto, Portugal, was used as case study. This city is the center of a medium-sized urban area (consisting of 1.3 million inhabitants) where the passenger demand is lower than the number of running vacant taxis, resulting in a huge competition between both companies and drivers. The data was continuously acquired using the telematics installed in each one of the 441 running vehicles of the company fleet throughout a non-stop period of nine months. This study just uses as input/output the services obtained directly at the stands or those automatically dispatched to the parked vehicles (more details in the section below). This was done because the passenger demand at each taxi stand is the main feature to aid the taxi drivers' decision.</p><p>Statistics about the period studied are presented. Table <ref type="table" target="#tab_0">1</ref> details the number of taxi services demanded per daily shift and day type. Table <ref type="table" target="#tab_1">2</ref> contains information about all services per taxi/driver and cruise time. The service column in Table <ref type="table" target="#tab_1">2</ref> represents the number of services taken by the taxi drivers, while the second represents the total cruise time of every service. Additionally, it is possible to state that the central service assignment is 24% of the total service (versus the 76% of the service requested directly on the street) while 77% of the service is demanded directly to taxis parked in a taxi stand (and 23% is assigned while they are cruising). The average waiting time (to pick-up passengers) of a taxi parked at a taxi stand is 42 minutes while the average time for a service is only 11 minutes and 12 seconds. Such low ratio of busy/vacant time reflects the current economic crisis in Portugal and the regulators' inability to reduce the number of taxis in the city. It also highlights the importance of the predictive system presented here, where the shortness of services could be mitigated by obtaining services from the competitors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Experimental Results</head><p>In this section, we firstly describe the experimental setup developed to test our predictive model on the available data. Secondly, we introduce our simulation model and the experiments associated with. Thirdly, we present our Recommendation System and the metrics used to evaluate our methods. Finally, we present the results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Experimental Setup for the Predictive Model</head><p>Our model produces an online forecast for the taxi-passenger demand at all taxi stands at each P-minutes period. Our test- bed was based on prequential evaluation: data about the network events was continuously acquired. Each data chunk was transmitted and received through a socket. The model was programmed using the R language. The prediction effort was divided into three distinct processes running on a multicore CPU (the time series for each stand is independent from the remaining ones) which reduced the computational time of each forecast. The pre-defined functions used and the values set for the models parameters are detailed along this section.</p><p>An aggregation period of 30 minutes was set (i.e. a new forecast is produced each 30 minutes; P=30) and a radius of 100 m (W = 100 ¿ 50 defined by the existing regulations). It was set based on the average waiting time at a taxi stand, i.e. a forecast horizon lower than 42 minutes.</p><p>The ARIMA model (p,d,q values and seasonality) was firstly set (and updated each 24h) by learning/detecting the underlying model (i.e. autocorrelation and partial autocorrelation analysis) running on the historical time series curve for each considered taxi stand. To do so, we used an automatic time series function in the [forecast] R package <ref type="bibr">[Yeasmin and Rob, 1999]</ref> auto-arima -with the default parameters. The weights/parameters for each model are specifically fit for each period/prediction using the function arima from the built-in R package <ref type="bibr">[stats]</ref>.</p><p>The time-varying Poisson averaged models (both weighted and non-weighted) were also updated every 24 hours. A sliding window of 4 hours (H=8) was considered in the ensemble.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Traffic Simulator: An Online Test-Bed</head><p>The <ref type="bibr">DIVERT [Conceicao et al., 2008]</ref> is a high-performance traffic simulator framework which uses a realistic microscopic mobility model. The main advantage of this framework when facing others is the easiness to create new simulation modules efficiently. Hence, we have created a new model that simulates the real behavior of a taxi fleet. Upon a request, a central entity elects one taxi to do the requested service. Once the service is finished, the same entity recommends a new taxi-stand for the taxi to go to and wait for a new service.</p><p>This framework was employed as an online test-bed for our Recommendation System. Firstly, the realistic map of the city of Porto -containing the real road network topology and the exact location of the 63 taxi stands in the city -was loaded.</p><p>). Secondly, we fed the framework with a service log (i.e. a time-dependent origin-destination matrix) correspondent to the studied period. However, we just accessed the log of one out of the two running fleets in Porto (the largest one, with 441 vehicles). To simulate a scenario similar to our own, we divided this fleet into two using a ratio close to real one (60% for the fleet A1 and 40% to the fleet B1). The services dispatched from the central were also divided in the same proportion while the services demanded in each taxi stand will be the same. The fleet B1 will use the most common and traditional way to choose the best taxi-stand: it will go to the nearest taxi stand of each drop-off location (i.e. after a dropoff, each driver has to head to a specific taxi stand of its own choice). However, the fleet A1 will use our Recommendation System to do an informed driving, which considers multiple variables -like the number of taxis in each stand or the demand prediction on them -to support this important decision. Finally, we ran the simulation and we extract the metrics for each fleet. The framework is used to calculate the optimal paths between the taxi stand and the passenger location and the dependent behavior of the fleets (the location of each vehicle will affect the way they get the services). Our main goal is to simulate a real scenario behavior and its competitive characteristics while we are testing the Recommendation System. It is important to notice that both fleets would get similar results if they did not use any Recommendation System. We also highlight that the vehicles will remain parked in the stand waiting for a service whenever the time it takes to appear. In this case, we consider the maximum threshold of 120 minutes that is deeply detailed in the following section, along with the remaining evaluation metrics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Evaluation Methods</head><p>We used the data obtained from the last four months to evaluate our both experimental setups (where 506873 services emerged). Firstly, we present two error measurements which were employed to evaluate our output: one from the literature and another from our own specifically adapted to our current problem. Secondly, we detail the two performance metrics used to evaluate our recommendation models.</p><p>Consider R k = {R k,0 , R k,1 , ..., R k,t } to be a discrete time series (aggregation period of P -minutes) with the number of services predicted for a taxi stand of interest k in the period {1, t} and X k = {X k,0 , X k,1 , ..., X k,t } the number of services actually emerged in the same conditions. The (1) Symmetric Mean Percentage Error (sMAPE) is a well-known metric to evaluate the success of time series forecast models. However, this metric can be too intolerant with small magnitude errors (e.g. if two services are predicted on a given period for a taxi stand of interest but no one actually emerges, the error within that period would be 1). Then, we propose to also use an adapted version of Normalized Mean Absolute Error (NMAE).</p><p>The (2) Average Weighted Error (AVE) is a metric of our own based on the NMAE. We defined it as</p><formula xml:id="formula_6">AV E = t i=1 θ k,i * X k,i σ k,i * ψ k (6) σ k,i = X k,i if X k,i &gt; 0 1 if X k,i = 0 (7) θ k,i = |R k,i − X k,i | if X k,i &gt; th 0 if X k,i ≤ th (8) ψ k = t i=1 X k,i , AV E = AV E if AVE' ≤ 1 1 if AVE' &gt; 1 (9)</formula><p>where ψ k is the total of services emerged at the taxi stand k during the time period {1, t}. The main feature about this metric is to weight the error in each period by the number of real events actually emerged (i.e. the errors on periods where more services were actually demanded are more relevant than the remaining ones).</p><p>Both metrics are focused just on one time series for a given taxi stand. However, the results presented below use an averaged error measured based on all stands series -GA. Consider β to be an error metric of interest. AG β is an aggregated metric given by a weighted average of the error in all stands. It is formally presented in the following equation.</p><formula xml:id="formula_7">AG β = N k=1 GA β,k * ψ k µ , µ = N k=1 ψ k<label>(10)</label></formula><p>We considered three performance metrics in the evaluation of our recommendation models: (1) the Waiting Time (WT) and</p><p>(2) the Vacant Running Distance (VRD) and the number of No Services (NS). The Waiting Time is the total time that a driver takes between a drop-off and a pick-up (i.e. to leave a stand with a passenger or to get one in his/her current location). The Vacant Running Distance is the distance that a driver does to get into a stand after a drop-off (i.e.: without any passenger inside). Independently on the time measured on the simulation, we always consider a maximum threshold of 120 minutes to the Waiting Time. The No Service metric is a ratio between the number of times that a taxi parked on a stand had a waiting time greater than the 120 minutes threshold and the number of services effectively dispatched by the respective fleet.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">Results</head><p>Firstly, we present the results obtained by the online experiments done with the predictive models. The error measured for each model is highlighted in Table <ref type="table" target="#tab_3">3 and Table 4</ref>. The results are firstly presented per shift and then globally. These error values were aggregated using the AG β previously defined.</p><p>Secondly, the values calculated for our performance metrics using the traffic simulator previously described are detailed in the Table <ref type="table" target="#tab_4">5</ref>. The fleet A1 used the Recommendation Model 1 (RS1) while the B1 uses the common expected behavior (previously defined). Distinct metrics values are presented for the two using different aggregations like the arithmetic mean (i.e. average), the median and the standard deviation. The No Services ratio is also displayed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Final Remarks</head><p>In this paper, we present a novel application of time series forecasting techniques to improve the taxi driver mobility intelligence. We did it in three distinct steps: firstly (1) we mined both GPS and event signals emitted by a company operating in Porto, Portugal (where the passenger demand is   lower than the vacant taxis). Secondly, we predicted -in a real-time experiment -the distribution of the taxi-passenger demand for the 63 taxi stands at 30-minute period intervals. Finally, we recreated the scenario running in Porto, where two fleets (the fleet A and B, which contain 441 and 250 vehicles, respectively) compete to get as many services as possible. We did it using a traffic simulation framework fed by the real services historical log of the largest operating fleet.</p><p>One of the fleets used our Recommendation System for the Taxi Stand choice problem while the other one just picked the stand using a baseline model corresponding to the driver common behavior in similar situations.</p><p>Our predictive model demonstrated a more than satisfactory performance, anticipating in real time the spatial distribution of the passenger demand with an error of just 20%. We believe that this model is a true novelty and a major contribution to the area through its online adapting characteristics:</p><p>• It takes advantage of the ubiquitous characteristics of a taxi communicational network, assembling the experience and the knowledge of all vehicles/drivers while they usually use just their own;</p><p>• It simultaneously uses long-term, mid-term and short term historical data as a learning base;</p><p>• It rapidly produces real-time short-term predictions of the demand, which can truly improve drivers' mobility intelligence and consequently, their profit.</p><p>This approach meets no parallel in the literature also by its test-bed: the models were tested in a streaming environment, while the state-of-art presents mainly offline experimental setups. Our simulation results demonstrated that such informed driving can truly improve the drivers' mobility intelligence: the fleet A1 had an Average Waiting Time 5% lower than its competitor -even if it has a larger fleet. We also highlight the reduction of the No Service ratio in 50% while the Vacant Running Time faced an increase. It is important to state that this Recommendation System is focused on a Scenario like our own -two or more competitors operating in a medium/large city where the demand is lower than the number of running vehicles. Its main goal is to recommend a stand where a service will rapidly emerge -even if this stand is far away. The idea is to be in a position able to pick-up the emerging service demand before the remaining competition. This factor can provoke a slight increase on the Vacant Running Time but it will also reduce the usually large Waiting Times to pick-up passengers. Other scenarios may require a distinct calibration of the model to account different needs/goals.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Taxi Stand choice problem.</figDesc><graphic coords="1,339.30,216.00,194.40,121.55" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Taxi Services Volume (Per Daytype/Shift)</figDesc><table><row><cell>Daytype</cell><cell>Total Services</cell><cell cols="3">Averaged Service Demand per Shift</cell></row><row><cell>Group</cell><cell cols="4">Emerged 0am to 8am 8am to 4pm 4pm to 0am</cell></row><row><cell>Workdays</cell><cell>957265</cell><cell>935</cell><cell>2055</cell><cell>1422</cell></row><row><cell>Weekends</cell><cell>226504</cell><cell>947</cell><cell>2411</cell><cell>1909</cell></row><row><cell>All Daytypes</cell><cell>1380153</cell><cell>1029</cell><cell>2023</cell><cell>1503</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Taxi Services Volume(Per Driver/Cruise Time)</figDesc><table><row><cell></cell><cell cols="2">Services per Driver Total Cruise Time (minutes)</cell></row><row><cell>Maximum</cell><cell>6751</cell><cell>71750</cell></row><row><cell>Minimum</cell><cell>100</cell><cell>643</cell></row><row><cell>Mean</cell><cell>2679</cell><cell>33132</cell></row><row><cell>Std. Dev.</cell><cell>1162</cell><cell>13902</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Error Measured on the Models using AV E</figDesc><table><row><cell>Periods</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Error Measured on the Models using sM AP E</figDesc><table><row><cell></cell><cell></cell><cell cols="2">Periods</cell><cell></cell></row><row><cell>Model</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell cols="3">00h−08h 08h−16h 16h−00h</cell><cell>24h</cell></row><row><cell>Poisson Mean</cell><cell>15.09%</cell><cell>19.20%</cell><cell>17.51%</cell><cell>16.84%</cell></row><row><cell>W. Poisson Mean</cell><cell>17.32%</cell><cell>20.66%</cell><cell>19.88%</cell><cell>18.47%</cell></row><row><cell>ARIMA</cell><cell>16.81%</cell><cell>18.59%</cell><cell>17.85%</cell><cell>18.51%</cell></row><row><cell>Ensemble</cell><cell>14.37%</cell><cell>18.18%</cell><cell>17.19%</cell><cell>15.89%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 :</head><label>5</label><figDesc>An Analysis on the Recommendation Performance</figDesc><table><row><cell cols="3">Performance Metrics A1(RS) B1(common)</cell></row><row><cell>Average WT</cell><cell>38.98</cell><cell>40.84</cell></row><row><cell>Median WT</cell><cell>26.29</cell><cell>27.92</cell></row><row><cell>Std. Dev. WT</cell><cell>33.79</cell><cell>33.22</cell></row><row><cell>Average VRD</cell><cell>3.27</cell><cell>1.06</cell></row><row><cell>Median VRD</cell><cell>2.80</cell><cell>0.98</cell></row><row><cell>Std. Dev. VRD</cell><cell>2.53</cell><cell>0.54</cell></row><row><cell>No Service (%)</cell><cell>11.08%</cell><cell>19.26%</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The authors would like to thank to Geolink and to its team for the data supplied to this work. This work was supported by the projects DRIVE-IN: "Distributed Routing and Infotainment through Vehicular Internet-working", VTL: "Virtual Traffic Lights" and KDUS: "Knowledge Discovery from Ubiquitous Data Streams" under the Grants CMU-PT/NGN/0052/2008, PTDC/EIA-CCO/118114/2010, PTDC/EIA-EIA/098355/2008, respectively, and also by ERDF -European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness), by the Portuguese Funds through the FCT(Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-022701.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Arima models to predict next-day electricity prices</title>
		<author>
			<persName><surname>Box</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Spatiotemporal structure of taxi services in shanghai: Using exploratory spatial data analysis</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Cryer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Chan</surname></persName>
		</editor>
		<meeting><address><addrLine>USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="1976">1976. 1976. 2010. 2010. 2008. 2008. 2003. 2003. 2008. 2008. 2011. 2011</date>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
	<note>Geoinformatics, 2011 19th International Conference on</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Forecasting seasonals and trends by exponentially weighted moving averages</title>
		<author>
			<persName><forename type="first">Charles</forename><surname>Holt</surname></persName>
		</author>
		<author>
			<persName><surname>Holt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Forecasting</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="5" to="10" />
			<date type="published" when="2004">2004. 2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Urban cycles and mobility patterns: Exploring and predicting trends in a bicycle-based public transport system</title>
		<author>
			<persName><surname>Kaltenbrunner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2008">2010. 2010. 2008. 2008. 2011. 2011. March 2011. 2012. 2012. 2009. 2009. 2011</date>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="606" to="616" />
		</imprint>
	</monogr>
	<note>Frontiers of Computer Science in China</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Online predictive model for taxi services</title>
		<author>
			<persName><surname>Moreira-Matias</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Intelligent Data Analysis XI</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Phithakkitnukoon</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2010">2012. 2012. 2010</date>
			<biblScope unit="volume">7619</biblScope>
			<biblScope unit="page" from="230" to="240" />
		</imprint>
	</monogr>
	<note>Phithakkitnukoon et al.</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Taxiaware map: identifying and predicting vacant taxis in the city</title>
		<author>
			<persName><forename type="first">M</forename><surname>Veloso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bento</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Biderman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ratti</surname></persName>
		</author>
		<author>
			<persName><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining</title>
				<editor>
			<persName><forename type="first">Khandakar</forename><surname>Yeasmin</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Hyndman Rob</surname></persName>
		</editor>
		<meeting>the ninth ACM SIGKDD international conference on Knowledge discovery and data mining</meeting>
		<imprint>
			<publisher>Yeasmin and Rob</publisher>
			<date type="published" when="1999">2010. 2003. 2003. 1999. 1999</date>
			<biblScope unit="volume">6439</biblScope>
			<biblScope unit="page" from="226" to="235" />
		</imprint>
	</monogr>
	<note>Automatic Time Series Forecasting: The forecast Package for R</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Mining time-dependent attractive areas and movement patterns from taxi trajectory data</title>
		<author>
			<persName><surname>Yuan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems</title>
				<meeting>the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems</meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2003">2010. 2010. 2009. 2009. 2003. 2003</date>
			<biblScope unit="volume">50</biblScope>
			<biblScope unit="page" from="159" to="175" />
		</imprint>
	</monogr>
	<note>Time series forecasting using a hybrid arima and neural network model</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
