=Paper= {{Paper |id=Vol-1088/paper7 |storemode=property |title=Road-quality Classification and Bump Detection with Bicycle-Mounted Smartphones |pdfUrl=https://ceur-ws.org/Vol-1088/paper7.pdf |volume=Vol-1088 |dblpUrl=https://dblp.org/rec/conf/ijcai/HoffmannMM13 }} ==Road-quality Classification and Bump Detection with Bicycle-Mounted Smartphones== https://ceur-ws.org/Vol-1088/paper7.pdf

Road-quality classification and bump detection with bicycle-mounted smartphones

Marius Hoffmann, Michael Mock, Michael May
Fraunhofer IAIS
Schloss Birlinghoven, 53754 St Augustin, Germany
first.lastname@iais.fraunhofer.de

Abstract

The paper proposes an embedded road surface classifier for smartphones used to track and classify routes on bikes. The main idea is to provide, along with the route tracking, information about the surface quality of the cycling route (is the surface smooth, rough or bumpy?). The main problem is the quantity of accelerometer data that would have to be uploaded along with the GPS track if the analysis was done off-line. Instead, we propose to classify road surfaces online with an embedded classifier that has been trained off-line. More specifically, we rely on the accelerometer of a bicycle-mounted smartphone for online classification. We carry out experiments to collect cycling tracks consisting of GPS and accelerometer data, label the data and learn a model for classification, which is then deployed on the smartphone. We report on our experiences with the classification accuracy and the runtime performance of the classifier on the smartphone.

1 Introduction

The main motivation of this work is to provide a community-based road quality classification service for cycling routes. There are many community-based services providing cycling routes together with altitude profiles, but none of them provides information about the road quality of the route, i.e. whether the road is smooth, rough, or bumpy. Many bicycling community web portals like [http://www.bikemap.net] offer facilities for uploading and downloading GPS tracks for cycling routes. To our knowledge, none of them provides information about the road surface quality of the cycling route. Route quality information could be gathered together with the GPS track using the accelerometer data coming from a bicycle-mounted smartphone. Obviously, including all raw accelerometer data in the data upload would increase data traffic significantly and may not be tolerable for the user, especially when gathering long tracks. The solution is to implement a road surface classification algorithm on the smartphone and to upload the classification results together with the GPS track. Similar approaches have already been applied successfully to vehicles other than bicycles. Pothole detection using GPS data and accelerometer data with dedicated hardware devices mounted in taxi cabs has already been successfully explored in [Eriksson et al., 2008]; [Strazdins et al., 2011] and [Mednis et al., 2012] investigate road condition monitoring for vehicular sensor networks based on time series analysis. We investigate experimentally whether we can achieve a road surface classification using smartphones mounted on bicycles. In order to cope with the restricted computational power of these devices, we apply a machine learning approach: we learn a classifier off-line on a standard PC and apply the classifier online on the smartphone.

We collected GPS tracks and acceleration data (based on the mobile phone's accelerometer sensor) and applied two different approaches for the classification of road surface quality, both based on standard machine learning classifiers. In a direct segmentation classification approach, we used manual labeling of road segments of fixed length (smooth, rough, bumpy) to train a classifier, based on various parameter settings for feature extraction. The best result that we obtained in a cross-validation was an accuracy about 20 percentage points above the corresponding kappa statistic. In a second approach, we trained a classifier for detecting bumps. Here we achieved an accuracy of 97%. Using this bump detector, we performed a threshold-based road segment classification, which delivered much more comprehensible results. A closer look at input data, manual labeling, classification results, and comparison with the real world revealed that the manual labeling was error-prone. We conclude that the simple bump-detector based classification approach can be used for road surface quality classification and does not even require further manual labeling of road segments.

2 Classification approach and Results

Most of today's smartphones are equipped with GPS and accelerometer sensors. In order to ensure that our algorithm performs not only on today's top-range models, we carried out the experiments with a 2-year-old Nokia 5800, one of the first mass models providing accelerometer data. Figure 1 illustrates a track of accelerometer data collected with a smartphone.

Figure 1 shows the length of the accelerometer vector plotted over a track. We see that the data provides a more or less continuous signal (at 37 Hz in our case) over the complete track. As we want to explore a machine learning classification approach for road surface classification, we first have to
define the features which are used to train the classifier. The raw data consists only of GPS positions with their time stamps and of acceleration values. Acceleration values are represented by a three-dimensional vector. In a first step, we extract as many features from the data as possible and evaluate experimentally which feature selection yields the best classification result.

Figure 1: This figure shows a recorded test track of one road. The peaks in this chart are bumps on the road. The smartphone was attached to the handlebar of the bike.

In our first approach to classification, described in section 2.1, we divide the road into segments of varying length. In our second approach, described in section 2.2, we just consider two subsequent GPS points as the boundaries of a segment. In both cases, we get a segmentation of the cycling route into a sequence of segments as shown in Figure 2. As a result, the recorded acceleration data is associated with a certain segment.

Figure 2: This figure shows how a track is segmented into a set of segments.

The segmentation shown in Figure 2 allows us to indicate which position of the road has a certain surface property or even contains potholes. We can now analyze the track segment by segment, depending on the data recorded for each segment, and make statements about its road surface quality. This information can be used as features for our machine learning approach. In the end, each segment contains GPS and acceleration data which can be used for creating features for this segment.

Features which can be extracted from the GPS data are speed and inclination. To simplify the handling of the acceleration data provided by the accelerometer, which is a 3D vector, we use the L2-norm of this vector, which is defined as $\|x\| = \sqrt{x_1^2 + \dots + x_n^2}$. The example shown in Figure 1 already illustrates that changes in these values and sometimes even potholes can be detected by (human) visual inspection of the data. For our machine learning approach, we extract the mean, the variance and the standard deviation of the acceleration values of a segment as features for this segment.

2.1 Direct road surface classification

In this section, we apply standard classification methods to segments of varying length, based on the features described above. For our analysis we also consider a number of segments that precede the segment we want to classify. We define a whole road as a set of segments $S = \{s_1, s_2, \dots, s_n\}$. We consider the previous $x$ segments $s_{i-x}, s_{i-(x-1)}, \dots, s_i$ of the segment $i$ which we want to classify as features for $s_i$. In this case the features of the previous segments serve us (primarily our machine learning algorithm) as additional information for our analysis. How relevant this feature information is and how many segments we must consider has to be analyzed experimentally. The organization of the training data is shown in Table 1. We use all extracted features of such a set of segments as training data. Every segment has its own row with its own features and the additional features of the previous segments. Each row in this table also contains the class as an entry in the column named label. This column contains the class which later on will be learned by the machine learning algorithm. For example, row 1 has the label smooth as class for segment $S_1$.

fS_{i-2}    fS_{i-1}    fS_i    label(S_i)
-           fS_0        fS_1    smooth
fS_0        fS_1        fS_2    smooth
fS_1        fS_2        fS_3    smooth
fS_2        fS_3        fS_4    rough
...         ...         ...     ...

Table 1: This table illustrates how the features of each segment are arranged in order to generate a training set of data

We want to evaluate how well the classifiers can learn from the provided data and which features and parameters influence the performance of these classifiers. The goal is to evaluate whether it is possible at all to learn from the data and, if so, which are the best parameters (for example segment length, number of segments to be included in the table).

As raw data we recorded one route several times. The route for direct surface classification was recorded 16 times and leads through urban terrain, mostly the city of Bonn; each track has a length of approximately 13-14 km (the deviation in length results from the GPS inaccuracy). Each track was labeled for classification by hand with the tool presented in [Guc et al., 2008].

The previously mentioned segment arrangement $s_{i-x}, s_{i-(x-1)}, \dots, s_i$, which consists only of previous segments and where $i$ is our current position, will further be called Sline. For the segment length, two settings are used: the variable, GPS-defined length and a fixed length. The fixed segment length is fixed from the beginning (during the evaluation, fixed values of 1 m, 2 m, 5 m, 10 m, 15 m and 20 m are used). For the fixed length parameter the amount of acceleration
values can vary, because the number of values is speed-dependent. For classification we use two different algorithms, the K-Nearest-Neighbor (K-NN) and the Naive Bayes classifier. Five different features were extracted from the training data: speed, inclination, acceleration mean, acceleration variance, and acceleration standard deviation.

The following table shows a compact overview of all parameters which were evaluated.

Parameter type        Parameter value
ML algorithm          K-NN, Naive Bayes
segment length        variable length: GPS; fixed length: 1 m, 2 m, 5 m, 10 m, 15 m, 20 m
number of segments    3, 5, 7, 9, 11, 13
extracted features    inclination, speed, acceleration (mean, variance, std)

Table 2: This table gives an overview of all parameters which were changed during evaluation. The acceleration contains three features: acceleration mean, variance and standard deviation.

To measure the performance of the classification algorithm on the evaluation data, a 10-fold cross-validation was performed. An N-fold cross-validation splits the data into N equally large sets and then uses N − 1 sets for training the classifier and 1 set for validating the learned concept. This is repeated N times, where for every iteration a different one of the N sets is used for validation. At the end, the cross-validation module provides a confusion matrix which consists of the average performance values of the classification.

Additionally, we performed a feature selection optimization in order to find the best feature combination. This optimization finds a feature combination which only contains features that influence the learning algorithm positively and result in high accuracy. Features which confuse the learning scheme are not selected. We found that the previously mentioned speed and inclination features confuse the learning scheme and result in performances which are worse than the corresponding kappa statistics.

                 true smooth    true bumpy    true rough    class precision
pred. smooth            5785           882            92            85.590%
pred. bumpy              836          1052            62            53.949%
pred. rough              151           110           492            65.339%
class recall         85.425%       51.468%       76.161%    accuracy: 77.457%

Table 3

The classification performance of the Naive Bayes and the K-NN classifier was almost the same, but the K-NN performed (on average) slightly better than the Naive Bayes.

For the K-NN algorithm, the performance increases with an increasing number of segments considered for classification. The Naive Bayes classifier, however, has a more constant performance, independent of the number of segments included in the table. The evaluation also showed that the classification results which use longer segment lengths (15 m and 20 m) are much better than the ones with short segment lengths (2 m). When looking at the influences of all features, we observed that the speed feature does not contribute to the classification. The inclination feature, even worse, confuses the classifier.

The best results (Table 3) are achieved with the features acceleration (mean, variance, standard deviation), a segment length of 20 m, and 13 segments considered for classification. The segment setup used is the Sline setup. The corresponding kappa statistic achieves an accuracy of 56.357%, which makes a difference of 21.101 percentage points between the classifier and its kappa statistic.

The overall results of the classification (at best 78%) are not very satisfying for a classification model. We will see in section 2.2 that the bump detection just based on GPS-defined segments performs much better.

2.2 Bump detection based classification

In this approach, we first consider the detection of single bumps or potholes. The classifier in this first step distinguishes just two classes: "bump" and "no bump". For the bump classification a different route was selected and recorded 15 times. Each track has a length between 110 m and 130 m (here the deviation in length also results from the GPS inaccuracy). Again, each track was labeled for classification by hand via the already mentioned annotator tool.

The bump classification performs much better than the best surface classification. Again, the feature "speed" turned out to be irrelevant and the feature "inclination" was confusing the classifier. It was also observed that (for surface, not bump, classification) the more segments are considered, the more the accuracy declines. The reason for this is that the longer the considered area, the more unimportant information is contained in the data which should be classified. In comparison to the surface classification, the bump classification needs shorter segment lengths (1 m to 5 m) to reach high classification accuracy. The longer the segment lengths, the worse the classification performance gets. Long segments also confuse the classification algorithm; this was verified by comparing the results of the classification with the corresponding kappa statistic. The best results were achieved with the GPS segment length parameter. This is to be expected, because "bumps" are short-term events and GPS-based segmentation (i.e. every two succeeding GPS points define a segment) is the smallest achievable spatial granularity.

                 true no bump    true bump    class precision
pred. no bump             404            6            98.537%
pred. bump                  2           29            93.548%
class recall          99.507%      82.857%    accuracy: 98.186%

Table 4

As we can see, it is indeed possible to do pothole and bump detection with a very high accuracy, just using the Naive Bayes classifier on a single segment. This led us to extend this simple approach to be applicable in road surface
classification, with the three classes "smooth", "rough", and "bumpy", as described in the following.

Extended bump classification. The bump detection can be altered slightly to derive another concept for surface classification. The main idea is to count the number of bumpy segments in a certain road section of N segments. Depending on that number, one of the classes "smooth", "rough", and "bumpy" is assigned as follows:

• For $0 \le |\mathit{bumps}| \le \frac{N}{3}$, the class smooth is assigned.
• For $\frac{N}{3} < |\mathit{bumps}| \le \frac{2N}{3}$, the class rough is assigned.
• For $\frac{2N}{3} < |\mathit{bumps}| \le N$, the class bumpy is assigned.

Not surprisingly, the best results were achieved for N = 3, i.e. just considering the GPS segments $S_{i-1}$, $S_i$ and $S_{i+1}$ for the classification of GPS segment $S_i$. In other words, a GPS segment is considered as, for example, smooth if at most one of the preceding GPS segment, the GPS segment itself, and the succeeding GPS segment has a bump. The results are shown in Table 5.

                 true smooth    true bumpy    true rough    class precision
pred. smooth           27865          1113          2129            89.578%
pred. bumpy             2249          1823          3315            24.678%
pred. rough             1488           537          2893            58.825%
class recall         88.175%       52.491%       34.701%    accuracy: 75.051%

Table 5: Confusion matrix of the best performing classification, which considered 3 segments during its classification

The classifier with the best accuracy for surface classification achieves ≈ 75%; the classifiers from the previous sections, which directly learn the labels from the training data, perform much worse. For the extended bump classification the K-NN classifier achieves 61% accuracy. A random classifier with the same label distribution performs with ≈ 57% accuracy.

The confusion matrix of the extended bump classifier explains why the accuracy is not higher. The classifier is quite good for smooth data, but it confuses rough and bumpy data. A closer look and comparison with the recorded variances in Figure 3 reveals that, most probably, the labeling was not consistent in assigning the labels "rough" and "bumpy".

Figure 3: This diagram illustrates the results of inaccurate manual data labeling

Figure 3 visualizes the acceleration variance of a section of a recorded track combined with its manual labels. For each label class the figure shows the manual labels (light gray bars) and the predicted labels (dark gray bars). It can be seen that the light gray labels for the rough class are not modeled in sufficient detail (on the left side of the diagram). It can also be seen from the acceleration values that this label contains parts of different labels like smooth and bumpy which were not correctly labeled. The diagram shows that the classifier is indeed more often correct than the manual label, which is unfortunately the reference for the performance. This is the main reason for the "bad" performance of the classifiers and also explains the confusion matrix (Table 5).

2.3 Classifier implementation on the smartphone

In this section we discuss the runtime of the whole classification process, which was implemented in J2ME. One of the initial goals of this work is to make the classification process possible in the online mode of the client.

Once learned, the classifier has to execute the following steps online on the smartphone.

• calculate the mean, variance and standard deviation of all previous absolute acceleration vector values
• assemble the classification data
• apply the Naive Bayes classifier for bump detection
• put the prediction into the bump LIFO (this LIFO stores previous classifications, which are needed to calculate the surface prediction)
• put the GPS coordinates and the prediction for this segment into the ObservationBuilder
• build the observation
• send the observation

The execution of the learned classifier took less than 2 ms in a JME implementation on a Nokia 5800 with an ARM CPU running at 400 MHz. The accelerometer delivered data at 37 Hz, resulting in 37 values which must be evaluated at each GPS point (given that GPS is running at 1 Hz). This means that the overall impact of the classifier on the device performance was very low and that classifier execution finished safely before the next accelerometer values came in (at 37 Hz, successive values are about 27 ms apart).

3 Conclusion

It was shown that, in general, a surface and a bump classification can be realized via a machine learning approach. It was shown how the data must be preprocessed to achieve good classification results and which features play an important role in this classification process. At the current state, the classification is not as good as it could be. We showed that the correctness and accuracy of the labels in the training data should be improved for training a machine learning algorithm. However, we also achieved very good bump detection. The learned classifier is fast enough to be executed online on moderately fast smartphone hardware and needs no further learning or labeling. Surface classification may be derived from this. As the classifier performed best for short segments, mainly based on the variance of the length of the acceleration vector, we also see a good chance for purely time-series based analysis approaches such as those used in [Mednis et al., 2012] or [Mladenov and Mock, 2009] to be applied for road surface classification.

As an application, biking communities can profit from the presented approach for displaying route quality information on a community portal, and cycling-friendly cities can monitor the surface quality of their cycling route network for detecting damage and initiating road repair.
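
To make the processing chain summarized in this paper more concrete, the following Python sketch strings together its three stages: the statistics of the acceleration vector length per segment, a bump decision per GPS segment, and the count-based rule over N neighbouring segments. It is an illustration under stated assumptions, not the J2ME implementation used in the experiments. All function names are hypothetical, the variance threshold of 4.0 is an arbitrary stand-in for the trained Naive Bayes bump detector, and shortening the window at the ends of a track is our own choice for the sketch.

```python
import math
from statistics import mean, pstdev, pvariance


def magnitude(sample):
    """Length (L2 norm) of one (x, y, z) accelerometer sample."""
    return math.sqrt(sum(c * c for c in sample))


def segment_features(samples):
    """Mean, variance and standard deviation of the vector lengths in a segment."""
    mags = [magnitude(s) for s in samples]
    return mean(mags), pvariance(mags), pstdev(mags)


def is_bump(features, variance_threshold=4.0):
    """Assumed stand-in for the learned bump classifier.

    The paper trains a Naive Bayes model; here a plain threshold on the
    variance is used, and the value 4.0 is hypothetical.
    """
    _, variance, _ = features
    return variance > variance_threshold


def surface_class(bump_flags):
    """Count-based rule: up to N/3 bumps is smooth, up to 2N/3 rough, else bumpy."""
    n = len(bump_flags)
    bumps = sum(bump_flags)
    # Integer comparisons avoid floating-point issues with N/3 and 2N/3.
    if 3 * bumps <= n:
        return "smooth"
    if 3 * bumps <= 2 * n:
        return "rough"
    return "bumpy"


def classify_track(segments, window=3):
    """Label every segment from the bump flags of its neighbourhood.

    With window=3 the segments i-1, i and i+1 are used. At the ends of the
    track the window is simply shortened (an assumption of this sketch).
    """
    flags = [is_bump(segment_features(seg)) for seg in segments]
    half = window // 2
    labels = []
    for i in range(len(flags)):
        lo, hi = max(0, i - half), min(len(flags), i + half + 1)
        labels.append(surface_class(flags[lo:hi]))
    return labels
```

Calling `classify_track` with a list of segments, each a list of `(x, y, z)` samples recorded between two GPS points, returns one label per segment. Replacing `is_bump` by a trained model leaves the rest of the chain unchanged, which mirrors the separation between off-line learning and online classification described above.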

Acknowledgements
The research leading to these results has received funding
from the European Union’s Seventh Framework Programme
(FP7/2007-2013) under grant agreement no. 255951 (LIFT
Project).

References
[Eriksson et al., 2008] J. Eriksson, L. Girod, B. Hull,
R. Newton, S. Madden, and H. Balakrishnan. The pothole
patrol: Using a mobile sensor network for road surface
monitoring. In Proceeding of the 6th international confer-
ence on Mobile systems, applications, and services, pages
29–39. ACM, 2008.
[Guc et al., 2008] B. Guc, M. May, Y. Saygin, and C. Körner.
Semantic Annotation of GPS Trajectories. In 11th AGILE
International Conference on Geographic Information Sci-
ence, Girona, Spain, 2008.
[Mednis et al., 2012] Artis Mednis, Atis Elsts, and Leo
Selavo. Embedded solution for road condition monitor-
ing using vehicular sensor networks. In Application of In-
formation and Communication Technologies (AICT), 2012
6th International Conference on, pages 1–5. IEEE, 2012.
[Mladenov and Mock, 2009] M. Mladenov and M. Mock. A
step counter service for Java-enabled devices using a built-
in accelerometer. In Proceedings of the 1st International
Workshop on Context-Aware Middleware and Services: af-
filiated with the 4th International Conference on Commu-
nication System Software and Middleware (COMSWARE
2009), pages 1–5. ACM, 2009.
[Strazdins et al., 2011] Girts Strazdins, Artis Mednis, Georgijs Kanonirs, Reinholds Zviedris, and Leo Selavo. Towards vehicular sensor networks with Android smartphones for road surface monitoring. 2011.