Deep Learning for Terrain Surface Classification: Vibration-based Approach

Marcos Concon (a), W. K. Wong (a), Filbert H. Juwono (a) and Catur Apriono (b)
(a) Curtin University Malaysia, CDT 250, Miri 98009, Sarawak, Malaysia
(b) University of Indonesia, Kampus Baru UI, Depok, West Java 16424, Indonesia

Abstract
As robots become more pervasive in the service sector, control in dynamic environments has become an important element in optimising the deployment of mobile robots. A mobile robot should be aware not only of obstacles, but also of the surface on which it navigates, in order to estimate slippage and enable adaptive control. We note that various terrains/surfaces have different characteristics, which can directly influence the handling, driving, efficiency, and stability of the robot vehicle. Knowledge of the terrain can therefore provide valuable information for establishing effective and secure navigation strategies. We built a mobile robot prototype equipped with an Inertial Measurement Unit (IMU) to obtain terrain data and applied deep learning models to classify the terrain using these data. Three deep learning configurations are proposed in this paper: long short-term memory (LSTM), 1D convolutional neural network (1D CNN), and a combined convolutional neural network-long short-term memory network (CNN-LSTM). The deep learning architectures were trained and evaluated on data collected from five different surfaces. It is shown that the CNN-LSTM performs the best, with an F1 score of 98.49%. The other two networks also generalize relatively well to unseen vibration sequences, with F1 scores of 97.47% and 95.98% for the 1D CNN and LSTM, respectively. Finally, we investigate the effect of varying the input sequence length to find the optimal length, so that we obtain the highest accuracy and generalization of the deep learning networks.

Keywords
LSTM, CNN, terrain classification
1. Introduction

Intelligent robotics have seen rapid advancement in their scope of operations, such as military reconnaissance in hostile environments [1], unmanned surveillance for disaster management [2], telemedicine robots for examining remote patients [3], and factory automation. It is necessary for a robot to acquire a clear understanding of its current environment in order to successfully manoeuvre and accomplish its planned operation, while preventing damage to itself and avoiding hazards to others. As service robots have achieved broad adoption in the above-mentioned industries, precise navigation and surrounding awareness have become crucial issues in improving the deployability of these devices. A significant consideration for the robot's efficient navigation is the motion control algorithm, which depends on the type of terrain being travelled. Thus, a detailed classification of the type of terrain is required for the robot to adapt its navigation speed and route-planning parameters, which depend on the characteristics of the terrain.

In this paper, we focus on terrain mapping using Inertial Measurement Unit (IMU) sensors. In order to map the readings to the respective terrain labels, we present and evaluate three types of deep learning frameworks: long short-term memory (LSTM), one-dimensional convolutional neural network (1D CNN), and the CNN-LSTM architecture. Both LSTM and CNN have been extensively used in the literature; however, applications combining both frameworks in a unified structure have been lacking. This paper aims to leverage their temporal and spatial advantages for vibration-based terrain classification.

It is worth noting that deep learning can be used for tasks where it is almost impossible to engineer features from raw data manually. Despite being a highly 'black-box' approach, end-to-end deep learning is suitable for automatically extracting useful features in complex non-linear classification tasks. Therefore, deep learning methods can be implemented to obtain more reliable recognition of the robot's surrounding environment, thereby enhancing the robot's adaptive control and mobility.

ISIC 2021: International Semantic Intelligence Conference, February 25-27, 2021, New Delhi, India
marcos@respiree.com (M. Concon); weikitt.w@curtin.edu.my (W.K. Wong); filbert@ieee.org (F.H. Juwono); catur@eng.ui.ac.id (C. Apriono)
ORCID: 0000-0001-6212-6096 (W.K. Wong); 0000-0002-2596-8101 (F.H. Juwono); 0000-0002-7843-6352 (C. Apriono)
(c) 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

2. Related Works

The problems of adaptive control in mobile robots have been constantly researched. The challenges present various opportunities for researchers to develop methods for predicting dynamic changes in the environment. In [4], the authors investigated the use of kinematics-based analysis for wheel slippage calculation. The results were validated using data collected on a mobile platform. Similar work was found in [5], where the authors applied rolling resistance torque without using any additional sensors. Rolling resistance torque on multiple terrains can be acquired by a reaction torque observer; the proposed concept was verified using a differential drive mobile robot. In [6], wheel slips were estimated based on odometric data. The collected data were analyzed using two different approaches: an instantaneous estimator and a temporal window approach. Results showed that the temporal window approach yielded better results. In [7], researchers presented a solution using laser-based point cloud generation to detect the robot's traversal surface. The researchers explored several terrains including carpet, coated asphalt, and asphalt. The solution was highly precise but computationally expensive, as it generated point clouds which needed to be further processed digitally. The authors stated that there were opportunities to further investigate how a mobile robotic platform could provide reliable and accurate surface prediction of the terrain, improving navigation with prior knowledge of the surface. These research works show that there is strong motivation for investigating methods that enable service robots to perceive the terrain and traversal surface.

Several sensing methodologies have been developed to tackle the problem of terrain classification. They are typically categorized into two main groups: vision-based and reaction-based techniques. Traditional visual feature engineering approaches include the scale-invariant feature transform (SIFT) [8], speeded-up robust features (SURF) [9], and the bag of visual words (BOVW) [10], among many others. These algorithms pass useful features of images obtained from light detection and ranging (lidar) or a stereo camera to a classifier to be trained and classified. In [11], raw grayscale terrain images were used to train a deep convolutional network, and the accuracy was 6% lower than that of a support vector machine (SVM) classifier used jointly with the histogram of oriented gradients (HOG) feature extractor.

While vision-based approaches are useful because of their high accuracy, they are vulnerable to distortion caused by lighting changes and other factors related to the surface's physical properties (e.g. material type and degree of hardness) [12]. Reaction-based techniques, on the other hand, utilize sensor measurements to obtain acoustic, haptic, or vibration profiles for classification. Acoustic-based classification relies on a microphone to record the sound generated between the robot and the terrain during traversal. Noise removal and smoothing techniques are necessary in traditional acoustic-based classification to achieve satisfactory results, due to factors such as environmental noise and the robot's internal motor noise, as described in [13]. A deep learning approach was applied in [14], where a CNN was developed and trained using short-time Fourier transform (STFT) spectrograms extracted from raw terrain audio signals. It was demonstrated that the network was robust even when the terrain audio signal was corrupted with white Gaussian noise.

Haptic-based classification uses ground contact forces between a legged robot and the terrain to describe different terrain properties. Typically, features such as the robot's stride frequency and the peak and average motor torque in a single stride are used to train an SVM classifier [15]. In [16], a 1-dimensional CNN and an RNN architecture were implemented and evaluated on raw force/torque signals from a hexapod robot. There was a significant improvement of about 15% in classification accuracy compared to an SVM with a Gaussian kernel.

The last reaction-based technique is based on the vibration characteristics of the terrain. It was first suggested in [17], where the vibration signal was measured using an accelerometer during the robot's traversal. In terms of performance, SVM has proven to be the best when trained on hand-crafted time-domain features such as skewness, impulse factor, and root mean square (RMS), along with frequency-domain features from the discrete Fourier transform (DFT). Experiments using a CNN for vibrational wheel slip estimation in ground robotics were carried out in [11]. The wheel torque, vertical acceleration, and degree of pitch were used to train the classifier. A difference of 10% in classification accuracy was observed before and after filtering the input data for the CNN, which reinforces the generality of deep learning frameworks in extracting meaningful information directly from raw input vibration data.

3. Methodology

The mobile robot used in this research is a two-wheel differential drive with an attached 6-axis IMU that measures six vibrational terrain signatures. The setup is shown in Fig. 2. The form is very similar to conventional indoor service robots such as robotic vacuum cleaners. The vibration characteristics depend on the terrain's texture/material and the robot's movement. This study primarily aims to address terrain classification by utilising raw time-series vibration data as input to three implemented deep learning frameworks: the LSTM, 1D CNN, and CNN-LSTM architectures. An overview of the experiment workflow is given in Fig. 1.

Figure 1: Experiment workflow.

The data set used in this study contains a total of 24000 samples distributed evenly over five different terrain sources. The six features include the lateral, longitudinal, and vertical accelerations and angular velocities (a_x, a_y, a_z, g_x, g_y, g_z) of the traversing robot. Fig. 3 illustrates the five different vibration signals corresponding to the surface types. The vibration samples were collected via I2C from an MPU-6050 IMU, which integrates an accelerometer and a gyroscope in a single chip. The controlled conditions for the wheeled robot are: 50 Hz sampling rate, 1.6 minutes traversal time per surface, and circular motion of the robot.

Figure 2: Experimental setup.

Vibration samples must be converted into an appropriate format before entering the neural networks. Also, as the measurements have multiple units, the vibration samples must be normalised to a mean of zero and a variance of one. The equation for normalization is given by

    s_i = (x_i - mu) / sigma,    (1)

where i is the index of the element in the vibration sequence, mu is the mean, and sigma is the standard deviation. The vibration samples were then segmented into fixed windows of 1.5 seconds (75 samples). An overlap rate of 20% was applied between two consecutive 1.5-second segments to preserve the temporal dependencies between the time steps in the vibration sequence. One-hot encoding was then performed to map the different surface labels to numerical values. Lastly, the vibration data set was split into training, validation, and testing sets to allow the neural networks to generalize to unseen vibration characteristics. The partitions were set to 70%, 15%, and 15% for training, validation, and testing, respectively.

3.1. Implementation

LSTM is a type of recurrent neural network (RNN) that is typically used for sequence prediction. In particular, LSTM mitigates the vanishing gradient problem present in plain RNNs while allowing the long-term temporal dynamics of the series to be exploited. In contrast, CNNs have been commonly used for 2D problems (e.g. image classification); however, they can be modified to handle the 1D vibration problem. The dimensionality of the convolutional layers is reduced to match the model's 1D input.

The CNN-LSTM model leverages the robustness of the CNN in extracting spatial features and of the LSTM in exploiting the temporal dependencies of the vibration sequence. In this paper, the time-series vibration is downsampled by the 1D CNN to extract higher-level features. This can be considered a pre-processing step which allows the LSTM to interpret the features extracted at each block of the sequence. The concept is illustrated in Fig. 4.

Figure 3: The five different terrain vibration signals.
Figure 4: Time-slice processing for CNN-LSTM.

The three models were built and trained using a TensorFlow backend with the Keras API. A detailed overview of the three models is summarized in Table 1. The Hyperband algorithm was used to select the hyperparameters, allowing for the best balance between training time and accuracy. The learning rate, batch size, and number of epochs were set at 0.001, 64, and 30, respectively. Additionally, early stopping regularization was implemented to avoid overfitting during model training. Further, the Adam optimization algorithm, based on stochastic gradient descent, was used as the optimizer.

Table 1
Overview of the architectures used in this study

Model      Layer               Output shape
LSTM       LSTM (20 units)     (75, 20)
           Dropout (25%)       (75, 20)
           LSTM (70 units)     70
           Dropout (40%)       70
           Dense               112
           Dense               5
1D CNN     Conv1D (80@6x1)     (70, 80)
           Dropout (50%)       (70, 80)
           Conv1D (128@6x1)    (65, 128)
           Dropout (50%)       (65, 128)
           Max pooling         (32, 128)
           Flatten             4096
           Dense               96
           Dense               5
CNN-LSTM   Conv1D (96@6x1)     (3, 20, 96)
           Conv1D (48@6x1)     (3, 15, 48)
           Dropout (30%)       (3, 15, 48)
           Max pooling         (3, 7, 48)
           Flatten             (3, 336)
           LSTM (20 units)     60
           Dropout (20%)       60
           Dense               96
           Dense               5

The implemented CNN-LSTM architecture is shown in Fig. 5. For both the LSTM and 1D CNN networks, each vibration training sample was a sequence of 75 time steps. In a stacked LSTM network, the first LSTM layer returns a shape of (timestep, unit) to be passed on to the next layer, whereas the last LSTM layer returns only the unit dimension. For the 1D CNN, the input shape to the network is represented as (timestep, features). In the case of the CNN-LSTM network, a time-distributed wrapper is first applied before the LSTM layers to allow the input vibration signal to retain its temporal representation during the convolution process. The time-distributed layer expects a 3D input, so the input sequence was reshaped from 75 time steps into 3 subsequences of 25 time steps. The convolutional layers used the ReLU activation and a 6 x 1 kernel that moves across one dimension during the convolution operation.

Figure 5: Implemented CNN-LSTM model for vibration-based terrain classification.

Dropout layers were then added to tackle overfitting by randomly setting a fraction of the input units to zero. A pooling layer was added to reduce the spatial size of the output representation by half. Note that both the dropout and pooling layers allow for faster training due to the reduced parameter size. A flatten layer was used to transform the output of the previous layers into input for the LSTM layer, where the temporal characteristics of the vibration sequence were extracted. Lastly, fully connected layers with the softmax activation function were used to structure the outputs of the previous layer for the final classification task. In this experiment, the categorical cross-entropy loss function was used to address the 5-class terrain classification problem.

4. Results

The confusion matrices of the three models are depicted in Fig. 6. From the confusion matrix, we can calculate the precision P_r, the recall R_c, and the F1 score. The F1 score, which is calculated from P_r and R_c, is commonly used to analyze the performance of models. We used the macro-averaging technique to extend these metrics to multi-class terrain classification. The precision, recall, and F1 score are given by

    P_r = TP / (TP + FP),    (2)

    R_c = TP / (TP + FN),    (3)

    F1 = 2 * P_r * R_c / (P_r + R_c),    (4)

where TP is the outcome where the model correctly classifies the positive class, FP is the outcome where the model incorrectly classifies the positive class, TN is the outcome where the model correctly classifies the negative class, and FN is the outcome where the model incorrectly classifies the negative class.

Fig. 7 illustrates the performance of the models on the 5-class vibration test data set. In the worst case, about 7% (on average) of the wood class was misclassified as tiles across the three models. Overall, the three models exhibited good performance and generalized well to the unseen data. The CNN-LSTM architecture has the best performance, with an average F1 score of 98.49%. The 1D CNN follows in second place with an average F1 score of 97.47%. We note that the slight improvement of the CNN-LSTM model over the 1D CNN may suggest that the temporal modelling of the LSTM is less important than the feature generation capability of the CNN. Table 2 summarizes the overall performance of the three models.

Table 2
Average precision, recall, and F1 scores (based on testing data)

Model      Precision   Recall    F1 Score
LSTM       96.23%      96.00%    95.98%
1D CNN     97.53%      97.50%    97.47%
CNN-LSTM   98.60%      98.50%    98.49%

One factor influencing the performance of the models is the sequence length of the input vibration. To further validate the performance of the three models, we analyzed the F1 score with varying segment lengths, as shown in Fig. 8. It can be seen that a longer sequence length results in better accuracy, with performance saturating at a certain length. The average F1 score rises as the length of the vibration series increases from 30 to 60 samples but decreases afterwards. This may be caused by the lack of training data after segmentation at the given length. Therefore, we consider a sequence length of 75 (1.5 seconds) optimal for obtaining high accuracy and generalization of the models. Furthermore, the proposed CNN-LSTM architecture slightly outperformed the CNN and LSTM models across the varying segment lengths.
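The pre-processing pipeline described in Section 3 — z-score normalization per Eq. (1), followed by segmentation into 75-sample windows with a 20% overlap — can be sketched in plain Python. The helper names and the synthetic signal below are illustrative assumptions, not the authors' code:

```python
def zscore(seq):
    """Normalize a 1-D sequence to zero mean and unit variance (Eq. 1)."""
    n = len(seq)
    mu = sum(seq) / n
    sigma = (sum((x - mu) ** 2 for x in seq) / n) ** 0.5
    return [(x - mu) / sigma for x in seq]

def segment(seq, window=75, overlap=0.2):
    """Split a sequence into fixed windows with the given overlap rate."""
    step = int(window * (1 - overlap))  # 75 * 0.8 = 60 samples per hop
    return [seq[i:i + window]
            for i in range(0, len(seq) - window + 1, step)]

# Illustrative signal: 1.6 min at 50 Hz = 4800 samples per surface
signal = zscore([float(i % 7) for i in range(4800)])
windows = segment(signal)
```

With a 20% overlap, consecutive windows start 60 samples apart, so each pair of adjacent windows shares 15 samples of context.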
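The time-slice processing of Fig. 4 — reshaping a 75-step window into 3 subsequences of 25 steps so a time-distributed 1D CNN can process each slice before the LSTM — reduces to a simple reshape. A minimal sketch with a hypothetical helper name (the dummy window stands in for real 6-channel IMU data):

```python
def to_subsequences(window, n_sub=3, sub_len=25):
    """Reshape a (75, features) window into (3, 25, features) so a
    time-distributed 1D CNN can convolve each 25-step slice in turn."""
    assert len(window) == n_sub * sub_len
    return [window[k * sub_len:(k + 1) * sub_len] for k in range(n_sub)]

# One window: 75 time steps, 6 IMU channels (ax, ay, az, gx, gy, gz)
window = [[float(t)] * 6 for t in range(75)]
subseqs = to_subsequences(window)
```

In Keras this reshaping would feed a `TimeDistributed`-wrapped `Conv1D` stack, whose per-slice outputs then form the LSTM's input sequence.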
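The macro-averaged metrics of Eqs. (2)-(4) can be computed directly from a confusion matrix. The sketch below uses a made-up 3-class matrix for illustration; it is not the paper's data:

```python
def macro_scores(cm):
    """Macro-averaged precision, recall, and F1 from a square
    confusion matrix cm[true][pred], following Eqs. (2)-(4)."""
    n = len(cm)
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, wrong
        fn = sum(cm[c]) - tp                       # true c, missed
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    # Macro-averaging: unweighted mean over classes
    return (sum(precisions) / n, sum(recalls) / n, sum(f1s) / n)

# Hypothetical 3-class confusion matrix (rows: true, cols: predicted)
cm = [[8, 1, 1],
      [0, 9, 1],
      [1, 0, 9]]
p, r, f1 = macro_scores(cm)
```

Macro-averaging gives each terrain class equal weight regardless of its sample count, which matches the balanced 5-class data set used here.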
Figure 6: Confusion matrices of (a) LSTM, (b) CNN, and (c) CNN-LSTM on the vibration test dataset.
Figure 7: F1 scores for the three architectures.
Figure 8: Average F1 score at varying segment lengths.

5. Conclusion and Future Work

In this paper, we have demonstrated an IMU-based surface classification task. We compared three candidates for classifying the IMU data: LSTM, 1D CNN, and a combination of CNN and LSTM. Comparing the results, the CNN-LSTM provided the best performance (F1 score of 98.49%). However, the 1D CNN also presented favorable results, only slightly lower than the CNN-LSTM. The results suggest that the 1D CNN maps the classification better than the LSTM on a standalone basis. CNN and LSTM work on different principles: the latter models the temporal dynamics of the data, while the 1D CNN is based on static convolution, similar to its 2D counterpart. This implies that there is a clear static pattern in the IMU data, enabling a well-defined mapping to the respective classes.

The results, though counterintuitive, may prompt further research in this direction. With the growth of edge computing and the capacity of embedded systems, enabling robots to recognize surfaces would open up further indoor and industrial applications. Reducing the complexity of the machine learning models to gain further computational savings is also required. This appears feasible given the clear static pattern demonstrated by the 1D CNN results.

Acknowledgments

This research was supported by the Ministry of Research and Technology/National Agency for Research and Innovation, Republic of Indonesia, through the Penelitian Dasar Unggulan Perguruan Tinggi (PDUPT) Grant, contract number NKB-2838/UN2.RST/HKP.05.00/2020, year 2020.

References

[1] J. G. Bellingham, K. Rajan, Robotics in remote and hostile environments, Science 318 (2007) 1098-1102.
[2] V. Jorge, R. Granada, R. Maidana, D. Jurak, G. Heck, A. Negreiros, D. dos Santos, L. Gonçalves, A. Amory, A survey on unmanned surface vehicles for disaster robotics: Main challenges and directions, Sensors 19 (2019) 702. doi:10.3390/s19030702.
[3] K. K. Chung, K. W. Grathwohl, R. K. Poropatich, S. E. Wolf, J. B. Holcomb, Robotic telepresence: past, present, and future, Journal of Cardiothoracic and Vascular Anesthesia 21 (2007) 593-596.
[4] R. Chaichaowarat, W. Wannasuphoprasit, Wheel slip angle estimation of a planar mobile platform, in: 2019 First International Symposium on Instrumentation, Control, Artificial Intelligence, and Robotics (ICA-SYMP), 2019, pp. 163-166. doi:10.1109/ICA-SYMP.2019.8646198.
[5] S. D. A. P. Senadheera, A. M. H. S. Abeykoon, Sensorless terrain estimation for a wheeled mobile robot, in: 2017 IEEE International Conference on Industrial and Information Systems (ICIIS), 2017, pp. 1-6. doi:10.1109/ICIINFS.2017.8300422.
[6] D. Masha, M. Burke, B. Twala, Slip estimation methods for proprioceptive terrain classification using tracked mobile robots, in: International Conference (PRASA-RobMech), 2017, pp. 150-152.
[7] S. Wilson, J. Potgieter, K. Arif, Floor surface mapping using mobile robot and 2D laser scanner, in: 2017 24th International Conference on Mechatronics and Machine Vision in Practice (M2VIP), 2017, pp. 1-6. doi:10.1109/M2VIP.2017.8211508.
[8] S. Zenker, E. E. Aksoy, D. Goldschmidt, F. Wörgötter, P. Manoonpong, Visual terrain classification for selecting energy efficient gaits of a hexapod robot, in: 2013 IEEE/ASME International Conference on Advanced Intelligent Mechatronics, 2013, pp. 577-584. doi:10.1109/AIM.2013.6584154.
[9] Seung-Youn Lee, D. Kwak, A terrain classification method for UGV autonomous navigation based on SURF, in: 2011 8th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), 2011, pp. 303-306. doi:10.1109/URAI.2011.6145981.
[10] H. Wu, B. Liu, W. Su, Z. Chen, W. Zhang, X. Ren, J. Sun, Optimum pipeline for visual terrain classification using improved bag of visual words and fusion methods, Journal of Sensors 2017 (2017).
[11] R. González, K. Iagnemma, DeepTerramechanics: Terrain classification and slip estimation for ground robots via deep learning, CoRR abs/1806.07379 (2018). URL: http://arxiv.org/abs/1806.07379.
[12] P. Roy, S. Ghosh, S. Bhattacharya, U. Pal, Effects of degradations on deep neural network architectures, CoRR abs/1807.10108 (2018). URL: http://arxiv.org/abs/1807.10108.
[13] J. Libby, A. J. Stentz, Using sound to classify vehicle-terrain interactions in outdoor environments, in: 2012 IEEE International Conference on Robotics and Automation, 2012, pp. 3559-3566. doi:10.1109/ICRA.2012.6225357.
[14] A. Valada, L. Spinello, W. Burgard, Deep feature learning for acoustics-based terrain classification, in: International Symposium on Robotics Research (ISRR), 2015.
[15] X. A. Wu, T. M. Huh, R. Mukherjee, M. Cutkosky, Integrated ground reaction force sensing and terrain classification for small legged robots, IEEE Robotics and Automation Letters 1 (2016) 1125-1132. doi:10.1109/LRA.2016.2524073.
[16] J. Bednarek, M. Bednarek, L. Wellhausen, M. Hutter, K. Walas, What am I touching? Learning to classify terrain via haptic sensing, in: 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 7187-7193. doi:10.1109/ICRA.2019.8794478.
[17] C. A. Brooks, K. Iagnemma, Vibration-based terrain classification for planetary exploration rovers, IEEE Transactions on Robotics 21 (2005) 1185-1191. doi:10.1109/TRO.2005.855994.