=Paper=
{{Paper
|id=Vol-2841/DARLI-AP_7
|storemode=property
|title=Profiling industrial vehicle duties using CAN bus signal segmentation and clustering
|pdfUrl=https://ceur-ws.org/Vol-2841/DARLI-AP_7.pdf
|volume=Vol-2841
|authors=Silvia Buccafusco,Andrea Megaro,Luca Cagliero,Francesco Vaccarino,Lucia Salvatori,Riccardo Loti
|dblpUrl=https://dblp.org/rec/conf/edbt/BuccafuscoMCVSL21
}}
==Profiling industrial vehicle duties using CAN bus signal segmentation and clustering==
<pdf width="1500px">https://ceur-ws.org/Vol-2841/DARLI-AP_7.pdf</pdf>
<pre>
         Profiling industrial vehicle duties using CAN bus signal
                       segmentation and clustering
Silvia Buccafusco (Politecnico di Torino, Turin, Italy) silvia.buccafusco@polito.it
Andrea Megaro (Politecnico di Torino, Turin, Italy) andrea.megaro@asp-poli.it
Luca Cagliero (Politecnico di Torino, Turin, Italy) luca.cagliero@polito.it
Francesco Vaccarino (Politecnico di Torino, Turin, Italy) francesco.vaccarino@polito.it
Lucia Salvatori (Tierra spa, Turin, Italy) lsalvatori@topcon.com
Riccardo Loti (Tierra spa, Turin, Italy) rloti@tierratelematics.com

ABSTRACT
Industrial vehicles working in construction sites show rather heterogeneous
usage patterns. Depending on the vehicle type, model, and context of usage, the
workload may vary from light to heavy with variable periodicity. Duties
summarize the current state of a vehicle according to its usage level. They
are usually set up manually, vehicle by vehicle, according to the
manufacturer's specifications. To automate the definition of per-vehicle duty
levels, this paper explores the use of clustering techniques applied to CAN bus
signals. The approach first segments the CAN bus signals to identify specific
working cycles. Then, it clusters the segments to support the definition of
vehicle-specific duty levels. The preliminary results, acquired on real vehicle
usage data, show the applicability of the proposed approach.

© 2021 Copyright for this paper by its author(s). Published in the Workshop
Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia,
Cyprus) on CEUR-WS.org. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION
The fleets of industrial vehicles that are commonly employed in construction
sites by public and private enterprises show rather variable usage patterns.
For example, refuse compactors, which are usually employed in dumps, drive a
few kilometers per day and work at light workload 24/7 for relatively long
periods. Road rollers and tandem rollers, which are frequently used in road
maintenance, also drive a few kilometers per day, but work at relatively heavy
workload only for short periods. Conversely, forklift trucks, which are
employed in warehouses, drive many kilometers per day, work most of the time at
light workload, and accomplish specific tasks at heavy workload (e.g., lifting
a heavy pallet).
   The advent of Controller Area Network (CAN) bus technology [11] has provided
fleet managers with a huge amount of data useful for monitoring and analyzing
vehicle usage. The CAN bus allows communication among the electronic control
units (ECUs) on board the vehicle and provides direct access to various
signals describing the vehicle state. CAN bus data usually consist of raw time
series, which are sampled and aggregated before being transmitted to a central
repository. These data cover fuel consumption, vehicle movements (e.g.,
accelerations and drifts), engine conditions (e.g., RPM, oil and coolant
temperature), route characteristics (e.g., slope), and alarms. Domain experts
can thus monitor the vehicle state by acquiring, collecting, and analyzing
vehicle-specific CAN bus data through data mining and machine learning
techniques in order to support fleet managers' decisions
(e.g., [16, 17, 22, 23]).
   To optimize vehicle usage, fleet managers commonly need to monitor the time
spent by the vehicles in specific duties. Engine duties describe the current
state of a vehicle and are usually classified as:
  (i)   long idle, which indicates that the vehicle has been stationary and
        under a minimal workload level for a relatively long period;
  (ii)  idle, which indicates that the vehicle has been stationary and under a
        minimal workload for a short period;
  (iii) moving/working, which indicates non-stationary vehicle usage with light
        workload;
  (iv)  light workload, which indicates non-stationary vehicle usage with light
        workload;
  (v)   heavy workload, which indicates non-stationary vehicle usage with
        intensive workload.
However, due to the high vehicle heterogeneity over models, types, and context
of usage (e.g., ground type, use of vehicle equipment), duties are commonly
defined manually by domain experts separately for each vehicle. This is
inefficient, particularly time-consuming, and prone to errors.
   To make the process of defining per-vehicle duty levels more efficient and
effective, we propose to apply a clustering-based approach to the acquired CAN
bus signals related to a shortlist of Suspect Parameter Numbers (SPNs). To this
end, we make a preliminary attempt to directly cluster the raw SPN series in
order to assign approximate, point-wise per-vehicle duties. For instance,
Figure 1 shows three examples of SPNs (i.e., engine speed, fuel tank level, and
engine percent load) for a representative vehicle. The idle and working
states, defined by setting the usage level thresholds inferred from the
outcomes of the clustering algorithms, are colored in green and red,
respectively. Although they discriminate between instants of heavy and light
workload, they do not trace the underlying trends in temporal duty variations.
Hence, the results are hardly usable by domain experts.
   To obtain more precise and stable duty levels, we devise a refined
clustering strategy that groups fixed-length segments of CAN bus signals
according to ad hoc descriptive features in both the frequency and time
domains. Segments are produced by applying a motif discovery algorithm to the
aligned and synchronized version of the raw SPN series. Figure 2 shows the
output of the refined process, which exhibits the newly defined duties. The new
states appear to be less susceptible to temporary usage level variations, and
are thus usable for profiling vehicle usage.
   The results were validated on real vehicle usage data acquired by a
multinational company providing telematics services. The validation phase
included qualitative and quantitative analyses. The latter relied both on
established clustering validity indices [2] and on a comparison between the
assigned duty states and the expected output according to the National Marine
Electronics Association (NMEA) 0183 messages data [3].

Table 1: Analyzed SPNs.
SPN code (SAE J1939)   Description
81                     Engine diesel particulate filter inlet pressure
90                     Power takeoff oil temperature
94                     Engine fuel delivery pressure
110                    Engine coolant temperature
114                    Net battery current
123                    Clutch pressure
164                    Engine injection control pressure
182                    Engine trip fuel
183                    Engine fuel rate
190                    Engine speed
524                    Transmission selected gear
975                    Estimated percent fan speed
1638                   Hydraulic temperature
Custom number          Engine percent load
Custom number          Front plow switch
Custom number          Rear hitch position
Custom number          Charge pressure
Custom number          Amount of particulate matter C method
Custom number          Digging depth
Custom number          Fuel tank level

Figure 1: SPNs colored according to the rough duty levels (green = idle,
red = working). Point-wise clustering.
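The point-wise baseline of Figure 1 can be illustrated with a minimal sketch. It assumes that each raw sample of a single SPN (e.g., engine speed) is assigned on its own to one of two states by a plain one-dimensional k-means with k = 2, and that the midpoint between the two centroids serves as the inferred usage level threshold. The text does not state which algorithm produced Figure 1. The helper `pointwise_duties` and the sample values are hypothetical.

```python
import numpy as np


def pointwise_duties(samples, n_iter=100):
    """Hypothetical point-wise baseline: 1-D k-means with k = 2 on raw samples.

    Returns per-sample labels (0 = idle, 1 = working) and the threshold between
    the two states. Each sample is labeled from its own value only, so no
    temporal context is taken into account.
    """
    x = np.asarray(samples, dtype=float)
    # Start from the extremes so that centroid 0 is the low (idle) level.
    centroids = np.array([x.min(), x.max()])
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid.
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        # Recompute centroids; keep the previous one if a cluster is empty.
        updated = np.array([
            x[labels == j].mean() if np.any(labels == j) else centroids[j]
            for j in range(2)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # In 1-D, the decision boundary between two centroids is their midpoint.
    threshold = float(centroids.mean())
    return labels, threshold


labels, threshold = pointwise_duties([800, 820, 790, 1900, 2000, 1950, 810])
print(labels.tolist(), threshold)  # [0, 0, 0, 1, 1, 1, 0] 1377.5
```

Because every sample is labeled in isolation, a single transient value can flip the state from one instant to the next. This is the instability that the segment-wise strategy is meant to remove.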


Figure 2: SPNs colored according to the refined duty levels (gray = idle,
green = moving, red = working). Segment-wise clustering.

   The rest of the paper is organized as follows. Section 2 overviews the
related literature. Section 3 describes the analyzed data. Sections 4 and 5
present the data preparation and mining phases, respectively. Section 6
summarizes the empirical results, whereas Section 7 draws conclusions and
discusses future developments of this research.

2 RELATED WORK
Clustering techniques have already been applied to analyze CAN bus data
acquired from vehicles. Examples of applications include, among others, (i) the
optimization of vehicle routes (e.g., [5]), (ii) the identification of driver
intentions based on trajectory analysis (e.g., [21]), (iii) the
characterization of drivers' behavior (e.g., [7, 15]), and (iv) the management
of single vehicles and vehicle fleets (e.g., [9]). The present work belongs to
the latter category. To the best of our knowledge, this is the first attempt to
automate the process of assigning per-vehicle duty levels based on CAN bus
signal clustering.
   In [9] the authors focused on explaining clusters mined from multivariate
time series data over different time scales and granularities. As an
application scenario, they analyzed the usage profile of vehicles travelling
across urban areas with the aim of planning and supporting maintenance
operations. To this purpose, they used a Gaussian mixture model to identify
clusters on top of a subset of features extracted from the raw series
according to a sliding window strategy. Rather than extracting aggregated
statistics for all series over some time window and then identifying clusters
as indicators of more abstract states, our approach aims at partitioning the
series to detect specific vehicle duties. In a nutshell, we cluster segments of
SPN series instead of concise series representations. In [10] the authors
proposed a reverse engineering strategy to extract undisclosed information
about the CAN bus configuration. Similar to the present work, the aforesaid
study aims at analyzing vehicle usage via CAN bus signal analysis. However, its
research objective is substantially different.

3 DATA OVERVIEW
Data were acquired from an experimental CAN bus data logger installed on a
test farm tractor working in a construction site. Data were provided by Tierra
S.p.A., a multinational company operating in the IoT sector and internationally
recognized for providing its customers with sophisticated and reliable
telematics solutions for the management, maintenance, and remote diagnostics
of equipment.
   The test vehicle is equipped with a large number of sensors, which capture
CAN parametric messages at a high frequency (up to 100 Hz). Messages were
gathered and temporarily stored on an SD card before being transmitted to the
cloud infrastructure. Then, raw data were decoded and transformed using the SAE
J1939 protocol (https://www.sae.org/), which was established for heavy-duty
vehicle manufacturers and provides a shared set of standard messages and
conversion rules.
   Customers of the telematics service provider can visualize and process the
converted data in real time. Among the available vehicle usage indicators, the
times spent by the vehicle in each duty (i.e., long idle, idle, moving/working,
heavy workload) are among the most commonly used to optimize vehicle
maintenance, production, business, and investments [20]. Unfortunately, the
threshold levels used to define the vehicle states are not standardized, since
they depend on the particular vehicle model, type, and context of usage.
Hence, their setup is typically performed manually by domain experts. This
prompts the need for data-driven approaches that automatically infer the most
suitable duty levels separately for each industrial vehicle.
   The acquired dataset consists of the SPN series collected from the test
vehicle from November 7, 2019 to April 15, 2020. Each observation is described
by SPN name, acquisition timestamp, and measurement. The dataset collects 20
different SPNs describing the state of the vehicle engine, such as engine
speed, percent load, fuel rate, coolant temperature, and fuel delivery pressure
(see the full list in Table 1). A more thorough SPN description can be found at
www.sae.org/standards.
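Each observation is an (SPN name, acquisition timestamp, measurement) triple, so a natural in-memory layout is one time-ordered series per SPN. The sketch below is a minimal illustration that assumes timestamps in seconds. The helper names (`group_by_spn`, `average_sampling_rate`), the SPN identifiers, and the values are hypothetical. It also estimates each series' average sampling rate, which is the quantity that differs widely across SPNs (e.g., 50 Hz for engine speed versus 1 Hz for coolant temperature).

```python
from collections import defaultdict


def group_by_spn(records):
    """Group (spn_name, timestamp_s, value) triples into time-sorted per-SPN series."""
    series = defaultdict(list)
    for spn, timestamp, value in records:
        series[spn].append((timestamp, value))
    # Messages may arrive out of order, so sort each series by timestamp.
    return {spn: sorted(points) for spn, points in series.items()}


def average_sampling_rate(points):
    """Average rate in Hz: number of sampling intervals over the covered time span."""
    if len(points) < 2:
        return 0.0
    span = points[-1][0] - points[0][0]
    return (len(points) - 1) / span if span > 0 else 0.0


# Synthetic records (hypothetical SPN identifiers and values).
records = [
    ("engine_speed", 0.00, 810.0), ("engine_coolant_temp", 0.0, 71.0),
    ("engine_speed", 0.02, 815.0), ("engine_speed", 0.04, 820.0),
    ("engine_coolant_temp", 1.0, 71.5), ("engine_speed", 0.06, 818.0),
    ("engine_coolant_temp", 2.0, 72.0),
]
series = group_by_spn(records)
for spn, points in series.items():
    print(spn, round(average_sampling_rate(points), 1))
# engine_speed 50.0
# engine_coolant_temp 1.0
```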
Figure 3: Extracts of the SPN series of engine speed and engine coolant
temperature.

   The acquired SPN series are highly heterogeneous, not synchronized with each
other, and partly noisy. For example, Figure 3 shows two extracts of SPN
series, i.e., the engine speed and the engine coolant temperature series. The
former has a sampling rate of 50 Hz, whereas the latter has a sampling rate of
1 Hz. Furthermore, the periodicity of the series varies in granularity across
working cycles.

4 DATA PREPARATION
To prepare the raw CAN bus data for the subsequent analyses, we apply the
following steps.
   Data cleaning. To avoid introducing a bias in the clustering process, we
removed missing values (due, for instance, to failures in data acquisition and
transmission) and handled the presence of noise, decoding errors, and
inconsistencies in the SPN series values. Specifically, for each (timestamp,
SPN) pair we computed an average value, and we bounded data points to the
feasible operating range of the corresponding measured physical quantity.
   Working cycle identification. CAN messages are transmitted only when a
vehicle is on. To analyze vehicle duties, it can be useful to understand
whether a vehicle has been turned off at the end of a working cycle or for any
other reason. For this reason, when the vehicle restarts we analyze the value
of the engine coolant temperature, as it indicates, to a good approximation,
whether the vehicle has been turned off for a sufficiently long time.
   Series alignment and synchronization. CAN bus messages are transmitted
asynchronously at variable rates over the network. For example, SPNs such as
engine speed, engine percent load, and charge pressure are transmitted quite
frequently (sampling rate between 20 Hz and 50 Hz), whereas engine coolant
temperature and engine fuel delivery pressure are sent less frequently (rate
between 1 Hz and 2 Hz). Hence, to enable SPN series clustering, we re-align and
synchronize all the analyzed SPN series. To this aim, CAN bus signals are first
linearly interpolated in the time domain and then down-sampled to the lowest
average sampling rate, so that all signals share the same time base (using
standard anti-aliasing filters and down-sampling operators).
   SPNs selection. We select the subset of SPNs that most likely influence the
vehicle duty. To this purpose, we first filter out all the SPNs providing less
relevant information. For example, according to the manufacturer's
specifications, the engine trip fuel was deemed irrelevant to our purposes and
thus discarded.
   Next, to reduce the potential bias due to the simultaneous presence of
correlated SPNs describing related components of the same physical system, we
also perform a preliminary correlation analysis of the SPN series. Figure 4
shows the Pearson correlation between SPN pairs. It clearly indicates the
presence of a group of highly correlated SPNs describing the status of the
vehicle engine, namely engine speed, engine percent load, engine fuel rate, and
fuel tank level. Hereafter, we focus our analyses on a representative of this
group, i.e., the engine speed.

Figure 4: Pearson correlation between SPN pairs.

   Finally, we analyze the spectral content of the SPN signals. Signals
characterized by slow variations are disregarded in the following analyses,
since most of their information is concentrated at frequencies close to 0, so
their spectral content can be approximated by the temporal average value of the
signal. Notice that slow signal variations can be due either to the intrinsic
nature of the considered measure (e.g., for the SPN related to the transmission
selected gear) or to the limited sensitivity of the measurement instrument
(e.g., for the SPNs related to charge pressure and engine fuel delivery
pressure).

5 PROFILE VEHICLE USAGE
To identify vehicle duties, we analyze vehicle usage data by means of
clustering techniques. Specifically, the SPN series are first synchronized and
segmented into fixed-length intervals. Each segment is described by specific
features. Then, segments are clustered into homogeneous groups. The clustering
outcomes allow domain experts to empirically set up per-duty levels associated
with each SPN.
   Time series segmentation. Time series segmentation entails defining a
partition of the input series $X(t)$ into $k$ segments $S_1, S_2, \dots, S_k$,
each one characterized by a distinct time span $[t_{start}, t_{end}]$. Since
vehicle usage is described by multiple SPN series, the segmentation problem is
extended to a multivariate model, i.e., given the time series
$X_1(t), X_2(t), \dots, X_n(t)$ corresponding to $SPN_1, SPN_2, \dots, SPN_n$,
respectively, we partition them into $k$ segments, where the same partition
holds for all the considered series. To deal with correlated series, the input
series can be preprocessed using Principal Component Analysis [1], with the aim
of collapsing the underlying SPN sub-trends that are highly correlated with
each other into a separate component.
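The multivariate segmentation above, where one partition is shared by all series, can be sketched as follows. This is a minimal illustration under stated assumptions. The series are taken to be already aligned on a common time base (one row per time step, one column per SPN). The optional PCA step standardizes the columns and projects them onto the leading principal components via an SVD; the text does not say how PCA is configured. The segment length is given in samples. The helpers `pca_collapse` and `shared_partition` are hypothetical, and the motif-based choice of the segment length is not reproduced.

```python
import numpy as np


def pca_collapse(X, n_components):
    """Project a T x n multichannel series onto its leading principal components."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant channels
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal directions, sorted by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def shared_partition(X, seg_len):
    """Split a T x n array into k = T // seg_len segments with common boundaries.

    Each segment covers the inclusive sample range [t_start, t_end] for all
    channels at once; a trailing remainder shorter than seg_len is dropped.
    """
    X = np.asarray(X)
    k = X.shape[0] // seg_len
    spans = [(i * seg_len, (i + 1) * seg_len - 1) for i in range(k)]
    segments = [X[start:end + 1] for start, end in spans]
    return spans, segments


t = np.arange(10.0)
X = np.column_stack([t, 2 * t + 1])  # two perfectly correlated channels
Y = pca_collapse(X, n_components=1)  # one component carries all the variance
spans, segments = shared_partition(X, seg_len=4)
print(spans)  # [(0, 3), (4, 7)]
```

At a hypothetical common rate of 1 Hz, the 2-minute segments used later would correspond to `seg_len=120`. The actual rate after down-sampling is not reported in the text.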
   For the sake of simplicity, we address time series segmentation     duties can be easily inferred). The number of desired clusters is
using an established motif discovery algorithm [13]. Motifs are        an input parameter, which can be specified by the domain experts.
recurring sub-series within a reference time series. The algorithm     We set up this parameter in an empirical way by assessing the
first splits each of the original time series into fixed-length seg-   clustering results according to established cluster validity indices.
ments and then compares pairs of segments to select the top most       Specifically, to choose the best algorithm and the number of de-
similar pairs. The segment length can vary within a range [𝐿𝑚𝑖𝑛 ,      sired clusters we empirically assessed the performance achieved
𝐿𝑚𝑎𝑥 ]. Both the segment length range and the distance measure         by multiple runs of different clustering algorithms by varying
used to generate the motif are configurable by domain experts.         the number of desired clusters 𝑘 (see Section 6).
In our experiments, we varied the segment length between 2
minutes and 10 minutes and evaluated the similarity between               Duty level identification. On top of the cluster outcomes the
segments via Euclidean distance.                                       levels associated with each vehicle duty can be identified. To this
   To empirically identify the most appropriate segment length,        aim, the SPN segments associated with the same cluster are fur-
we discretized the segment length range into 1-minute bins,            ther split into sub-groups characterizing similar usage patterns.
counted the number of motifs per length, and selected the length       For example, sub-groups allow us to distinguish between vehicles
maximizing that count. The lower bound of the segment length           in an idle state and vehicles that keep moving steadily.
range (2 minutes) turned out to be the most appropriate time scale.
The corresponding 2405 segments will be hereafter considered           6     EXPERIMENTAL RESULTS
in the reported analyses.                                              We carried out an empirical analysis of the proposed methodol-
                                                                       ogy on the real vehicle usage data provided by Tierra SpA. The
   Per-segment feature extraction. For each segment we extract
                                                                       experiments were run on an Intel(R) Core(TM) i5-8250U machine
a subset of features that describe time series shape and values’
                                                                       equipped with 8 GB of RAM and running Windows 10 64-bit.
distribution. since the aim is to characterize the general shape
                                                                          The summary of the experimental results is organized as fol-
of each segment in terms of its variations and their correspond-
                                                                       lows. Firstly, we compare the performance of different clustering
ing rapidity and amplitudes, SPNs are analyzed in the frequency
                                                                       techniques according to the Silhouette validity index [2] and dis-
domain. To this purpose, for each SPN and segment the Fourier
                                                                       cuss the impact of the number of desired clusters on clustering
transform of the signal is applied by considering only the positive
                                                                       performance (see Section 6.1). Secondly, we quantitatively eval-
coefficients, as we exploit the symmetry of real signals Fourier
                                                                       uate the quality of the clustering outcome against the National
coefficients. Then, separately for low, medium, and high frequen-
                                                                       Marine Electronics Association (NMEA) 0183 messages data [3]
cies, the signal power value, the signal peaks, and the signal
                                                                       (see Section 6.2). Finally, we report a qualitative analysis of the
peaks frequencies are computed. Lastly, the signal mean (in the
                                                                       achieved results (see Section 6.3).
time domain) is considered as well.

                                                                       6.1     Comparison between different clustering
                                                                               techniques
                                                                       We tested various purely partitional clustering algorithms belong-
                                                                       ing to the following categories: (i) centroid-based, (ii) density-
                                                                       based, (iii) hierarchical, and (iv) shape-based. We considered two
                                                                       of the most renowned algorithms belonging to the centroid-based
                                                                       category (K-Means [14], which exploits the concept of cluster
                                                                       centroid, and Clara [12], based on medoids1 ) Density-based clus-
                                                                       tering group together samples located in dense regions and well
                                                                       separated from other regions. We exploited a Python implemen-
                                                                       tation of the well-known DBScan algorithm [6]. Hierarchical
                                                                       clustering produces nested clusters, which can be organized in a
                                                                       dendrogram. We exploited an implementation of an agglomera-
                                                                       tive algorithm [4]. Finally, shape-based clustering is tailored to
                                                                       time series clustering. We considered the K-Shape algorithm [18],
                                                                       which relies on series cross-correlation analyses. Notice that the
                                                                       latter algorithm is designed for univariate time series analysis.
Figure 5: Pearson correlation between pairs of per-
                                                                       It captures the similarities between sub-series independently of
segment features.
                                                                       the shift of the distinctive segments’ properties.
                                                                           We evaluated clustering performance according to the Silhou-
   Figure 5 shows the Pearson’s correlation between the pairs          ette score, which is an established validity index used to measure
of extracted features. According to the correlation values, the        of how similar a sample is to its own cluster compared to the
feature set is reduced to 3: (i) the power in low frequencies sub-     other clusters [2]. The score ranges from -1 (high separation) to
band, (ii) the peak value for high frequencies, and (iii) the peak     1 (high cohesion), i.e., the larger the better. For each algorithm
frequency in the high frequencies sub-band.                            we varied the configuration settings to find the best setting.
   Clustering. Clustering aims at grouping data samples that are similar to
one another and dissimilar from those assigned to other groups. In this
particular context, clustering algorithms are exploited to group the SPN
segments into homogeneous groups representing typical vehicle duties (or
vehicle states for which

1 Clara is an extension of the k-Medoids algorithm, which is able to scale
  towards larger and more complex datasets.

   For all the tested algorithms, we achieved the best results by setting the
number 𝑘 of desired clusters to 2. K-Means achieved
the best overall performance (0.67), followed by hierarchical clustering
(0.53), Clara (0.5), DBScan (0.49), and K-Shape (0.23).

6.2    Evaluation based on the National Marine Electronics Association messages
We assessed the quality of the clustering outcomes using, as ground truth, the
numerical data provided by National Marine Electronics Association (NMEA) 0183
messages [3]. NMEA 0183 is a one-way serial data communication protocol used to
send messages from the vehicle to external devices. Unlike CAN bus data, it
provides fairly accurate GPS-related information, such as the vehicle
coordinates (latitude and longitude) and the vehicle speed. Conversely, GPS
positions transmitted via CAN bus messages frequently suffer from relatively
high measurement errors [8]. Although the accuracy of NMEA positions is not
guaranteed in all cases, we made a preliminary attempt to validate the ability
of the proposed method to discriminate between idle states and moving/working
ones. To this end, we compared the duty labels assigned via clustering against
those derived from NMEA messages (used as ground truth).
   We achieved an accuracy of 82.37%, i.e., approximately 8 segments out of 10
were correctly classified. The average recall and precision scores were 82.35%
and 82.39%, respectively. These results are promising, considering that the
segments used for validation were fairly balanced (50.81% idle segments,
49.19% moving/working ones). Accuracy is therefore not inflated by a majority
class.
   A deeper analysis of the mislabeled segments showed that, in a few cases,
rapid and repeated changes in engine speed were associated with an idle state.
This was probably due to small errors in the GPS readings. Therefore, these
particular errors do not appear to be caused by imprecise vehicle duty level
assignments.

6.3    Qualitative evaluation
Figure 6 plots two representative segments for each of the three clusters.
With the help of domain experts, we interpreted the underlying vehicle usage
patterns. Specifically, cluster 1 corresponds to a working or heavy workload
state and is characterized by highly variable segments. It indicates an
aggressive driving style, with rapid accelerations and braking. Cluster 2 shows
a more stationary vehicle usage. Its usage levels are compatible either with a
stationary vehicle under medium workload or with a non-stationary vehicle.
Finally, cluster 3 contains slowly varying segments in which the engine speed
oscillates around minimum usage levels. It likely denotes an idle duty.
   Figure 7 shows the number of segments per cluster in the same experiment.
According to the domain experts, the distribution is consistent with
expectations, since it reflects the actual usage of the test vehicle.

Figure 6: Representative segments for each cluster

Figure 7: Number of segments per cluster

7     CONCLUSIONS AND FUTURE WORK
The paper explores the use of clustering techniques to profile industrial
vehicle usage in construction sites. The aim is to define per-vehicle duties,
which summarize the current state of the vehicle (e.g., idle, moving, heavy
workload). Due to the high heterogeneity of vehicle types, models, and usage
contexts, duties are commonly detected by means of manually configured
thresholds. To automate this process, we propose a data-driven approach based
on CAN bus signal segmentation and clustering. The clustering output obtained
on real vehicle usage data was validated with the help of domain experts.
   The preliminary results leave room for further improvements. First, the
acquisition of CAN bus data from many test vehicles would allow us to extend
vehicle duty identification from a single vehicle to groups of similar
vehicles. Second, a deeper analysis of the contextual information related to
the working site and the vehicle equipment would help further improve the
accuracy of the duty level assignments. It would also help identify driving
styles more effectively. In addition, as soon as new historical data become
available, the framework will be updated. The update will draw on results
obtained in the past on similar tasks and construction site conditions.
Finally, the assigned duties will be exploited to accomplish specific tasks,
such as predictive maintenance and anomaly detection [19].

8    ACKNOWLEDGMENTS
The research leading to these results has been funded by the SmartData@PoliTO
center for Big Data and Machine Learning technologies and by Tierra Spa.

REFERENCES
 [1] J. Abonyi, B. Feil, S. Németh, and P. Arva. 2004. Principal Component Analysis
     based Time Series Segmentation: A New Sensor Fusion Algorithm.
 [2] Olatz Arbelaitz, Ibai Gurrutxaga, Javier Muguerza, Jesus M. Perez, and Inigo
     Perona. 2013. An extensive comparative study of cluster validity indices.
     Pattern Recognition 46, 1 (2013), 243–256.
     https://doi.org/10.1016/j.patcog.2012.07.021
 [3] National Marine Electronics Association. [n.d.]. NMEA 0183. Retrieved
     November 11, 2020 from https://www.nmea.org/content/STANDARDS/NMEA_0183_Standard
 [4] Maria-Florina Balcan, Yingyu Liang, and Pramod Gupta. 2014. Robust
     Hierarchical Clustering. J. Mach. Learn. Res. 15, 1 (Jan. 2014), 3831–3871.
 [5] Sahar Ebadinezhad, Ziya Dereboylu, and Enver Ever. 2019. Clustering-Based
     Modified Ant Colony Optimizer for Internet of Vehicles (CACOIOV).
     Sustainability 11, 9 (May 2019), 2624. https://doi.org/10.3390/su11092624
 [6] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. A
     Density-Based Algorithm for Discovering Clusters in Large Spatial Databases
     with Noise. In Proceedings of the Second International Conference on
     Knowledge Discovery and Data Mining (KDD'96). AAAI Press, 226–231.
 [7] Umberto Fugiglando, Paolo Santi, Sebastiano Milardo, Kacem Abida, and Carlo
     Ratti. 2017. Characterizing the "Driver DNA" Through CAN Bus Data Analysis.
     In Proceedings of the 2nd ACM International Workshop on Smart, Autonomous,
     and Connected Vehicular Systems and Services (CarSys '17). Association for
     Computing Machinery, New York, NY, USA, 37–41.
     https://doi.org/10.1145/3131944.3133939
 [8] Yong Heo, Thomas Yan, Samsung Lim, and Chris Rizos. 2009. International
     standard GNSS real-time data formats and protocols.
 [9] Anders Holst, Juhee Bae, Alexander Karlsson, and Mohamed-Rafik Bouguelia.
     2019. Interactive Clustering for Exploring Multiple Data Streams at
     Different Time Scales and Granularity. In Proceedings of the Workshop on
     Interactive Data Mining (WIDM'19). Association for Computing Machinery,
     New York, NY, USA, Article 2, 7 pages. https://doi.org/10.1145/3304079.3310286
[10] Thomas Huybrechts, Yon Vanommeslaeghe, Dries Blontrock, Gregory Van Barel,
     and Peter Hellinckx. 2018. Automatic Reverse Engineering of CAN Bus Data
     Using Machine Learning Techniques. In Advances on P2P, Parallel, Grid, Cloud
     and Internet Computing, Fatos Xhafa, Santi Caballé, and Leonard Barolli
     (Eds.). Springer International Publishing, Cham, 751–761.
[11] Karl Henrik Johansson, Martin Törngren, and Lars Nielsen. 2005. Vehicle
     Applications of Controller Area Network. Birkhäuser Boston, Boston, MA,
     741–765. https://doi.org/10.1007/0-8176-4404-0_32
[12] Leonard Kaufman and Peter J. Rousseeuw. 1987. Clustering by means of
     medoids. 405–416.
[13] Michele Linardi, Yan Zhu, Themis Palpanas, and Eamonn Keogh. 2018. Matrix
     Profile X: VALMOD - Scalable Discovery of Variable-Length Motifs in Data
     Series. In SIGMOD '18: Proceedings of the 2018 International Conference on
     Management of Data. 1053–1066. https://doi.org/10.1145/3183713.3183744
[14] J. MacQueen. 1967. Some Methods for Classification and Analysis of
     Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on
     Mathematical Statistics and Probability - Vol. 1, L. M. Le Cam and
     J. Neyman (Eds.). University of California Press, Berkeley, CA, USA, 281–297.
[15] C. Marina Martinez, M. Heucke, F. Wang, B. Gao, and D. Cao. 2018. Driving
     Style Recognition for Intelligent Vehicle Control and Advanced Driver
     Assistance: A Survey. IEEE Transactions on Intelligent Transportation
     Systems 19, 3 (2018), 666–676. https://doi.org/10.1109/TITS.2017.2706978
[16] Dena Markudova, Elena Baralis, Luca Cagliero, Marco Mellia, Luca Vassio,
     Elvio Gilberto Amparore, Riccardo Loti, and Lucia Salvatori. 2019.
     Heterogeneous Industrial Vehicle Usage Predictions: A Real Case. In
     Proceedings of the Workshops of the EDBT/ICDT 2019 Joint Conference,
     EDBT/ICDT 2019, Lisbon, Portugal, March 26, 2019 (CEUR Workshop
     Proceedings), Paolo Papotti (Ed.), Vol. 2322. CEUR-WS.org.
     http://ceur-ws.org/Vol-2322/DARLIAP_13.pdf
[17] Sachit Mishra, Luca Vassio, Luca Cagliero, Marco Mellia, Elena Baralis,
     Riccardo Loti, and Lucia Salvatori. 2020. Machine Learning Supported
     Next-Maintenance Prediction for Industrial Vehicles. In Proceedings of the
     Workshops of the EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark,
     March 30, 2020 (CEUR Workshop Proceedings), Alexandra Poulovassilis, David
     Auber, Nikos Bikakis, Panos K. Chrysanthis, George Papastefanatos, Mohamed
     A. Sharaf, Nikos Pelekis, Chiara Renso, Yannis Theodoridis, Karine
     Zeitouni, Tania Cerquitelli, Silvia Chiusano, Genoveva Vargas-Solar,
     Behrooz Omidvar-Tehrani, Katharina Morik, Jean-Michel Renders, Donatella
     Firmani, Letizia Tanca, Davide Mottin, Matteo Lissandrini, and Yannis
     Velegrakis (Eds.), Vol. 2578. CEUR-WS.org.
     http://ceur-ws.org/Vol-2578/DARLIAP9.pdf
[18] John Paparrizos and Luis Gravano. 2015. k-Shape: Efficient and accurate
     clustering of time series. In Proceedings of the 2015 ACM SIGMOD
     International Conference on Management of Data. 1855–1870.
[19] Stefano Proto, Evelina Di Corso, Daniele Apiletti, Luca Cagliero, Tania
     Cerquitelli, Giovanni Malnati, and Davide Mazzucchi. 2020. REDTag: A
     Predictive Maintenance Framework for Parcel Delivery Services. IEEE Access 8
     (2020), 14953–14964. https://doi.org/10.1109/ACCESS.2020.2966568
[20] Lee A Schmidt and Lorenz Riegger. 2012. Automatic Detection of Machine
     Status for Fleet Management. US Patent App. 13/341,500.
[21] D. Yi, J. Su, C. Liu, and W. Chen. 2019. Trajectory Clustering Aided
     Personalized Driver Intention Prediction for Intelligent Vehicles. IEEE
     Transactions on Industrial Informatics 15, 6 (2019), 3693–3702.
     https://doi.org/10.1109/TII.2018.2890141
[22] Weiliang Zeng, Tomio Miwa, Wakita, and Takayuki Morikawa. 2015. Exploring
     Trip Fuel Consumption by Machine Learning from GPS and CAN Bus Data.
     Journal of the Eastern Asia Society for Transportation Studies 11 (12 2015),
     906–921. https://doi.org/10.11175/easts.11.906
[23] W. Zhang, D. Yang, and H. Wang. 2019. Data-Driven Methods for Predictive
     Maintenance of Industrial Equipment: A Survey. IEEE Systems Journal 13, 3
     (Sep. 2019), 2213–2227.

</pre>