=Paper=
{{Paper
|id=Vol-2841/DARLI-AP_7
|storemode=property
|title=Profiling industrial vehicle duties using CAN bus signal segmentation and clustering
|pdfUrl=https://ceur-ws.org/Vol-2841/DARLI-AP_7.pdf
|volume=Vol-2841
|authors=Silvia Buccafusco,Andrea Megaro,Luca Cagliero,Francesco Vaccarino,Lucia Salvatori,Riccardo Loti
|dblpUrl=https://dblp.org/rec/conf/edbt/BuccafuscoMCVSL21
}}
==Profiling industrial vehicle duties using CAN bus signal segmentation and clustering==
* Silvia Buccafusco, Politecnico di Torino, Turin, Italy, silvia.buccafusco@polito.it
* Andrea Megaro, Politecnico di Torino, Turin, Italy, andrea.megaro@asp-poli.it
* Luca Cagliero, Politecnico di Torino, Turin, Italy, luca.cagliero@polito.it
* Francesco Vaccarino, Politecnico di Torino, Turin, Italy, francesco.vaccarino@polito.it
* Lucia Salvatori, Tierra spa, Turin, Italy, lsalvatori@topcon.com
* Riccardo Loti, Tierra spa, Turin, Italy, rloti@tierratelematics.com

===ABSTRACT===
Industrial vehicles working in construction sites show rather heterogeneous usage patterns. Depending on its type, model, and context of usage, the vehicle workload may vary from light to heavy with variable periodicity. Duties summarize the current state of a vehicle according to its usage level. They are usually set up manually, vehicle by vehicle, according to the specifications of the manufacturer. To automate the definition of per-vehicle duty levels, this paper explores the use of clustering techniques applied to CAN bus signals. It first performs a segmentation of the CAN bus signals to identify specific working cycles. Then, it clusters the segments to support the definition of vehicle-specific duty levels. The preliminary results, acquired on real vehicle usage data, show the applicability of the proposed approach.

© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===1 INTRODUCTION===
The fleets of industrial vehicles that are commonly employed in construction sites by public and private enterprises show rather variable usage patterns. For example, refuse compactors, which are usually employed in dumps, drive few kilometers per day and work at light workload 24/7 for relatively long periods. Road rollers and tandem rollers, which are frequently used in road maintenance, drive few kilometers per day as well, but work at relatively heavy workload only for short periods. Conversely, forklift trucks, which are employed in warehouses, drive many kilometers per day, work most of the time at light workload, and accomplish specific tasks at heavy workload (e.g., the lift of a heavy pallet).

The advent of Controller Area Network (CAN) bus technology [11] has provided fleet managers with a huge amount of data useful for monitoring and analyzing vehicle usage. The CAN bus allows communication among the electronic control unit devices on board the vehicle. It provides direct access to various signals describing the vehicle state. CAN bus data usually consist of raw time series, which are sampled and aggregated before being transmitted to a central repository. Data regard fuel consumption, vehicle movements (e.g., accelerations and drifts), engine conditions (e.g., RPM, oil and coolant temperature), route characteristics (e.g., slope), and alarms. Domain experts can thus monitor the vehicle state by acquiring, collecting, and analyzing vehicle-specific CAN bus data through data mining and machine learning techniques in order to support fleet managers' decisions (e.g., [16, 17, 22, 23]).

To optimize vehicle usage, fleet managers commonly need to monitor the time spent by the vehicles in specific duties. Engine duties describe the current state of a vehicle and are usually classified as (i) long idle, which indicates that the vehicle has been stationary and under a minimal workload level for a relatively long period, (ii) idle, which indicates that the vehicle has been stationary and under a minimal workload for a short period, (iii) moving/working, which indicates non-stationary vehicle usage with light workload, (iv) light workload, which indicates non-stationary vehicle usage with light workload, and (v) heavy workload, which indicates non-stationary vehicle usage with intensive workload. However, due to the high vehicle heterogeneity over models, types, and context of usage (e.g., ground type, use of vehicle equipment), duties are commonly defined manually by domain experts separately for each vehicle. This is inefficient, particularly time-consuming, and prone to errors.

To make the process of defining per-vehicle duty levels more efficient and effective, we propose to apply a clustering-based approach to the acquired CAN bus signals related to a shortlist of Suspect Parameter Numbers (SPNs). To this end, we make a preliminary attempt to directly cluster the raw SPN series in order to assign approximated, pointwise per-vehicle duties. For instance, Figure 1 shows three examples of SPNs (i.e., engine speed, fuel tank level, engine percent load) corresponding to a representative vehicle. The idle and working states, defined by setting the usage level thresholds inferred from the outcomes of the clustering algorithms, are colored in green and red, respectively. Although they discriminate between instants of heavy and light workloads, they do not trace the underlying trends in temporal duty variations. Hence, the result is hardly usable by domain experts.

To get more precise and stable duty state levels, we devise a refined clustering strategy that groups fixed-length segments of CAN bus signals according to ad hoc descriptive features in both the frequency and temporal domains. Segments are produced by applying a motif discovery algorithm on the aligned and synchronized version of the raw SPN series. Figure 2 shows the output of the refined process, which exhibits the newly defined duties. The new states appear to be less susceptible to temporary usage level variations, thus becoming usable for profiling vehicle usage.

The results were validated on real vehicle usage data acquired by a multinational company providing telematics services. The validation phase included qualitative and quantitative analyses. The latter relied on both established clustering validity indices [2] and on a comparison between the assigned duty states and the expected output according to the National Marine Electronics Association (NMEA) 0183 messages data [3].

Figure 1: SPNs colored according to the rough duty levels (green=idle, red=working). Point-wise clustering.

Figure 2: SPNs colored according to the refined duty levels (gray=idle, green=moving, red=working). Segment-wise clustering.

The rest of the paper is organized as follows. Section 2 overviews the related literature. Section 3 describes the analyzed data. Sections 4 and 5 present the data preparation and mining phases, respectively. Section 6 summarizes the empirical results, whereas Section 7 draws conclusions and discusses the future developments of this research.

===2 RELATED WORK===
Clustering techniques have already been applied to analyze CAN bus data acquired from vehicles. Examples of applications include, among others, (i) the optimization of vehicle routes (e.g., [5]), (ii) the identification of driver intentions based on trajectory analysis (e.g., [21]), (iii) the characterization of drivers' behavior (e.g., [7, 15]), (iv) the management of single vehicles and vehicle fleets (e.g., [9]). The present work belongs to the latter category. To the best of our knowledge, this is the first attempt to automate the process of assigning per-vehicle duty levels based on CAN bus signal clustering.

In [9] the authors focused on explaining clusters mined from multivariate time series data over different time scales and granularity. As application scenario, they analyzed the usage profile of vehicles travelling across urban areas with the aim of planning and supporting maintenance operations. To this purpose, they used a Gaussian mixture model to identify clusters on top of a subset of features extracted from the raw series according to a sliding window strategy. Rather than extracting aggregated statistics for all series over some time window and then identifying clusters as indicators of more abstract states, our approach aims at partitioning the series to detect specific vehicle duties. In a nutshell, we cluster segments of SPN series instead of concise series representations. In [10] the authors proposed a reverse engineering strategy to extract undisclosed information about CAN bus configuration. Similar to the present work, the aforesaid research study aims at analyzing vehicle usage via CAN bus signal analysis. However, the research objective is substantially different.

===3 DATA OVERVIEW===
Data were acquired from an experimental CAN bus data logger, which was installed on a test farm tractor working in a construction site. Data were provided by Tierra S.p.A, a multinational company operating in the IoT sector and internationally recognized for providing its customers with sophisticated and reliable telematics solutions for management, maintenance, and remote diagnostics of equipment.

The test vehicle is equipped with a large number of sensors, which capture CAN parametric messages at a high frequency (up to 100 Hz). Messages were gathered and temporarily stored on an SD card, which manages data transmission to the cloud infrastructures. Then, raw data were decoded and transformed using the SAE J1939 protocol (https://www.sae.org/), which is established for heavy-duty vehicle manufacturers and provides a shared set of standard messages and conversion rules.

Customers of the telematics service provider can visualize and process the converted data in real time. Among the available vehicle usage indicators, the times spent by the vehicle in each duty (i.e., long idle, idle, moving/working, heavy workload) are among the most commonly used to optimize vehicle maintenance, production, business, and investments [20]. Unfortunately, the threshold levels used to define the vehicle states are not standardized since they depend on the particular vehicle model, type, and context of usage. Hence, typically, their setup is manually performed by domain experts. This prompts the need for data-driven approaches to automatically infer the most suitable duty levels separately for each industrial vehicle.

The acquired dataset consists of the SPN series acquired from the test vehicle from November 7, 2019 to April 15, 2020. Each observation is described by SPN name, acquisition timestamp, and measurement. The dataset collects 20 different SPNs describing the state of the vehicle engine such as engine speed, percent load, fuel rate, coolant temperature, fuel delivery pressure (see the full list in Table 1). A more thorough SPN description can be found at www.sae.org/standards.

{| class="wikitable"
|+ Table 1: Analyzed SPNs.
! SPN code (SAE J1939) !! Description
|-
| 81 || Engine diesel particulate filter inlet pressure
|-
| 90 || Power takeoff oil temperature
|-
| 94 || Engine fuel delivery pressure
|-
| 110 || Engine coolant temperature
|-
| 114 || Net battery current
|-
| 123 || Clutch pressure
|-
| 164 || Engine injection control pressure
|-
| 182 || Engine trip fuel
|-
| 183 || Engine fuel rate
|-
| 190 || Engine speed
|-
| 524 || Transmission selected gear
|-
| 975 || Estimated percent fan speed
|-
| 1638 || Hydraulic temperature
|-
| Custom number || Engine percent load
|-
| Custom number || Front plow switch
|-
| Custom number || Rear hitch position
|-
| Custom number || Charge pressure
|-
| Custom number || Amount of particulate matter C method
|-
| Custom number || Digging depth
|-
| Custom number || Fuel tank level
|}

Figure 3: Extracts of the SPN series of engine speed and engine coolant temperature.

Figure 4: Pearson correlation between SPN pairs.

The acquired SPN series are highly heterogeneous, not synchronized with each other, and partly noisy.
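The heterogeneity in sampling rates can be checked directly on the raw observations. The following is a minimal sketch, assuming each observation is available as a (SPN name, timestamp in seconds, measurement) tuple, which mirrors the three fields described above. The function name `average_sampling_rates`, the SPN labels, and the sample values are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def average_sampling_rates(observations):
    """Return the average sampling rate (Hz) of each SPN.

    `observations` is an iterable of (spn_name, timestamp_s, value) tuples.
    SPNs with fewer than two samples or a zero time span are skipped.
    """
    stamps = defaultdict(list)
    for spn, timestamp, _value in observations:
        stamps[spn].append(timestamp)

    rates = {}
    for spn, ts in stamps.items():
        ts.sort()
        span = ts[-1] - ts[0]
        if len(ts) < 2 or span <= 0:
            continue
        # (number of intervals) / (elapsed time) = average rate in Hz
        rates[spn] = (len(ts) - 1) / span
    return rates


# Made-up data: one fast signal (50 Hz) and one slow signal (1 Hz), 2 s each.
obs = [("engine_speed", i * 0.02, 800.0 + i) for i in range(101)]
obs += [("coolant_temp", float(i), 80.0) for i in range(3)]
rates = average_sampling_rates(obs)
```

A per-SPN rate table of this kind is also one plausible way to pick the common target rate when the series are later aligned.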
For example, Figure 3 shows two extracts of SPN series, i.e., the engine speed and the engine coolant temperature series. The former has a sampling rate of 50 Hz, whereas the latter has a rate of 1 Hz. Furthermore, the series periodicity has different granularity over the working cycles.

===4 DATA PREPARATION===
To prepare the raw CAN bus data for the subsequent analyses we apply the following steps.

'''Data cleaning.''' To avoid introducing a bias in the clustering process, we removed missing values (due, for instance, to failures in data acquisition and transmission) and properly managed the presence of noise, decoding errors, and inconsistencies in the SPN series values. Specifically, for each pair of timestamp and SPN we computed an average value, bounding data points to the feasible and operative range of the corresponding measured physical quantity.

'''Working cycle identification.''' CAN messages are transmitted only when a vehicle is on. To analyze vehicle duties it can be useful to understand whether a vehicle has been turned off at the end of a working cycle or for any other reason. For this reason, at the vehicle restart we analyze the value of the engine coolant temperature as it indicates, to a good approximation, when a vehicle has been turned off for a sufficiently long time.

'''Series alignment and synchronization.''' CAN bus messages are asynchronously transmitted at variable rates over the network. For example, SPNs such as engine speed, engine percent load and charge pressure are transmitted quite frequently (sampling rate between 20 Hz and 50 Hz), whereas engine coolant temperature and engine fuel delivery pressure are sent less frequently (rate between 1 Hz and 2 Hz). Hence, to enable SPN series clustering we re-align and synchronize all the analyzed SPN series. To this aim, CAN bus signals are first linearly interpolated in the temporal domain and then down-sampled in the frequency domain to the least average sampling rate to align multiple signals (using standard anti-aliasing filters and down-sampling operators).

'''SPNs selection.''' We select the subset of SPNs that most likely influence the vehicle duty. To this purpose, we firstly filter out all the SPNs providing less relevant information. For example, according to the manufacturers' specification the engine trip fuel was deemed irrelevant to our purposes and thus discarded.

Next, to reduce the potential bias due to the simultaneous presence of correlated SPNs describing related components of the same physical system, we also perform a preliminary correlation analysis of the SPN series. Figure 4 shows the Pearson's correlation. It clearly indicates the presence of a group of highly correlated SPNs describing the status of the vehicle engine, namely engine speed, engine percent load, engine fuel rate and fuel tank level. Hereafter, we will focus our analyses on a group representative, i.e., the engine speed.

Finally, we analyze the spectral content of the SPN signals. Signals characterized by slow variations are disregarded in the following analyses since they concentrate most of their information at frequencies close to 0 and their spectral content can be approximated by the temporal average value of the signal. Notice that slow signal variations can be due to either the intrinsic nature of the considered measure (e.g., for the SPN related to the transmission selected gear) or to the limited sensitivity of the measurement instrument (e.g., for the SPNs related to charge pressure and engine fuel delivery pressure).

===5 PROFILE VEHICLE USAGE===
To identify vehicle duties we analyze vehicle usage data by means of clustering techniques. Specifically, the SPN series are first synchronized and segmented into fixed-length intervals. Each segment is described by specific features. Then, segments are clustered into homogeneous groups. The clustering outcomes allow domain experts to empirically set up per-duty levels associated with each SPN.

'''Time series segmentation.''' Time series segmentation entails defining a partition of the input series <math>X(t)</math> into <math>k</math> segments <math>S_1, S_2, \ldots, S_k</math>, each one characterized by a distinct time span <math>[t_{start}, t_{end}]</math>. Since vehicle usage is described by multiple SPN series, the segmentation problem is extended to a multivariate model, i.e., given the time series <math>X_1(t), X_2(t), \ldots, X_n(t)</math> corresponding to SPNs <math>SPN_1, SPN_2, \ldots, SPN_n</math>, respectively, we partition the series into <math>k</math> segments, where the same partition holds for all the considered series. To deal with correlated series, the input series can be preprocessed using Principal Component Analysis [1] with the aim of collapsing the underlying SPN sub-trends that are highly correlated with each other into a separate component.

For the sake of simplicity, we address time series segmentation using an established motif discovery algorithm [13]. Motifs are recurring sub-series within a reference time series. The algorithm first splits each of the original time series into fixed-length segments and then compares pairs of segments to select the most similar pairs. The segment length can vary within a range <math>[L_{min}, L_{max}]</math>. Both the segment length range and the distance measure used to generate the motif are configurable by domain experts. In our experiments, we varied the segment length between 2 minutes and 10 minutes and evaluated the similarity between the segments via Euclidean distance.

To empirically identify the most appropriate segment length, we discretized the segment length range into 1-minute bins, counted the number of motifs per length, and selected the length maximizing that count. The lower bound of the segment length range (2 minutes) turned out to be the most appropriate time scale. The corresponding 2405 segments will be hereafter considered in the reported analyses.

'''Per-segment feature extraction.''' For each segment we extract a subset of features that describe time series shape and values' distribution. Since the aim is to characterize the general shape of each segment in terms of its variations and their corresponding rapidity and amplitudes, SPNs are analyzed in the frequency domain. To this purpose, for each SPN and segment the Fourier transform of the signal is applied by considering only the positive coefficients, as we exploit the symmetry of the Fourier coefficients of real signals. Then, separately for low, medium, and high frequencies, the signal power value, the signal peaks, and the signal peak frequencies are computed. Lastly, the signal mean (in the time domain) is considered as well.

Figure 5: Pearson correlation between pairs of per-segment features.

Figure 5 shows the Pearson's correlation between the pairs of extracted features. According to the correlation values, the feature set is reduced to 3: (i) the power in the low frequencies sub-band, (ii) the peak value for high frequencies, and (iii) the peak frequency in the high frequencies sub-band.

'''Clustering.''' Clustering aims at grouping data samples that are similar to one another and dissimilar from those assigned to other groups. In this particular context, clustering algorithms are exploited to group the SPN segments into homogeneous groups representing typical vehicle duties (or vehicle states from which duties can be easily inferred). The number of desired clusters is an input parameter, which can be specified by the domain experts. We set up this parameter in an empirical way by assessing the clustering results according to established cluster validity indices. Specifically, to choose the best algorithm and the number of desired clusters we empirically assessed the performance achieved by multiple runs of different clustering algorithms by varying the number of desired clusters <math>k</math> (see Section 6).

'''Duty level identification.''' On top of the cluster outcomes, the levels associated with each vehicle duty can be identified. To this aim, the SPN segments associated with the same cluster are further split into sub-groups characterizing similar usage patterns. For example, sub-groups allow us to distinguish between vehicles in an idle state and vehicles that keep moving steadily.

===6 EXPERIMENTAL RESULTS===
We carried out an empirical analysis of the proposed methodology on the real vehicle usage data provided by Tierra SpA. The experiments were run on an Intel(R) Core(TM) i5-8250U machine equipped with 8 GB of RAM and running Windows 10 64-bit.

The summary of the experimental results is organized as follows. Firstly, we compare the performance of different clustering techniques according to the Silhouette validity index [2] and discuss the impact of the number of desired clusters on clustering performance (see Section 6.1). Secondly, we quantitatively evaluate the quality of the clustering outcome against the National Marine Electronics Association (NMEA) 0183 messages data [3] (see Section 6.2). Finally, we report a qualitative analysis of the achieved results (see Section 6.3).

====6.1 Comparison between different clustering techniques====
We tested various purely partitional clustering algorithms belonging to the following categories: (i) centroid-based, (ii) density-based, (iii) hierarchical, and (iv) shape-based. We considered two of the most renowned algorithms belonging to the centroid-based category: K-Means [14], which exploits the concept of cluster centroid, and Clara [12], based on medoids (Clara is an extension of the k-Medoid algorithm, which is able to scale towards larger and more complex datasets). Density-based clustering groups together samples located in dense regions and well separated from other regions. We exploited a Python implementation of the well-known DBScan algorithm [6]. Hierarchical clustering produces nested clusters, which can be organized in a dendrogram. We exploited an implementation of an agglomerative algorithm [4]. Finally, shape-based clustering is tailored to time series clustering. We considered the K-Shape algorithm [18], which relies on series cross-correlation analyses. Notice that the latter algorithm is designed for univariate time series analysis. It captures the similarities between sub-series independently of the shift of the distinctive segments' properties.

We evaluated clustering performance according to the Silhouette score, which is an established validity index used to measure how similar a sample is to its own cluster compared to the other clusters [2]. The score ranges from -1 (the sample is closer to another cluster than to its own) to 1 (high cohesion and good separation), i.e., the larger the better. For each algorithm we varied the configuration settings to find the best setting.

For all the tested algorithms we achieved the best results by setting the number <math>k</math> of desired clusters to 2. K-Means achieved the best overall performance (0.67), followed by the hierarchical clustering (0.53), Clara (0.5), DBScan (0.49) and K-Shape (0.23).

====6.2 Evaluation based on the National Marine Electronics Association messages====
We assessed the quality of the clustering outcomes using, as ground truth, the numerical score provided by the National Marine Electronics Association (NMEA) 0183 messages data [3]. NMEA 0183 is a one-way serial data communication protocol used to send messages from the vehicle to external devices. Unlike CAN bus data, it provides fairly accurate GPS-related information such as the vehicle coordinates (latitude and longitude) and the vehicle speed. Conversely, GPS positions transmitted via CAN bus messages are frequently characterized by relatively high measurement error [8]. Although the accuracy of NMEA positions was not guaranteed overall, we made a preliminary attempt to validate the ability of the proposed method to discriminate between idle states and moving/working ones by comparing the duty labels assigned via clustering against the assignment made based on NMEA messages (used as ground truth).

We achieved an 82.37% accuracy score, i.e., we correctly classified approximately 8 duties out of 10. The average recall and precision scores were 82.35% and 82.39%, respectively. The results were quite promising, given that the segments used for validation purposes were fairly balanced (50.81% of idle segments, 49.19% of moving/working ones).

A deeper analysis of the wrongly labeled segments has shown that, in a few cases, there were rapid and multiple changes in the engine speed associated with an idle state. This was probably due to small errors in GPS readings. Therefore, these particular errors do not seem to be due to imprecise vehicle duty level assignments.

====6.3 Qualitative evaluation====
Figure 6 plots two representative segments for each of three different clusters. With the help of domain experts, we figured out the underlying vehicle usage patterns. Specifically, cluster 1 shows a working or heavy workload state and is characterized by highly variable segments. It indicates an aggressive driving style, with rapid accelerations and brakes. Cluster 2 shows a more stationary vehicle usage. The usage levels are compatible with either a stationary vehicle under medium workload or with a non-stationary vehicle. Finally, cluster 3 contains slowly varying segments in which the engine speed is oscillating around minimum levels of use. It likely denotes an idle duty.

Figure 6: Representative segments for each cluster.

Figure 7 shows the number of segments per cluster (corresponding to the previous experiment). According to domain experts' opinion, the distribution is consistent with what was expected, since it corresponds to the actual usage of the test vehicle.

Figure 7: Number of segments per cluster.

===7 CONCLUSIONS AND FUTURE WORKS===
The paper explores the use of clustering techniques to profile industrial vehicle usage in construction sites. The aim is to define per-vehicle duties, which summarize the current state of the vehicle (e.g., idle, moving, heavy workload). Due to the high heterogeneity of vehicle types, models, and usage contexts, duties are commonly detected by exploiting manually configured thresholds. To automate this process, we propose a data-driven approach relying on CAN bus signal segmentation and clustering. The clustering output achieved on real vehicle usage data was validated with the help of domain experts.

The preliminary results leave room for further improvements. First of all, the acquisition of CAN bus data from many test vehicles would allow us to extend the problem of vehicle duty identification from a single vehicle to groups of similar vehicles. Secondly, a deeper analysis of the contextual information related to the working site and the vehicle equipment would be useful for further improving the accuracy of the duty level assignments and for effectively identifying the driving styles. In addition, as soon as new historical data become available, the framework will be updated based on results obtained on similar tasks and construction site conditions in the past. Finally, the assigned duties will be exploited to accomplish specific tasks, such as predictive maintenance and anomaly detection [19].

===8 ACKNOWLEDGMENTS===
The research leading to these results has been funded by the SmartData@PoliTO center for Big Data and Machine Learning technologies and by Tierra Spa.

===REFERENCES===
* [1] J. Abonyi, B. Feil, S. Németh, and P. Arva. 2004. Principal Component Analysis based Time Series Segmentation: A New Sensor Fusion Algorithm.
* [2] Olatz Arbelaitz, Ibai Gurrutxaga, Javier Muguerza, Jesus M. Perez, and Inigo Perona. 2013. An extensive comparative study of cluster validity indices. Pattern Recognition 46, 1 (2013), 243–256. https://doi.org/10.1016/j.patcog.2012.07.021
* [3] National Marine Electronics Association. [n.d.]. NMEA 0183. Retrieved November 11, 2020 from https://www.nmea.org/content/STANDARDS/NMEA_0183_Standard
* [4] Maria-Florina Balcan, Yingyu Liang, and Pramod Gupta. 2014. Robust Hierarchical Clustering. J. Mach. Learn. Res. 15, 1 (Jan. 2014), 3831–3871.
* [5] Sahar Ebadinezhad, Ziya Dereboylu, and Enver Ever. 2019. Clustering-Based Modified Ant Colony Optimizer for Internet of Vehicles (CACOIOV). Sustainability 11, 9 (May 2019), 2624. https://doi.org/10.3390/su11092624
* [6] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96). AAAI Press, 226–231.
* [7] Umberto Fugiglando, Paolo Santi, Sebastiano Milardo, Kacem Abida, and Carlo Ratti. 2017. Characterizing the "Driver DNA" Through CAN Bus Data Analysis. In Proceedings of the 2nd ACM International Workshop on Smart, Autonomous, and Connected Vehicular Systems and Services (CarSys '17). Association for Computing Machinery, New York, NY, USA, 37–41. https://doi.org/10.1145/3131944.3133939
* [8] Yong Heo, Thomas Yan, Samsung Lim, and Chris Rizos. 2009. International standard GNSS real-time data formats and protocols.
* [9] Anders Holst, Juhee Bae, Alexander Karlsson, and Mohamed-Rafik Bouguelia. 2019. Interactive Clustering for Exploring Multiple Data Streams at Different Time Scales and Granularity. In Proceedings of the Workshop on Interactive Data Mining (WIDM'19). Association for Computing Machinery, New York, NY, USA, Article 2, 7 pages. https://doi.org/10.1145/3304079.3310286
* [10] Thomas Huybrechts, Yon Vanommeslaeghe, Dries Blontrock, Gregory Van Barel, and Peter Hellinckx. 2018. Automatic Reverse Engineering of CAN Bus Data Using Machine Learning Techniques. In Advances on P2P, Parallel, Grid, Cloud and Internet Computing, Fatos Xhafa, Santi Caballé, and Leonard Barolli (Eds.). Springer International Publishing, Cham, 751–761.
* [11] Karl Henrik Johansson, Martin Törngren, and Lars Nielsen. 2005. Vehicle Applications of Controller Area Network. Birkhäuser Boston, Boston, MA, 741–765. https://doi.org/10.1007/0-8176-4404-0_32
* [12] Leonard Kaufman and Peter J. Rousseeuw. 1987. Clustering by means of medoids. 405–416.
* [13] Michele Linardi, Yan Zhu, Themis Palpanas, and Eamonn Keogh. 2018. Matrix Profile X: VALMOD - Scalable Discovery of Variable-Length Motifs in Data Series. SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data, 1053–1066. https://doi.org/10.1145/3183713.3183744
* [14] J. MacQueen. 1967. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability - Vol. 1, L. M. Le Cam and J. Neyman (Eds.). University of California Press, Berkeley, CA, USA, 281–297.
* [15] C. Marina Martinez, M. Heucke, F. Wang, B. Gao, and D. Cao. 2018. Driving Style Recognition for Intelligent Vehicle Control and Advanced Driver Assistance: A Survey. IEEE Transactions on Intelligent Transportation Systems 19, 3 (2018), 666–676. https://doi.org/10.1109/TITS.2017.2706978
* [16] Dena Markudova, Elena Baralis, Luca Cagliero, Marco Mellia, Luca Vassio, Elvio Gilberto Amparore, Riccardo Loti, and Lucia Salvatori. 2019. Heterogeneous Industrial Vehicle Usage Predictions: A Real Case. In Proceedings of the Workshops of the EDBT/ICDT 2019 Joint Conference, EDBT/ICDT 2019, Lisbon, Portugal, March 26, 2019 (CEUR Workshop Proceedings), Paolo Papotti (Ed.), Vol. 2322. CEUR-WS.org. http://ceur-ws.org/Vol-2322/DARLIAP_13.pdf
* [17] Sachit Mishra, Luca Vassio, Luca Cagliero, Marco Mellia, Elena Baralis, Riccardo Loti, and Lucia Salvatori. 2020. Machine Learning Supported Next-Maintenance Prediction for Industrial Vehicles. In Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark, March 30, 2020 (CEUR Workshop Proceedings), Alexandra Poulovassilis, David Auber, Nikos Bikakis, Panos K. Chrysanthis, George Papastefanatos, Mohamed A. Sharaf, Nikos Pelekis, Chiara Renso, Yannis Theodoridis, Karine Zeitouni, Tania Cerquitelli, Silvia Chiusano, Genoveva Vargas-Solar, Behrooz Omidvar-Tehrani, Katharina Morik, Jean-Michel Renders, Donatella Firmani, Letizia Tanca, Davide Mottin, Matteo Lissandrini, and Yannis Velegrakis (Eds.), Vol. 2578. CEUR-WS.org. http://ceur-ws.org/Vol-2578/DARLIAP9.pdf
* [18] John Paparrizos and Luis Gravano. 2015. k-shape: Efficient and accurate clustering of time series. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. 1855–1870.
* [19] Stefano Proto, Evelina Di Corso, Daniele Apiletti, Luca Cagliero, Tania Cerquitelli, Giovanni Malnati, and Davide Mazzucchi. 2020. REDTag: A Predictive Maintenance Framework for Parcel Delivery Services. IEEE Access 8 (2020), 14953–14964. https://doi.org/10.1109/ACCESS.2020.2966568
* [20] Lee A Schmidt and Lorenz Riegger. 2012. Automatic Detection of Machine Status for Fleet Management. US Patent App. 13/341,500.
* [21] D. Yi, J. Su, C. Liu, and W. Chen. 2019. Trajectory Clustering Aided Personalized Driver Intention Prediction for Intelligent Vehicles. IEEE Transactions on Industrial Informatics 15, 6 (2019), 3693–3702. https://doi.org/10.1109/TII.2018.2890141
* [22] Weiliang Zeng, Tomio Miwa, Wakita, and Takayuki Morikawa. 2015. Exploring Trip Fuel Consumption by Machine Learning from GPS and CAN Bus Data. Journal of the Eastern Asia Society for Transportation Studies 11 (12 2015), 906–921. https://doi.org/10.11175/easts.11.906
* [23] W. Zhang, D. Yang, and H. Wang. 2019. Data-Driven Methods for Predictive Maintenance of Industrial Equipment: A Survey. IEEE Systems Journal 13, 3 (Sep. 2019), 2213–2227.
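As a closing illustration of the per-segment feature extraction described in Section 5, the sketch below computes, for one segment, the positive-frequency Fourier magnitudes and then the power, peak value, and peak frequency in each band, plus the time-domain mean. It is only a sketch under stated assumptions: the paper does not report the band edges, so the low/medium/high boundaries here are invented for the example, the zero-frequency term is skipped on the assumption that the mean already covers it, and the function names are hypothetical. A plain discrete Fourier transform is used to keep the example dependency-free; it is not the authors' implementation.

```python
import cmath
import math


def positive_dft_magnitudes(segment, sample_rate_hz):
    """Return (frequency_hz, magnitude) pairs for positive frequencies only.

    The zero-frequency term is skipped (assumption: the mean is reported
    separately as a time-domain feature).
    """
    n = len(segment)
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(segment)
        )
        spectrum.append((k * sample_rate_hz / n, abs(coeff)))
    return spectrum


def band_features(segment, sample_rate_hz, band_edges_hz):
    """Per-band power, peak magnitude and peak frequency, plus the mean."""
    spectrum = positive_dft_magnitudes(segment, sample_rate_hz)
    features = {"mean": sum(segment) / len(segment)}
    for name, (low, high) in band_edges_hz.items():
        in_band = [(f, m) for f, m in spectrum if low <= f < high]
        peak_freq, peak = max(
            in_band, key=lambda fm: fm[1], default=(None, 0.0)
        )
        features[name] = {
            "power": sum(m * m for _, m in in_band),
            "peak": peak,
            "peak_freq": peak_freq,
        }
    return features


# Synthetic 10 s segment at 10 Hz: offset 5, strong 0.5 Hz and weak 3 Hz tones.
rate = 10.0
segment = [
    5.0
    + math.sin(2 * math.pi * 0.5 * i / rate)
    + 0.2 * math.sin(2 * math.pi * 3.0 * i / rate)
    for i in range(100)
]
# Hypothetical band edges in Hz (not taken from the paper).
bands = {"low": (0.0, 1.0), "medium": (1.0, 2.5), "high": (2.5, 5.1)}
features = band_features(segment, rate, bands)
```

On this synthetic segment the low band is dominated by the 0.5 Hz tone and the high band by the 3 Hz tone. For real segments a fast Fourier transform routine would be the natural replacement for the quadratic-time loop.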