=Paper= {{Paper |id=Vol-2041/paper2 |storemode=property |title=Industrial IoT to Predictive Analytics: A Reverse Engineering Approach from Shipping |pdfUrl=https://ceur-ws.org/Vol-2041/paper2.pdf |volume=Vol-2041 |authors=Lokukaluge Prasad Perera }} ==Industrial IoT to Predictive Analytics: A Reverse Engineering Approach from Shipping== https://ceur-ws.org/Vol-2041/paper2.pdf
Industrial IoT to Predictive Analytics: A Reverse Engineering Approach from Shipping

                                           Lokukaluge P. Perera¹ [ORCID: 0000-0002-1608-7804]
                                             ¹ SINTEF Ocean, Trondheim, Norway

                                               prasad.perera@sintef.no



       Abstract. A novel mathematical framework to support industrial digitization of shipping is presented in this study.
       The framework supports a data flow path from Industrial IoT (i.e. with Big Data) to Predictive Analytics, along which
       digital models with advanced data analytics are introduced. The digital models are derived from ship performance
       and navigation data sets, and a combination of such models forms the proposed Predictive Analytics. Since the data
       sets themselves are used to derive the Predictive Analytics, this mathematical framework can also be categorized as
       a reverse engineering approach. Furthermore, a data anomaly detection and recovery procedure, associated with the
       same framework to improve data quality, is also described in this study.

       Keywords: Industrial IoT, Big Data, Advanced Analytics, Predictive Analytics, Shipping, Maritime.


1      Industrial Digitalization

1.1    Big Data Challenges




                                           Fig. 1. Industrial digitalization in shipping

Future vessels will be supported by ocean IoT, i.e. onboard sensors, data acquisition (DAQ) systems, satellites and
other communication systems, which collect ship performance and navigation information and can also form part of
industrial digitalization (see Figure 1). Such systems collect large-scale data sets, so-called "Big Data", that should be
analyzed to evaluate vessel performance under various operating and navigation conditions [1]. The outcome of such
analyses leads to smart decisions that should be applied to vessel navigation and ship system operations. However,
there are several layers between big data (i.e. ship performance and navigation data) and smart decisions, as presented
in Figure 1. The ship performance and navigation data should be transferred to the data management layer for handling
and storage, and then to the data analytics layer for analysis. Big data challenges, in turn, create a considerable gap
between the data management and analytics layers. These challenges are categorized as data volume (i.e. the scale of
the data sets), velocity (i.e. the speed of data processing), variety (i.e. the various forms of data) and veracity (i.e. the
uncertainty of the data). Conventional data handling approaches often fail to address these challenges; therefore, this
study proposes advanced data analytics to overcome them. The proposed solutions are expected to be somewhat domain
specific, so domain knowledge of vessels and ship systems, i.e. their navigation and operating conditions, should also be
incorporated into such approaches.


1.2    Advanced Data Analytics

Advanced data analytics will play an important role in industrial digitalization, since it can extract useful information
from ship performance and navigation data. This information creates advanced knowledge in shipping that further
enhances the existing domain knowledge (see Figure 1). This knowledge leads towards industrial intelligence, resulting
in smart decisions in vessel navigation and ship system operations. Such decisions are often influenced by energy
efficiency and emission control rules and regulations in shipping [2]. Therefore, industrial digitalization ultimately
supports energy-efficient, low-emission and highly reliable smart shipping fleets.

Copyright held by the author(s). NOBIDS 2017


2      Mathematical Framework

2.1    Ship Performance and Navigation Data




                                                 Fig. 2. Ship engine data

A novel mathematical framework to support industrial digitization of shipping is presented in this study. The
framework consists of a data flow path along which digital models and advanced data analytics are introduced. Several
statistical analyses, machine learning (ML) and artificial intelligence (AI) algorithms are used in this framework [3].
Ship performance and navigation data from a selected vessel are used to develop this mathematical framework. The
vessel is a bulk carrier with a length of 225 m and a beam of 32.29 m; its particulars are presented in [4]. Two
parameters of this vessel, engine power and engine speed, are presented in Figure 2. A multivariate kernel density
estimate (KDE) of the two parameters is shown in the middle plot, and the respective univariate KDEs are shown in the
top and right plots of the same figure. The white dots represent the parameter values and the contours represent the
density of the data distribution (i.e. the multivariate KDE). The plot shows that the engine data are clustered around
three Gaussian-type distributions, denoted data clusters 1, 2 and 3. These data clusters represent the respective engine
modes of the vessel. This suggests that ship performance and navigation data sets are often clustered in a
high-dimensional space (i.e. the space spanned by the respective parameters) and that those clusters relate to vessel
navigation and ship system operating conditions. These clusters introduce discreteness (i.e. digital-ness) into the
proposed mathematical framework, where the respective digital models are introduced. The distribution of such data
clusters forms the foundation of the digital models, and an example of such models is presented in Figure 3.
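
The clustering visible in Figure 2 can be reproduced in outline with a two-dimensional Gaussian kernel density estimate. The sketch below is illustrative only: it uses pure Python, an isotropic kernel with one hypothetical bandwidth, and synthetic, normalized (engine speed, engine power) samples drawn around three assumed engine modes rather than the vessel data used in this study. High density at the assumed mode centres and low density between them corresponds to the contour pattern described above.

```python
import math
import random


def gaussian_kde_2d(samples, bandwidth):
    """Return a function that estimates the joint density of 2-D samples.

    Isotropic Gaussian kernel with one shared bandwidth, so both
    parameters are assumed to be on comparable (normalized) scales.
    """
    n = len(samples)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for sx, sy in samples:
            d2 = ((x - sx) ** 2 + (y - sy) ** 2) / bandwidth ** 2
            total += math.exp(-0.5 * d2)
        return norm * total

    return density


def synthetic_engine_data(seed=0, per_mode=200):
    """Hypothetical normalized (engine speed, engine power) samples around
    three assumed engine modes; these are NOT the vessel data in Figure 2."""
    rng = random.Random(seed)
    modes = [(0.30, 0.15), (0.60, 0.40), (0.85, 0.75)]
    data = [(rng.gauss(cx, 0.03), rng.gauss(cy, 0.03))
            for cx, cy in modes for _ in range(per_mode)]
    return modes, data


modes, data = synthetic_engine_data()
kde = gaussian_kde_2d(data, bandwidth=0.05)
for cx, cy in modes:
    print(f"density at assumed mode ({cx:.2f}, {cy:.2f}): {kde(cx, cy):.2f}")
print(f"density between modes 1 and 2: {kde(0.45, 0.275):.3f}")
```

With real data the parameters would normally be standardized before a shared bandwidth is applied, and the bandwidth would be selected from the data rather than fixed by hand.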
2.2    Digital Models




                                                     Fig. 3. Digital models

Digital models can be derived purely from the ship performance and navigation data sets. The initial digital models
should be derived from a relatively clean data set (i.e. one with few data anomalies), and these models should then be
used to improve the quality of the entire data set at a later stage. The quality-improved data set should in turn be used
to refine the same digital models. Figure 3 represents a three-dimensional vector space with the right-handed coordinate
system $X_1 X_2 X_3$, where $X_1$, $X_2$ and $X_3$ represent the parameters of the respective data set. This model
consists of three data clusters, as denoted in the figure, with the respective mean vectors $\mu_1$, $\mu_2$ and $\mu_3$.
Ship performance and navigation data sets consist of such clusters because of the vessel's operating and navigation
conditions (i.e. engine modes, trim and draft conditions), as mentioned before. Therefore, the proposed digital models
are well suited to represent such data sets. The first step in developing digital models is to identify the data clusters
(i.e. how these data clusters are distributed in the high-dimensional space). Each data cluster contains local navigation
and operational information of the vessel and ship systems. These local operating and navigation conditions are
investigated in the second step, where the structure of each data cluster (i.e. the relative correlation among the ship
performance and navigation parameters) is identified. The structure of each data cluster is described by several vectors;
the $i$-th data cluster is represented by the vectors $Z_{i,1}$, $Z_{i,2}$ and $Z_{i,3}$ (see Figure 3). This creates a vector
structure (i.e. a data structure) that represents the parameter correlations in each data cluster and, in turn, in the
whole data set. These vectors can also be categorized as the singular vectors (i.e. associated with the respective
singular values) of each ship performance and navigation data cluster.
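
The second step (identifying the structure of one cluster) can be sketched as follows. The singular vectors of a mean-centred cluster data matrix $X_c$ are the eigenvectors of $X_c^\top X_c$, and the singular values are the square roots of its eigenvalues, so this pure-Python sketch computes them with a small cyclic Jacobi eigen-solver. It assumes cluster membership is already known from the first step; the function names and the toy cluster are illustrative and are not taken from this study.

```python
import math


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def jacobi_eigen(m, max_sweeps=50, tol=1e-24):
    """Eigenvalues and eigenvectors (columns of v) of a small symmetric
    matrix, computed with cyclic Jacobi rotations."""
    d = len(m)
    a = [list(row) for row in m]
    v = [[float(i == k) for k in range(d)] for i in range(d)]
    scale = sum(x * x for row in a for x in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][k] ** 2 for i in range(d) for k in range(d) if i != k)
        if off <= tol * scale:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if a[p][q] == 0.0:
                    continue
                # Angle that makes entry (p, q) of rot^T a rot vanish.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(i == k) for k in range(d)] for i in range(d)]
                rot[p][p], rot[q][q], rot[p][q], rot[q][p] = c, c, s, -s
                rot_t = [list(r) for r in zip(*rot)]
                a = matmul(matmul(rot_t, a), rot)
                v = matmul(v, rot)
    return [a[i][i] for i in range(d)], v


def cluster_structure(points):
    """Mean vector, singular values and singular vectors of one cluster,
    sorted from the most to the least important singular vector."""
    n, d = len(points), len(points[0])
    mu = [sum(p[k] for p in points) / n for k in range(d)]
    centred = [[p[k] - mu[k] for k in range(d)] for p in points]
    scatter = [[sum(r[a] * r[b] for r in centred) for b in range(d)]
               for a in range(d)]
    lam, vecs = jacobi_eigen(scatter)
    order = sorted(range(d), key=lambda k: lam[k], reverse=True)
    sing_vals = [math.sqrt(max(lam[k], 0.0)) for k in order]
    sing_vecs = [[vecs[i][k] for i in range(d)] for k in order]
    return mu, sing_vals, sing_vecs


# Toy cluster: spread mostly along (1, 1, 0), slightly along X3.
points = [(float(t), float(t), 0.0) for t in range(-5, 6)]
points += [(0.0, 0.0, 0.5), (0.0, 0.0, -0.5)]
mu, sv, vecs = cluster_structure(points)
print("mean vector:", mu)
print("singular values:", [round(s, 3) for s in sv])
print("leading singular vector:", [round(x, 3) for x in vecs[0]])
```

For real data sets a library SVD routine would normally replace the hand-written solver; it is spelled out here only to keep the example dependency-free.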

        Singular values quantify the importance of each singular vector, which holds information on the respective
parameter relationships. Hence, the singular vector structure represents the local operating and navigation
conditions. It also represents an approximate linear model for each local mode of navigation and operating
conditions of the vessel and ship systems. The singular values and singular vectors can be used to reduce the number
of ship performance and navigation parameters by removing the least important singular vectors (i.e. a model
reduction technique). Such a reduction in ship performance and navigation parameters can play an important role for
data sets collected onboard, since the cost of data transfer can be reduced by transmitting fewer parameters. The model
reduction can also be part of the proposed digital models.
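
The model reduction described above can be sketched as a projection onto the leading singular vectors of a cluster: only the $k$ projection scores are transmitted, and the receiver expands them back using the same mean vector and singular vectors. The rank-selection rule (keep enough singular vectors to retain a chosen fraction of the summed squared singular values), its 95 % default, and the example mean vector, singular values and basis are all assumptions for illustration, not values from this study.

```python
import math


def choose_rank(singular_values, energy=0.95):
    """Smallest k whose leading singular values retain `energy` of the
    total sum of squared singular values (an assumed selection rule)."""
    total = sum(s * s for s in singular_values)
    acc = 0.0
    for k, s in enumerate(singular_values, start=1):
        acc += s * s
        if acc >= energy * total:
            return k
    return len(singular_values)


def compress(x, mu, basis):
    """Project x onto the retained singular vectors (rows of `basis`)."""
    centred = [xi - mi for xi, mi in zip(x, mu)]
    return [sum(bj * cj for bj, cj in zip(b, centred)) for b in basis]


def expand(scores, mu, basis):
    """Rebuild an approximation of x from its projection scores."""
    x = list(mu)
    for score, b in zip(scores, basis):
        for j in range(len(x)):
            x[j] += score * b[j]
    return x


# Hypothetical cluster model over three normalized parameters.
mu = [0.6, 0.4, 0.5]
singular_values = [12.0, 1.5, 0.4]
full_basis = [
    [1 / math.sqrt(3)] * 3,
    [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0],
    [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)],
]
k = choose_rank(singular_values)
basis = full_basis[:k]

x = [0.78, 0.57, 0.66]            # one onboard sample (hypothetical)
scores = compress(x, mu, basis)   # k numbers are transmitted instead of 3
x_hat = expand(scores, mu, basis)
print("rank kept:", k, "scores:", [round(s, 4) for s in scores])
print("reconstruction:", [round(v, 4) for v in x_hat])
```

The reconstruction error equals the part of the sample that lies along the discarded singular vectors, which is why dropping only the least important ones keeps the loss small.
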
Digital models can also be viewed as a linearization (i.e. a piecewise linearization) of ship performance and
navigation conditions around various navigational and operational points. Ship navigation and operation situations
therefore jump from one data cluster, i.e. system state, to another in a high-dimensional space (see Figure 3). In this
sense, digital models are linear models that represent the local navigation and operating conditions of the vessel and
ship systems. These linear models can be combined with appropriate observers to develop highly nonlinear ship
performance and navigation models, which are categorized as predictive analytics in this study and discussed further in
Section 3.2. Data modeling issues (i.e. data variety) can also be addressed by predictive analytics, where various forms
of data can be combined into a single model. Furthermore, structural changes in the digital models under various
environmental conditions and time horizons can be related to vessel performance, and understanding such changes can
further improve the predictive analytics. Since the singular values and vectors are derived from ship performance and
navigation data sets, this is a reverse engineering approach that captures vessel and ship system behavior from the data.
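
The piecewise linearization described above can be illustrated with hard switching: each data cluster carries a local linear model around its mean, and a new sample is evaluated with the model of the nearest cluster mean. In this sketch the input is a single normalized engine speed, and the three local models are tangents of an assumed cubic propeller law ($P \propto n^3$) at hypothetical cluster means; both the propeller law and the numbers are illustrative assumptions, not results of this study.

```python
import math
from dataclasses import dataclass


@dataclass
class LocalLinearModel:
    centre: list      # cluster mean vector in input space
    offset: float     # model output at the cluster mean
    gradient: list    # local slopes d(output)/d(input_j)

    def predict(self, x):
        """Evaluate the local linear model at input vector x."""
        return self.offset + sum(
            g * (xi - ci) for g, xi, ci in zip(self.gradient, x, self.centre))


def nearest_model(models, x):
    """Pick the local model whose cluster mean is closest to x."""
    return min(models, key=lambda m: math.dist(m.centre, x))


def piecewise_predict(models, x):
    return nearest_model(models, x).predict(x)


# Hypothetical engine modes: tangents of P = n**3 (normalized units).
models = [LocalLinearModel([n], n ** 3, [3 * n ** 2]) for n in (0.30, 0.60, 0.85)]

for n in (0.25, 0.62, 0.80, 0.95):
    p_hat = piecewise_predict(models, [n])
    print(f"n={n:.2f}  local model={p_hat:.4f}  assumed law={n ** 3:.4f}")
```

Hard switching makes the jump between data clusters explicit; the predictive-analytics sketch in Section 3.2 instead blends neighbouring models smoothly.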


3      Advanced Data Analytics

3.1    Data Anomalies




                                           Fig. 4. Advanced analytics in shipping

The digital models support the proposed data analytics presented in Figure 4. Five data analytics types are proposed in
this study to extract useful information from ship performance and navigation data sets: descriptive analytics,
diagnostic analytics, predictive analytics, visual analytics and decision analytics (see Figure 4). Data quality issues (i.e.
data veracity) can be addressed by descriptive and diagnostic analytics, where data anomalies are identified,
categorized and recovered under the proposed mathematical framework. In general, descriptive analytics identifies
various erroneous conditions, and diagnostic analytics then recovers or removes such conditions from the ship
performance and navigation data. These analytics therefore create high-quality data sets that can be further analyzed to
extract useful information on vessel navigation and ship system operation behavior. Predictive analytics can then be
used to forecast, and visual analytics to visualize, information on the same vessel and ship system behavior. Such
information creates advanced knowledge of vessel navigation and ship system operating conditions, which will lead to
industrial intelligence. Both advanced knowledge and industrial intelligence support the decision analytics. Decision
analytics should consist of appropriate key performance indicators (KPIs) to evaluate navigation and operational
actions for various vessels, and these KPIs often relate to energy efficiency and emission control rules and
regulations.
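
As one concrete example of a KPI tied to energy efficiency regulation (this study does not prescribe which KPIs to use), IMO's voluntary Energy Efficiency Operational Indicator relates the CO2 emitted on a voyage to the transport work performed, $\mathrm{EEOI} = \sum_j FC_j \, C_{F,j} / (m_{\mathrm{cargo}} \, D)$, where $FC_j$ is the mass of fuel $j$ consumed, $C_{F,j}$ its carbon factor, $m_{\mathrm{cargo}}$ the cargo carried and $D$ the distance sailed. The sketch computes a single-voyage value; the carbon factors are the values commonly quoted from the IMO EEOI guidelines and should be checked against current guidance, and the voyage figures are hypothetical.

```python
# Carbon factors C_F in t CO2 per t fuel, as commonly quoted from the IMO
# EEOI guidelines; verify against current IMO guidance before relying on them.
CARBON_FACTORS = {"HFO": 3.114, "LFO": 3.151, "MDO/MGO": 3.206}


def eeoi(fuel_tonnes, cargo_tonnes, distance_nm):
    """Single-voyage EEOI in grams CO2 per tonne-nautical mile.

    fuel_tonnes: mapping of fuel type -> fuel consumed on the voyage (t).
    """
    co2_tonnes = sum(CARBON_FACTORS[f] * fc for f, fc in fuel_tonnes.items())
    transport_work = cargo_tonnes * distance_nm      # t * nm
    if transport_work <= 0:
        raise ValueError("EEOI is undefined without cargo and distance")
    return co2_tonnes * 1e6 / transport_work         # t -> g CO2


# Hypothetical bulk-carrier voyage, used only to exercise the function.
voyage = {"HFO": 50.0, "MDO/MGO": 2.0}
value = eeoi(voyage, cargo_tonnes=60000.0, distance_nm=2000.0)
print(f"EEOI = {value:.3f} g CO2/(t*nm)")
```

A decision-analytics layer could track such an indicator per voyage and compare it across navigation and operational choices.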

          The digital models interact with these data analytics to improve data quality; this process, denoted the data
anomaly detection and recovery procedure, is presented in Figure 5. The procedure consists of complex interactions
among the digital models and the data analytics (i.e. descriptive and diagnostic analytics) and involves several steps.
First, the raw data are sent through data anomaly detection filter 1, where missing data points and preliminary data
anomalies are detected and separated. Each parameter has possible minimum and maximum values, and values beyond
that range are flagged as preliminary data anomalies. The data points with preliminary data anomalies are sent to a
separate group, data anomaly group 1, where the anomalies can be compared against known and unknown sensor and
DAQ faults and system abnormalities [5, 6]. New information on such data anomalies can also be stored in the
respective database. This database forms a knowledge base that should be developed further to detect complex sensor
and DAQ faults and system abnormalities. These abnormal system events can relate either to vessel navigation or to
ship system operating conditions; therefore, domain knowledge in shipping can play an important role in this
knowledge base. The remaining data are sent through the digital models, where additional data anomalies can be
detected (i.e. by data anomaly filter 2); the outliers of the digital models are detected as data anomalies by this filter.
Data that pass this filter are sent to the cleaned data group and can be considered the data with the highest quality.
Data with such secondary data anomalies are sent to a separate group, data anomaly group 2, where the anomalies are
similarly compared against known and unknown sensor and DAQ faults and system abnormalities. The data from
anomaly groups 1 and 2 are then passed through the data recovery filter and the digital models, and a considerable
amount of the data anomalies can be recovered in this step. The digital models, i.e. the data structure together with the
accurate ship performance and navigation parameters, are used to estimate unknown or erroneous parameter values in
this procedure. The recovered data can therefore be sent to the cleaned data group, and the whole data set is then used
in predictive analytics.
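
The detection and recovery loop of Figure 5 can be sketched for a single data cluster whose digital model is reduced to its mean vector and one leading singular vector. Filter 1 flags missing and out-of-range values, filter 2 flags points whose distance from the cluster's linear model exceeds a threshold, and recovery re-estimates one faulty parameter from the trusted ones by least squares along the singular vector. The model, the parameter ranges, the threshold and the rule for picking the faulty parameter in group 2 are assumptions made for illustration; the knowledge-base comparison against known sensor and DAQ faults is not modeled.

```python
import math

# Hypothetical digital model of one data cluster over three normalized
# parameters: mean vector MU and unit-length leading singular vector V.
MU = [0.50, 0.40, 0.60]
V = [1 / math.sqrt(3)] * 3
RANGES = [(0.0, 1.0)] * 3   # assumed min/max per parameter (filter 1)
RESIDUAL_LIMIT = 0.05       # assumed outlier threshold (filter 2)


def value_ok(j, value):
    """Filter 1 for a single value: present, finite and within range."""
    lo, hi = RANGES[j]
    return value is not None and not math.isnan(value) and lo <= value <= hi


def residual(x):
    """Filter 2: distance from x to the cluster's rank-1 linear model."""
    t = sum((xi - mi) * vi for xi, mi, vi in zip(x, MU, V))
    return math.sqrt(sum((xi - mi - t * vi) ** 2
                         for xi, mi, vi in zip(x, MU, V)))


def recover(x, bad):
    """Re-estimate parameter `bad` from the other parameters: fit the
    score t along V by least squares, then rebuild MU + t * V."""
    trusted = [j for j in range(len(x)) if j != bad]
    t = (sum((x[j] - MU[j]) * V[j] for j in trusted)
         / sum(V[j] ** 2 for j in trusted))
    fixed = list(x)
    fixed[bad] = MU[bad] + t * V[bad]
    return fixed


def process(records):
    """Return (cleaned, unrecovered) records following the Figure 5 flow."""
    cleaned, group1, group2, unrecovered = [], [], [], []
    for x in records:
        if not all(value_ok(j, v) for j, v in enumerate(x)):
            group1.append(x)                    # data anomaly group 1
        elif residual(x) > RESIDUAL_LIMIT:
            group2.append(x)                    # data anomaly group 2
        else:
            cleaned.append(x)                   # cleaned data group
    for x in group1 + group2:                   # data recovery step
        bad = [j for j, v in enumerate(x) if not value_ok(j, v)]
        if len(bad) > 1:                        # too little left to recover
            unrecovered.append(x)
            continue
        # Group 2 records have no flagged value: assume the faulty one is
        # the parameter whose repair brings the record closest to the model.
        j = bad[0] if bad else min(range(len(x)),
                                   key=lambda k: residual(recover(x, k)))
        fixed = recover(x, j)
        in_range = all(value_ok(i, v) for i, v in enumerate(fixed))
        if in_range and residual(fixed) <= RESIDUAL_LIMIT:
            cleaned.append(fixed)
        else:
            unrecovered.append(x)
    return cleaned, unrecovered


on_model = [m + 0.1 * v for m, v in zip(MU, V)]
records = [
    on_model,                                       # clean
    [on_model[0], None, on_model[2]],               # missing value
    [on_model[0], on_model[1], on_model[2] + 0.3],  # spike
    [None, None, 0.6],                              # two values missing
]
cleaned, unrecovered = process(records)
print(len(cleaned), "cleaned;", len(unrecovered), "unrecovered")
```

In a multi-cluster setting each record would first be assigned to its nearest cluster, and that cluster's model would drive both filter 2 and recovery.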




                                  Fig. 5. Data anomaly detection and recovery procedure

3.2    Predictive Analytics




                                       Fig. 6. Development of predictive analytics

The digital models (i.e. linear models) interact with predictive analytics to forecast vessel navigation and ship system
operation behavior [7]; this process is presented in Figure 6. These models of the vessel and ship systems are
connected with several observers, which together create the predictive analytics. Each digital model represents a local
linear model of vessel navigation and ship system operation behavior; combined, they create a global nonlinear model
of vessel and ship system behavior. The inputs to the predictive analytics are the navigation and operation control
inputs of the vessel and ship systems, together with the external and environmental conditions. The external and
environmental conditions can be observed as big data sets under ocean IoT, and they are also inputs to the vessel and
ship systems. The outputs of the vessel and ship systems are their actual behavior, which can also be observed as big
data sets under ocean IoT.
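
The combination of local linear models and observers can be sketched for a single state, e.g. a normalized vessel speed driven by a normalized control input. Each hypothetical local model $x_{k+1} = a_i x_k + b_i u_k + c_i$ belongs to one operating point; the models are blended with Gaussian weights around their operating points (a Takagi-Sugeno-style blend), which yields a global nonlinear model, and a fixed-gain observer corrects each prediction with the latest measurement. The blending scheme, the observer gain and all numbers are assumptions; this study does not specify how its models and observers are combined.

```python
import math
from dataclasses import dataclass


@dataclass
class LocalModel:
    centre: float  # operating point (cluster mean of the state)
    a: float
    b: float
    c: float

    def step(self, x, u):
        """Local linear prediction x_{k+1} = a x_k + b u_k + c."""
        return self.a * x + self.b * u + self.c


def blended_step(models, x, u, width=0.3):
    """Global nonlinear model: Gaussian-weighted blend of local models."""
    weights = [math.exp(-0.5 * ((x - m.centre) / width) ** 2) for m in models]
    total = sum(weights)
    if total == 0.0:  # far from every operating point: use the nearest model
        return min(models, key=lambda m: abs(x - m.centre)).step(x, u)
    return sum(w * m.step(x, u) for w, m in zip(weights, models)) / total


def observe(models, inputs, measurements, x0, gain=0.5):
    """Predictor-form observer: x_hat <- f(x_hat, u) + gain * (y - x_hat).

    measurements[k] is the measured state at step k; the k-th returned
    value is the one-step-ahead estimate of the state at step k + 1.
    """
    x_hat, estimates = x0, []
    for u, y in zip(inputs, measurements):
        x_hat = blended_step(models, x_hat, u) + gain * (y - x_hat)
        estimates.append(x_hat)
    return estimates


models = [
    LocalModel(centre=0.3, a=0.90, b=0.10, c=0.00),
    LocalModel(centre=0.6, a=0.85, b=0.12, c=0.02),
    LocalModel(centre=0.9, a=0.80, b=0.15, c=0.05),
]

# Simulated "true" vessel, here assumed to follow the same blended model.
inputs = [1.0] * 40
truth = [0.5]
for u in inputs:
    truth.append(blended_step(models, truth[-1], u))

estimates = observe(models, inputs, truth[:-1], x0=0.0)
print(f"initial error {abs(0.0 - truth[0]):.3f}, "
      f"final error {abs(estimates[-1] - truth[-1]):.2e}")
```

The observer gain trades responsiveness to measurements against trust in the model; with noisy IoT data a smaller gain, or a Kalman filter, would usually be preferred.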

         The outputs of the predictive analytics are the predicted vessel and ship system behavior, which is converted
into vessel navigation and ship system operation information by visual analytics. Visual analytics displays various ship
performance and navigation parameter relationships [8], from which information on optimal vessel and ship system
performance can be extracted. This information creates advanced knowledge of vessel navigation and ship system
operations and facilitates industrial intelligence in shipping (see Figure 1), as mentioned before. Hence, advanced
knowledge of vessel navigation and ship system operations can be used to develop appropriate decision analytics with
key performance indicators (KPIs). Finally, the outcome of the decision analytics influences the navigation and
operation control inputs of the vessel and ship systems.


4      Conclusions

A novel mathematical framework to support industrial digitization of shipping is presented, with a data flow path from
Industrial IoT to Predictive Analytics. The proposed data analytics can self-learn (i.e. the data structure can be learned
from the data themselves), self-clean (i.e. data anomalies can be detected, isolated and recovered by considering the
outliers of the data structure), self-compress and expand (i.e. the parameters in the data sets can be reduced and
expanded using the same data structure) and self-visualize (i.e. the data structures can be used to observe both vessel
and ship system performance). Since this framework is developed from ship performance and navigation data sets, the
process can also be regarded as a reverse engineering approach to vessel and ship systems. Furthermore, it introduces
intelligent analytics to the shipping industry and provides important solutions to the big data challenges in industrial
digitalization.


References
 1. L. P. Perera and B. Mo, "Data Analysis on Marine Engine Operating Regions in relation to Ship Navigation," Journal of Ocean
    Engineering, vol. 128, 2016, pp. 163-172.
 2. L. P. Perera and B. Mo, "Emission Control based Energy Efficiency Measures in Ship Operations," Journal of Applied Ocean
    Research, vol. 60, 2016, pp. 29-46.
 3. L. P. Perera and B. Mo, "Machine Learning based Data Handling Framework for Ship Energy Efficiency," IEEE Transactions
    on Vehicular Technology, vol. 66, no. 10, 2017, pp. 8659-8666.
 4. L. P. Perera and B. Mo, "Marine Engine Operating Regions under Principal Component Analysis to evaluate Ship Performance
    and Navigation Behavior," In Proceedings of the 8th IFAC Conference on Control Applications in Marine Systems (CAMS
    2016), Trondheim, Norway, September 2016, pp. 512-517.
 5. L. P. Perera, "Statistical Filter based Sensor and DAQ Fault Detection for Onboard Ship Performance and Navigation
    Monitoring Systems," In Proceedings of the 8th IFAC Conference on Control Applications in Marine Systems (CAMS 2016),
    Trondheim, Norway, September 2016, pp. 323-328.
 6. L. P. Perera, "Marine Engine Centered Localized Models for Sensor Fault Detection under Ship Performance Monitoring," In
    Proceedings of the 3rd IFAC Workshop on Advanced Maintenance Engineering, Service and Technology (AMEST'16),
    Biarritz, France, vol. 49, no. 28, October 2016, pp. 91-96.
 7. L. P. Perera and B. Mo, "Ship Speed Power Performance under Relative Wind Profiles," Maritime Engineering and
    Technology III, Guedes Soares & Santos (Eds.), vol. 1, Taylor & Francis Group, London, UK, 2016, pp. 133-141.
 8. L. P. Perera and B. Mo, "Visual Analytics in Ship Performance and Navigation Information for Sensor Specific Fault
    Detection," In Proceedings of the 36th International Conference on Ocean, Offshore and Arctic Engineering (OMAE 2017),
    Trondheim, Norway, June 2017 (OMAE2017-61118).
 9. L. P. Perera and B. Mo, "Marine Engine Centered Data Analytics for Ship Performance Monitoring," Journal of Offshore
    Mechanics and Arctic Engineering-Transactions of The ASME, vol. 139, no. 2, 2017.
10. L. P. Perera and B. Mo, "Machine Intelligence for Energy Efficient Ships: A Big Data Solution," Maritime Engineering and
    Technology III, Guedes Soares & Santos (Eds.), vol. 1, Taylor & Francis Group, London, UK, 2016, ISBN 978-1-138-03000-8,
    pp. 143-150.