<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Belief-based fault recovery for marine robotics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jeremy Paul Cofelt</string-name>
          <email>jcofelt@rosen-group.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mahya Mohammadi Kashani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrzej Wąsowski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Kampmann</string-name>
          <email>pkampmann@rosen-group.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IT University of Copenhagen</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ROSEN Technology and Research Center GmbH</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>The Eighth Joint Ontology Workshops</institution>
          ,
          <addr-line>JOWO'22</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose a framework expanding the capabilities of underwater robots to autonomously recover from anomalous situations. The framework is built around a knowledge model developed in three stages. First, we create a deterministic knowledge base to describe the “health” of hardware, software, and environment components involved in a mission. Next, we describe the same components probabilistically, defining probabilities of failures, faults, and fixes. Finally, we combine the deterministic and probabilistic knowledge into a minimal ROS package designed to detect failures, isolate the underlying faults, propose ifxes for the faults, and determine which is the most likely to help. We motivate the solution with a camera fault scenario and demonstrate it with a thruster failure on a real AUV and a simulated ROV.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;ontology-based autonomy</kwd>
        <kwd>probabilistic logic programming</kwd>
        <kwd>fault detection and recovery</kwd>
        <kwd>marine robotics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Unmanned underwater vehicles (UUVs) are used for tasks that are impossible or too
dangerous for humans: bathymetric and ecological surveys, ofshore infrastructure inspection and
maintenance, or clearing underwater minefields. The required autonomy and reliability for
these missions grows along with their range and duration. When things go wrong, these robots
cannot ask a human for help like service robots or safely stop in place like ground robots.</p>
      <p>
        Remotely operated vehicles (ROVs) are UUVs that are designed to be neutrally buoyant and
remain tethered to a surface vessel. Despite this, notable ROVs, such as the Kaikio, have have
been lost at the bottom of the ocean [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Autonomous underwater vehicles (AUVs) are UUVs
that operate with complete autonomy, untethered to any human operator or vessel, which
puts them at an even greater risk of being lost. Although designed to be positively buoyant
and float to the surface when things go wrong, AUVs can drift far away from their last known
location, be struck by a passing vessel, or lost forever under an ice shelf [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. A recovery is
time consuming and expensive, as a specialized ship and crew must be chartered to search
for and load the robot. Several noteworthy AUVs remain lost [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Because of these risks and costs, it is especially important that UUV missions are not aborted
unnecessarily. To this end, we propose a solution enabling compromised marine robots to
autonomously recover and successfully complete their missions. Our main contributions include:
• An ontology deterministically describing the “health” of hardware, software, and environment
components involved in an underwater mission. The semantics and syntax of the ontology
allow information to be shared between the systems on the vehicle, the engineers that
develop it, and the domain experts that guide the knowledge base.
• A probabilistic extension of the ontology to describe the same components in terms of
likelihoods of failures, faults, and fixes. This guides which fixes are available and whether they
are likely to resolve the problem or lead to new faults.
• A knowledge service combining the deterministic and probabilistic capabilities above. In
order to simplify and accelerate development time, our solution relies only on existing,
open-source resources, such as OWL, Protégé, Prolog, and ROS.
• A demonstration of feasibility with experiments involving a catastrophic thruster failure
during a real mission and various failures during a simulated shipwreck survey.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Deterministic knowledge</title>
      <p>
        Before proceeding, we define two key terms. Following Laprie [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] (as extended in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) and ISO
standards [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we consider a failure to be a deviation from some desired behavior and a fault to
be the defect or anomaly that caused the failure. For instance, blurry images could be a failure
due to numerous faults, including a damaged camera sensor.
      </p>
      <p>Now, consider an AUV several days and hundreds of kilometers into a week-long port-to-port
pipeline inspection. The AUV relies on a camera to avoid obstacles, track the pipeline, and
document findings. How should the AUV behave if it suddenly experiences a camera fault?
What knowledge and reasoning are required to turn a failed or aborted mission into a successful
one? We investigate these issues through the following competency questions.</p>
      <p>Q1: How is a fault detected? To detect a camera failure, the ontology must permit clear
definitions of metrics for nominal camera operation so that a “health monitor” can process all
camera-related data and determine whether current readings are within expected limits. For
instance, metrics like power draw and image brightness can assess the health of a camera.</p>
      <p>Q2: What faults can cause a failure? Unusual power draw can be caused by a camera
hardware fault, a loose connector, or a short circuit in a cable. Dark images could be caused by a
faulty sensor, or by environmental disturbances (“faults”) such as high turbidity. Consequently,
our ontology must capture all failures for each potential fault. For the AUV example, let
CamFault = {cam_hardware_fault, cam_driver_fault, . . .} be the set of all camera faults, and
CamFail = {power_fail, raw_data_fail, processed_data_fail, . . .} be all camera-related
failures. Then for a failure fail ∈ CamFail, the model constrains the possible faults to:
fault ∈ CamFault ∧ hasFault(cam0, fault) ∧ resultsIn(fault, fail)
Q3: What failures could result from a fault? If the camera sensor is damaged (a fault),
multiple failures are expected to occur—power draws unusually high or low, raw camera data
with unexpected distributions, and suspiciously dark or noisy processed videos. The ontology
must fully describe the many-to-many relationships between faults and failures. In our example,
if a fault ∈ CamFault occurs, it produces failures satisfying the following constraint:
fail ∈ CamFail ∧ hasFailure(cam0, fail) ∧ resultsIn(fault, fail)
% individual sensor capabilities
hasCapability(mono_camera, 2D_vision_data).
hasCapability(stereo_camera, 3D_vision_data).
hasCapability(side_scan_sonar, 2D_sonar_data).</p>
      <p>hasCapability(mbes_sonar, 3D_sonar_data).</p>
      <p>% ROS topics providing health metrics for camera "cam0"
hasHealthTopic(cam0, "remaro_auv/cam0/power_stats").
hasHealthTopic(cam0, "remaro_auv/cam0/processed_data/img_stats").
% Specific fields in the ROS messages that correspond to health metrics
hasHealthMetric("remaro_auv/cam0/power_stats", cam0_current_draw).
hasHealthMetric("remaro_auv/cam0/processed_data/img_stats", cam0_img_brightness).
hasHealthMetric("remaro_auv/cam0/processed_data/img_stats", cam0_img_noise_ratio).
% Acceptable limits on each health metric
hasHealthLimits(cam0_current_draw, [0.0, 0.2]). % amps
hasHealthLimits(cam0_img_brightness, [0.1, 0.9]). % 0=black --&gt; 1=white
hasHealthLimits(cam0_img_noise_ratio, [5.0, 1e6]). % 40=excellent</p>
      <p>Q4: Are any fixes available for a fault? Some faults, like a damaged camera sensor, are
unrecoverable during mission execution, others like a buggy camera driver could have one or
more fixes. The ontology should support mapping faults to potential fixes, which results in
constraints and queries similar to those relating failures and faults.</p>
      <p>% Potential faults common to any camera
hasFault(Comp, camera_hardware_fault) : 0.01 :- isCamera(Comp).
hasFault(Comp, camera_firmware_fault) : 0.05 :- isCamera(Comp).
hasFault(Comp, camera_driver_fault) : 0.10 :- isCamera(Comp).
hasFault(Comp, camera_processor_fault) : 0.25 :- isCamera(Comp).
hasFault(Comp, camera_power_fault) : 0.05 :- isCamera(Comp).
hasFault(Comp, camera_comm_fault) : 0.10 :- isCamera(Comp).
% Possible camera failures
resultsIn(camera_hardware_fault, camera_current_draw_fail) : 0.50.
resultsIn(camera_hardware_fault, camera_img_brightness_fail) : 0.85.
resultsIn(camera_hardware_fault, camera_img_noise_ratio_fail) : 0.70.
resultsIn(camera_firmware_fault, camera_current_draw_fail) : 0.05.
% Potential fixes for various faults
hasFix(camera_hardware_fault, camera_cycle_power_fix) : 0.10.
hasFix(camera_firmware_fault, camera_cycle_power_fix) : 0.30.
hasFix(camera_driver_fault, camera_cycle_power_fix) : 0.30.
hasFix(camera_driver_fault, camera_restart_driver_fix) : 0.50.
% Note the dependence on whether the fault already exists
hasRisk(camera_cycle_power_fix, camera_hardware_fault) : 0.99</p>
      <p>:- isCamera(Comp), hasFault(Comp, camera_hardware_fault).
hasRisk(camera_cycle_power_fix, camera_hardware_fault) : 0.05</p>
      <p>:- isCamera(Comp), \+hasFault(Comp, camera_hardware_fault).</p>
      <p>Q5: Are there any potential risks with the proposed fixes? Some fixes come with no risk.
For instance, if an image processing service has crashed, restarting it can do no further damage.
Other fixes come with substantial risk. As an example, consider a central communication service
that has become unresponsive due to a large message queue. If left alone, the service might
recover and resume normal operation. However, attempting to kill and reinitialize the service
could have disastrous consequences as obstacle avoidance processes lose critical sensor data.
For these reasons, it is important the ontology includes not only a list of fixes for each fault, but
also a list of risks (additional faults) associated with each fix.</p>
      <p>
        Motivated by the competency questions above, we propose the REliable MArine RObotics
(REMARO) ontology outlined in Fig. 1. We have elaborated it and translated into OWL syntax [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
using the Protégé editor [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The resulting OWL file, which is available in the REMARO Project
repository,1 is used to define the basic classes and relationships in the knowledge base.
      </p>
      <p>The deterministic capabilities of the ontology are ideally suited for storing mission-agnostic
knowledge, such as relationships between goals, tasks, capabilities, and components. As an
example, consider the Prolog snippet in Fig. 2, which relates components and capabilities.
Another Prolog file includes the health information specific to a particular vehicle configuration.
The snippet in Fig. 3 models the reliability of an AUV camera. Among other things, Fig. 3
includes health data directly and indirectly measuring camera performance. The availability of
such data is a limiting factor in determining which failures (and, therefore, faults) are detectable.</p>
      <sec id="sec-2-1">
        <title>1https://github.com/remaro-network/remaro_ontologies</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Probabilistic knowledge</title>
      <p>Suppose the AUV has an image quality failure. As previously discussed, this could be due to
numerous faults, such as a broken sensor (hardware), a crashed processing script (software), or
turbid water (environment). In general, a dead camera is less likely than a software bug. And,
depending on the environment, murky water could be near impossible or almost certain. For
these reasons, it is valuable to consider not only faults and failures, but also their likelihoods.</p>
      <p>In addition, some recovery “fixes” might resolve the original issue but produce new faults as
unintended side efects. Again, probabilities could be attached to both the proposed fixes and their
potential risks. If a fix is likely to restore a key capability and has low probability of detrimental
side efects, it should likely be executed. However, if the fix involves cycling power to some
mission-critical system/section of the vehicle (compromising the safety of the entire vehicle if
power cannot be restored), then it should be avoided. With probabilities, these pros and cons can
be weighed to guide and explain eforts towards capability restoration and mission adaptation.</p>
      <p>
        As shown in Fig. 2 and 3, we represent deterministic knowledge in Prolog. To add probabilistic
knowledge while remaining in Prolog, we use the open-source cplint2 framework. This allows
knowledge to be interpreted as a probability distribution over deterministic logic programs.
cplint includes the PITA and MCINTYRE modules for exact and approximate causal inference,
respectively. With them, queries to the knowledge base can be calculated by considering a joint
distribution over possible words [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>These probabilistic capabilities are ideal for describing failures, faults, and fixes specific to
the vehicle and environment in the current mission. For instance, suppose priors such as
Pr(fault) =  and</p>
      <p>Pr(fail | fault) = ,
are known for all fault ∈ Faults and fail ∈ Fails. Then cplint provides a reasoning
engine for the Bayes’ Theorem calculations required to compute posteriors of the form
Pr(fault | fail1, . . . , fail),
(1)
(2)
which can be used to determine the fault most likely to cause the detected failure(s). Fig. 4 lists
a few such probabilistic facts for an AUV camera. To clarify the notation in Fig. 4, note that</p>
      <p>Pr(coin shows heads | coin is tossed ∧ coin is fair) = 0.6
would be translated as:</p>
      <p>showsHeads(coin) : 0.6 :- isTossed(coin), \+isFair(coin).</p>
      <p>Note that faults, such as those given in Fig. 4, are carefully defined in broad terms to cover
all internal, external, and interface aspects of each component. Also, unless stated otherwise,
probabilities are assumed independent of each other and based on some consistent time interval
(e.g., per mission, per hour of mission, per device lifetime).</p>
      <p>As much of the power in the proposed solution lies in its probabilistic capabilities, it is worth
discussing potential sources of priors like those shown in Equations 1-2 and Fig. 4. We believe
the ideal approach involves a combination of the following:
2https://github.com/friguzzi/cplint
• Device manufacturers: Data sheets and manufacturer engineers/technicians are perhaps
the simplest source of information regarding component-related faults. Manufacturers
can also provide details about power, communication, and environmental limitations that
are otherwise dificult to determine without tedious and expensive experimentation.
• Past missions: If available, results from past missions can provide improved estimates
for a particular vehicle configuration or mission environment. For instance, suppose
data sheets give reliability estimates that are contradicted by the observed frequency of
faults for similar vehicles and environments. Then, these observed probabilities should
be prioritized over the theoretical probabilities provided by the manufacturer.
• Simulated missions: When a new vehicle configuration or mission environment is
being considered, the cheapest and quickest source of information is usually simulation.</p>
      <p>Unfortunately, some faults and failures are dificult or impossible to simulate.
• Domain experts: With their unique knowledge and experience, domain experts can
combine all of the above into improved estimates. Unfortunately, domain experts can be
dificult to find, expensive to hire, and still susceptible to human error.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Health monitor implementation</title>
      <p>As shown in Fig. 5, the proposed health monitor consists of a publisher/subscriber node
connected to a knowledge base. Before detailing the implementation, we ofer the following
justifications for basing our solution in ROS: (1) it is the de facto standard for robotics
middleware, (2) real-world data is available in ROS "bags", (3) a ROS-based UUV simulator is
available for experimentation, and (4) a popular ROS-based knowledge base already exists.</p>
      <sec id="sec-4-1">
        <title>4.1. Knowledge base</title>
        <p>The knowledge base provides a means of storing existing knowledge and reasoning to generate
new knowledge. A priori knowledge like that shown in Fig. 2-4 is mostly static and reusable
between missions. For instance, a team will likely have an inventory of actuators and sensors that
it shares across multiple robots. Because the capabilities of these components are deterministic
and remain fixed, the team could have a single components_and_capabilities.pl file, like the
one shown in Fig. 2. Similarly, the potential faults, failures, and fixes for these components do not
change between missions. So, the team could also have a single fails_faults_and_fixes.pl
ifle containing probabilistic knowledge similar to that shown in Fig. 4. These files can be
extended as inventory and beliefs evolve, but should not change for regular day-to-day testing.</p>
        <p>Other knowledge will vary on a mission-by-mission basis. For instance, for each vehicle
configuration, there should be a corresponding health_topics.pl file, like the one shown in
Fig. 3. This information should match the current vehicle and environment. For instance, for an
AUV camera, nominal img_brightness could depend on whether the mission occurs near the
surface or at depth, during day or night, in a turbid river or in clear ocean water.</p>
        <p>Currently, information contained in these files is loaded during initialization of the ROS node
discussed in Sec. 4.2. In the future, we plan to fully integrate our solution with KnowRob3,</p>
        <sec id="sec-4-1-1">
          <title>3https://github.com/knowrob/knowrob</title>
          <p>
            which is an open-source, ROS-based knowledge processing system for acquiring, grounding,
representing, and reasoning with heterogeneous knowledge [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]. For now, our solution relies
only on the rosprolog4 functionality within KnowRob, which provides an interface between
Prolog and other ROS nodes. Its primary usage is extracting knowledge from the Prolog files
mentioned above and using this knowledge to answer queries from other ROS nodes, such as
the proposed health monitor node.
          </p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4https://github.com/knowrob/rosprolog</title>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. ROS node</title>
        <p>As shown in Fig. 6, the ROS node is both a subscriber and publisher. On initialization, the node
extracts health topics, metrics, and limits from health_topics.pl. It then subscribes to health
topics for all hardware, software, and environment components defined for the current mission.</p>
        <p>During nominal operations, the node simply awaits published messages on the monitored
topics. When a new message is received, a callback is prompted, the current health value is
extracted, and that value is compared to the predefined limits. If the value is within range, the
callback terminates and the subscriber resumes waiting for the next message.</p>
        <p>If a value is non-nominal, the failure triggers the FailFaultFix subroutine. First, a STOP
procedure is initiated to protect the afected component from further damage. These STOP
procedures consists of ROS messages that can, for example, unpower a sensor, return an actuator
to default position, or kill a software service.</p>
        <p>Next, the FailFaultFix subroutine searches for faults capable of causing the detected
failure. The most likely fault is determined (via cplint queries involving the knowledge in
fails_faults_and_fixes.pl). For this fault, its most likely fix (determined similarly) is then
attempted. Like the emergency STOP procedures, each FIX publishes a sequence of ROS
messages. For instance, the messages could restart a camera driver, adjust perception algorithms
to account for visibility changes, or pulse the thruster to clear a blockage. The last step in the
FIX procedure essentially undoes the STOP procedure. If the failure is no longer detected, the
ifx is assumed successful and the FailFaultFix subroutine exits. If the failure remains, the next
most likely fix is considered until all have been exhausted. Once all possible fixes for a fault
have been attempted, the next most likely fault is considered along with its fixes. This process
is repeated until either the failure has been resolved or no more potential fixes remain.</p>
        <p>
          Although the current implementation can compute risks (likelihoods of side-efect faults) of
candidate fixes, it can also attempt fixes for those faults should they be realized. As such, the
current priority is fixing present faults, not avoiding potential faults. In future work, we intend
to consider the upper half of Fig. 1 and how such risks might jeopardize mission success. For
now, should the FailFaultFix subroutine exists unsuccessfully, then higher-level services must be
involved to adapt mission plans for any compromised capabilities. The most notable frameworks
for this include the Cognitve Robotics Abstract Machine (CRAM)[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and Metacontrol for ROS
(MROS)[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. We consider our current solution a complement to such systems, ensuring faults
are truly unrecoverable before resorting to component substitutions and controller adaptations.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <sec id="sec-5-1">
        <title>5.1. Real thruster fault on an AUV</title>
        <p>In our first scenario, data is analyzed from a real field trial where an AUV sufered a catastrophic
thruster failure when a cable became tangled around the rotor blades. As Fig. 7 shows, several
failures occurred in quick succession. Almost simultaneously, the diference jumped between
desired and measured speeds, and the temperature began rapidly increasing in the motor.
Around 10 seconds later, three power issues occurred near synchronously.</p>
        <p>Since these power-related metrics/messages may have diferent publication rates, all of them
should be included in the health monitor to ensure the earliest possible failure indicator is caught.
Other particularly useful metrics can be derived from standard topics. For instance, monitoring
the relative diference between desired and measured values is usually more efective than
measuring the values individually. For this reason, we consider the scaled speed diferential:
speed_diff = |speed_measured − speed_desired| / speed_max</p>
        <p>In addition, we suggest limits on both temperature and its time derivative. More specifically,
we consider an averaged derivative over an interval of ∆  = 5 seconds, which provides a nice
compromise between slow response time and frequent false alarms from noisy readings:
temp_gain() = [motor_temperature() − motor_temperature( − ∆ )] / ∆ 
To avoid similar future faults, we propose the settings shown in Fig. 8. With this configuration,
the health monitor (limited only by the 0.3 Hz publication rates of the desired_speed and
motor_temperature topics) detects within 5 seconds both the temperature gain and speed
discrepancy. This results in a STOP message (setting speed to zero) long before the current
spike that ultimately destroyed internal circuitry. Notice the catastrophic fault could have been
prevented thanks in part to the internal motor temperature sensor. A cheaper thruster might
not provide such data causing the same failure to go undetected in time to save the device.</p>
        <p>Although not testable with the available ROS bag, we believe several fixes were possible. First,
a "wait-and-see" fix could have allowed the cable to drift out on its own. Other possibilities
include alternating pulses to loosen a jam or reversing to unwind an entanglement. Each of
these attempts would be immediately aborted if health metrics continued to worsen.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Simulated thruster faults on an ROV</title>
        <p>
          Our second scenario considers a simulated ROV sufering a thruster fault while performing a
shipwreck survey. More specifically, we consider the REXROV surveying the Hercules wreckage
shown in the Gazebo simulator screenshot in Fig. 9(a). The ROV, ship, and terrain model are all
included in the uuv_simulator5 developed during the SWARMs Project [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>Because we do not have access to low-level hardware data in the simulation, like temperature,
power usage, and control signals, we instead use trajectory deviations as our primary health
metric. For demonstrations purposes, we configured the vehicle to follow a tight helical path as
it descends around the Hercules. With the shipwreck at 60 meters depth and the ROV starting
15 meters above, we began three trials, each with the ROV completing five loops of radius
14 meters and a vertical displacement 2.6 meters per loop. During the first scenario, no fault
was present and the ROV followed the desired trajectory. During the second, a fault was forced
at  = 540 seconds by manually disabling a subset of the sternside thrusters. This result is
shown in RViz in Fig. 9(b). The final scenario recreated the same fault, but followed with a
successful thruster_cycle_power_fix at  = 850 seconds. As can be seen in Fig. 10, the ROV
successfully recovered and reacquired the desired trajectory within 30 seconds.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Related work</title>
      <p>
        A variety of approaches are common for fault detection, including analytical, knowledge-based,
and data-driven models [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Analytical models are often derived from first principles to describe
diferences between observed and modeled behavior [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ]. Such systems can be accurate,
but also dificult to develop. Knowledge-based methods, which rely on domain experts and
ontological formulations, can ofer simpler and more expressive solutions [ 18, 19]. Data-driven
      </p>
      <sec id="sec-6-1">
        <title>5https://github.com/uuvsimulator/uuv_simulator</title>
        <p>(a) Simulated scenario in Gazebo.
(b) Faulty trajectory in RViz.
methods have gained increasing popularity due to the rapid growth of AI-based techniques [20].
For example, in [21], a neural network was trained on empirical data to provide a soft-fault model
for marine thrusters. An interesting hybrid approach in [22] is used to develop an energy-aware
architecture for fault mitigation in AUVs from both data-driven actuator characterizations and
model-based anomaly diagnoses. We believe our proposed solution ofers a complement to such
methods, while adding probabilistic capabilities to existing knowledge-based approaches.</p>
        <p>
          Several existing frameworks rely on semantic knowledge representation for collaborative
underwater robots [23]. In the SWARMs Project [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], a probabilistic ontology was combined
with a multi-entity Bayesian network (MEBN) to reason with uncertainty for chemical spill
monitoring. Cordova [24] developed an ontology to describe robotic devices and skills to enable
collaborative knowledge acquisition between real and virtual worlds. Our ontology is also
designed to be shareable between robots and environments, but at the component level.
        </p>
        <p>Diab et al. [25] presented an ontology for overcoming task execution failures in service
robotics. In [26], a framework was developed for fault-tolerant mission adaptations in
underwater robotics by relating plans, actions, and capabilities. Such solutions attempt to overcome faults
by substituting components and capabilities, or altering plans. Our solution takes a diferent
approach by attempting to exhaust all possible fixes to allow tasks to be executed as planned.</p>
        <p>
          Self-awareness, self-adaptation, and metacontrol ofer additional perspective for overcoming
faults and failures. Patrón et al. [27, 28] used Boyd’s Cycle [29] of Observe → Orient →
Decide → Act to combine situational awareness with analytical planners to repair and modify
plans. In [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], TOMASys6 was used as part of a ROS-based metacontrol system for self-adapting
systems that combined multiple levels of abstraction and autonomy. TOMASys is also used
in [30] to enable self-diagnosis and autonomous reconfiguration to preserve mission-mandated
capabilities. Papadimitriou [31] demonstrated a semantic-based knowledge framework to
simulate and overcome a hardware fault on a REMUS100 AUV by performing mission and plan
adaptation during mine countermeasure missions (MCM). In each of these cases, the solutions
require complete frameworks in order to achieve their higher-level functionalities. Our
lowerlevel solution is designed to be an add-on to existing frameworks, including these, ensuring
ifrst that components cannot be repaired before they are replaced.
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>This paper introduces the REMARO ontology for detecting failures, isolating their underlying
faults, and attempting fixes for their recovery. The ontology supports both deterministic and
probabilistic knowledge representation and reasoning. These probabilities allow domain experts,
past and simulated missions, and manufacturer-provided specifications to guide the order in
which faults and fixes are considered.</p>
      <p>To demonstrate the ontology, we propose a minimal ROS node designed to add fault recovery
capabilities to existing UUV frameworks. This node subscribes to health topics specified in
a mission configuration file. Whenever a new message is received that reveals non-nominal
health values, a subroutine is triggered to consider the most probable faults and their most
likely fixes. If a fix is found, monitoring resumes. Otherwise, higher-level services must be used.</p>
      <p>
        In future work, we hope to extend the health monitor node to provide component substitutions
when fixes fail and mission reductions when substitutions are unavailable. For now, our solution
serves as a complement to mission adaptation frameworks like CRAM [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and metacontrol
architectures like MROS [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Other future plans include field testing of the presented solution
and extension of the solution to include more complex probabilistic representations. Such
extensions could consider recurring faults and temporary fixes, updating probabilities based on
outcomes from recently attempted fixes, and standardization of STOP and FIX subroutines.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This project has received funding from the European Union’s Horizon 2020 research and
innovation programme under the Marie SkłodowskaCurie grant agreement No. 956200. For
more info, please visit https://remaro.eu.</p>
      <sec id="sec-8-1">
        <title>6Teleological and Ontological Model of an Autonomous System</title>
        <p>Systems (SysTol), IEEE, 2010, pp. 11–16.
[18] B. Liu, M. Duan, G. Zhao, An object frame knowledge representation approach for fault
diagnosis expert system, in: 2011 International Conference on Future Computer Sciences
and Application, IEEE, 2011, pp. 74–77.
[19] X. Deng, R. Luo, J. Li, Similarity matching algorithm of equipment fault diagnosis based
on CBR, in: 2015 6th IEEE International Conference on Software Engineering and Service
Science (ICSESS), IEEE, 2015, pp. 998–1002.
[20] X. Dai, Z. Gao, From model, signal to knowledge: A data-driven perspective of fault
detection and diagnosis, IEEE Transactions on Industrial Informatics 9 (2013) 2226–2238.
[21] S. Nascimento, M. Valdenegro-Toro, Modeling and soft-fault diagnosis of underwater
thrusters with recurrent neural networks, IFAC-PapersOnLine 51 (2018) 80–85.
[22] V. De Carolis, F. Maurelli, K. E. Brown, D. M. Lane, Energy-aware fault-mitigation
architecture for underwater vehicles, Autonomous Robots 41 (2017) 1083–1105.
[23] P. Patrón, Y. Petillot, The underwater environment: a challenge for planning, UK PlanSIG</p>
        <p>Edinburgh 2008, 2008.
[24] A. J. Cordova, Semantic web for robots : applying semantic web technologies for
interoperability between virtual worlds and real robots, Technische Universiteit Eindhoven,
2012.
[25] M. Diab, M. Pomarlan, S. Borgo, D. Bebler, J. Rosell Gratacòs, J. Bateman, M. Beetz,
FailRecOnt-an ontology-based framework for failure interpretation and recovery in
planning and execution, in: Proceedings of the 2nd International Workshop on Ontologies for
Autonomous Robotics, 2021, pp. 1–14.
[26] P. Patrón, E. Miguelanez, J. Cartwright, Y. R. Petillot, Semantic knowledge-based
representation for improving situation awareness in service oriented agents of autonomous
underwater vehicles, OCEANS, 2008.
[27] P. Patrón, E. Miguelanez, Y. R. Petillot, D. M. Lane, Fault-tolerant adaptive mission planning
with semantic knowledge representation for autonomous underwater vehicles, in: 2008
IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2008, pp.
2593–2598.
[28] P. Patrón, E. Miguelanez, Y. R. Petillot, D. M. Lane, J. Salvi, Adaptive mission plan diagnosis
and repair for fault recovery in autonomous underwater vehicles, in: OCEANS 2008, IEEE,
2008, pp. 1–9.
[29] H. Hillaker, Tribute to John R. Boyd, Code One Magazine 12 (1997).
[30] E. Aguado, Z. Milosevic, C. Hernández, R. Sanz, M. Garzon, D. Bozhinoski, C. Rossi,
Functional self-awareness and metacontrol for underwater robot autonomy, Sensors 21
(2021) 1210.
[31] G. Papadimitriou, D. Lane, Semantic-based knowledge representation and adaptive mission
planning for MCM missions using AUVs, in: OCEANS 2014-TAIPEI, IEEE, 2014, pp. 1–8.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ishibashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yoshida</surname>
          </string-name>
          ,
          <article-title>Developing a sediment sampling ROV for the deepest ocean</article-title>
          ,
          <source>Sea Technology</source>
          <volume>49</volume>
          (
          <year>2008</year>
          )
          <fpage>43</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Norgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lubbad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Skjetne</surname>
          </string-name>
          ,
          <article-title>Unmanned underwater vehicles in Arctic operations</article-title>
          ,
          <source>in: Proceedings of the 22nd IAHR International Symposium on Ice. Singapore</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>89</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Strutt</surname>
          </string-name>
          ,
          <article-title>Report of the inquiry into the loss of AutoSub2 under the Fimbulisen</article-title>
          ,
          <year>2006</year>
          . URL: https://eprints.soton.ac.uk/41098/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Copely</surname>
          </string-name>
          ,
          <article-title>Onwards &amp; downwards: when ROVs or AUVs are lost in ocean exploration</article-title>
          ,
          <year>2014</year>
          . URL: http://www.joncopley.com/blog_may14a.html.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Laprie</surname>
          </string-name>
          ,
          <article-title>Dependable computing and fault-tolerance: concepts and terminology</article-title>
          ,
          <source>in: Proceedings of the 25th International Symposium on Fault-Tolerant Computing, IEEE</source>
          ,
          <year>1995</year>
          , pp.
          <fpage>27</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Carlson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <article-title>How UGVs physically fail in the field</article-title>
          ,
          <source>IEEE Transactions on Robotics</source>
          <volume>21</volume>
          (
          <year>2005</year>
          )
          <fpage>423</fpage>
          -
          <lpage>437</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[7] ISO</source>
          <volume>10303</volume>
          :
          <string-name>
            <surname>2021(E),</surname>
          </string-name>
          <article-title>Industrial automation systems and integration - Product data representation and exchange</article-title>
          , Standard, International Organization for Standardization, Geneva,
          <string-name>
            <surname>CH</surname>
          </string-name>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Antoniou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. v.</given-names>
            <surname>Harmelen</surname>
          </string-name>
          , Web Ontology Language: OWL, in: Handbook on Ontologies, Springer,
          <year>2004</year>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Musen</surname>
          </string-name>
          ,
          <article-title>The Protégé Project: a look back and a look forward</article-title>
          ,
          <source>AI</source>
          Matters
          <volume>1</volume>
          (
          <year>2015</year>
          )
          <fpage>4</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Riguzzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cota</surname>
          </string-name>
          , E. Bellodi,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zese</surname>
          </string-name>
          , Causal inference in cplint,
          <source>International Journal of Approximate Reasoning</source>
          <volume>91</volume>
          (
          <year>2017</year>
          )
          <fpage>216</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Beetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Beßler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Haidu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pomarlan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Bozcuoğlu</surname>
          </string-name>
          , G. Bartels, KnowRob
          <volume>2</volume>
          .0
          <article-title>-a 2nd generation knowledge processing framework for cognition-enabled robotic agents</article-title>
          ,
          <source>in: 2018 IEEE International Conference on Robotics and Automation (ICRA)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>512</fpage>
          -
          <lpage>519</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Beetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mösenlechner</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Tenorth, CRAM-A Cognitive Robot Abstract Machine for everyday manipulation in human environments</article-title>
          ,
          <source>in: 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems</source>
          , IEEE,
          <year>2010</year>
          , pp.
          <fpage>1012</fpage>
          -
          <lpage>1017</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Corbato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bozhinoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Oviedo</surname>
          </string-name>
          , G. van der Hoorn,
          <string-name>
            <given-names>N. H.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tjerngren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wąsowski</surname>
          </string-name>
          , MROS:
          <article-title>Runtime adaptation for robot control architectures</article-title>
          , arXiv preprint arXiv:
          <year>2010</year>
          .
          <volume>09145</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bilbao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Martín-Wanton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bastos</surname>
          </string-name>
          , J. Rodriguez, SWARMs Ontology:
          <article-title>A common information model for the cooperation of underwater robots</article-title>
          ,
          <source>Sensors</source>
          <volume>17</volume>
          (
          <year>2017</year>
          )
          <fpage>569</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Chiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Braatz</surname>
          </string-name>
          ,
          <article-title>Fault detection and diagnosis in industrial systems</article-title>
          , Springer Science &amp; Business
          <string-name>
            <surname>Media</surname>
          </string-name>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>M. L. Leuschen</surname>
            ,
            <given-names>I. D.</given-names>
          </string-name>
          <string-name>
            <surname>Walker</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          <string-name>
            <surname>Cavallaro</surname>
          </string-name>
          ,
          <article-title>Fault residual generation via nonlinear analytical redundancy</article-title>
          ,
          <source>IEEE transactions on control systems technology 13</source>
          (
          <year>2005</year>
          )
          <fpage>452</fpage>
          -
          <lpage>458</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shumsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhirabok</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hajiyev</surname>
          </string-name>
          ,
          <article-title>Observer-based fault diagnosis in thrusters of autonomous underwater vehicles</article-title>
          ,
          <source>in: 2010 Conference on Control and Fault-Tolerant</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>