<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hallucinating Hidden Obstacles for Unmanned Surface Vehicles Using a Compositional Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jon Muhovič</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gregor Koporec</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Janez Perš</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Electrical Engineering, University of Ljubljana</institution>
          ,
          <addr-line>Tržaška 25, 1000 Ljubljana</addr-line>
          ,
          <country country="SI">Slovenia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Gorenje</institution>
          ,
          <addr-line>d.o.o., 3320 Velenje</addr-line>
          ,
          <country country="SI">Slovenia</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The water environment in which unmanned surface vehicles (USVs) navigate presents many unique challenges. One of these is the risk of encountering obstacles that are (partially) submerged and therefore poorly visible, so that their extent cannot be determined directly from the available above-water sensor data. On the other hand, it is well known that human skippers are able to safely navigate boats around such obstacles even without underwater sensors, relying only on their expertise. In this paper, we describe initial work on extending USV obstacle detection with such functionality using a compositional model. To learn to hallucinate the extent of obstacles with a minimum of learning effort, we exploit the nature of obstacles (people in kayaks, canoes, and on paddleboards) that are visible most of the time, but not always. We evaluate the impact of such hallucinations on USV safety and maneuverability, and suggest additional cases where such hallucinations can be used to improve USV safety.</p>
      </abstract>
      <kwd-group>
<kwd>unmanned vehicles</kwd>
        <kwd>USV</kwd>
        <kwd>obstacle detection</kwd>
        <kwd>compositional models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>Unmanned surface vehicles (USVs) are increasingly
recognized as a valuable tool for a variety of applications,
including military, environmental, and commercial
purposes. These autonomous craft are capable of operating
in difficult or hazardous environments, making them
ideal for tasks that would be too risky for humans.</p>
      <p>
        On the other hand, one of the envisioned benefits of
USVs is the ability to gather data and perform tasks for
extended periods of time without the need for human
intervention. This would allow them to cover large areas
and collect a large amount of data that can then be used
for a variety of purposes. USVs equipped with sensors
and cameras could be used, for example, to monitor and
map the marine environment, track wildlife [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], or assess
the health of coral reefs [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, truly autonomous
vehicles with no captain on board and no contact with
remote operators must essentially duplicate the
reasoning of a trained skipper in certain situations. One of
these situations involves (partially) submerged objects that
cannot be detected by USV sensors located above the
water surface.
      </p>
      <p>
        This paper is organized as follows: following related
work, we define the problem we use to demonstrate the
capabilities of our method. We then introduce the basic
concepts of compositional models and describe our use case
and evaluation method. In the experimental part, we present
our own dataset used in our experiments and its properties,
followed by the evaluation setup focusing on USV navigation.
Finally, we discuss the results and further applications of
the presented approach.
      </p>
      <p>
2. Related work
Recently, numerous papers have been published on the
subject of USV sensors, obstacle detection and navigation.
      </p>
      <p>
        The computer vision aspect of marine environment
interpretation has been approached in several ways so
far: some authors have acquired datasets to facilitate
domain transfer for Deep Learning and to further investigate
the specific problems of the maritime domain [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ].
      </p>
      <p>
        Several USV architectures with different sensors have
been presented to solve problems such as poor lighting
conditions and the need for absolute distance
measurements [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ]. In addition, authors have proposed deep
learning methods specific to the maritime domain that
either incorporate additional relevant modalities or address
problems that arise in the maritime domain [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Numerous publications have also been presented that
address automatic navigation and maritime collision
avoidance compliance [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        Han et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] have presented a complete platform and
framework for obstacle detection and avoidance, complete
with multimodal sensors, obstacle detectors, and collision
avoidance rules. They use the SSD detector [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] to detect potential obstacles and track them using sensor
fusion. Since real-time performance is usually desired, fast
detectors such as SSD or YOLO [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] are usually preferred for USV applications.
      </p>
      <p>
        Several datasets have also been published, some of
which are used as learning data for Deep Learning-based
methods and others as benchmarks for existing methods.
One such dataset, SMD, was proposed by
Prasad et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. It contained 51 RGB and 30 NIR sequences
and was primarily intended for monitoring.
Since then, several more USV-oriented datasets have
been proposed, such as MODD [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], MaSTr1325 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and MODS [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        In the past, obstacle detection was performed directly
by estimating salient regions [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] or by color segmentation [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Before the widespread use of Deep Learning,
several approaches were also proposed that mainly
focused on semantic segmentation followed by anomaly
detection. These methods [
        <xref ref-type="bibr" rid="ref15 ref17">15, 17</xref>
        ] used prior information about the scene and refined it
with color image information. With the advent of Deep
Learning, the two branches of obstacle detection have been
improved. On the one hand, researchers have adapted or
retrained general object detectors for marine environments [
        <xref ref-type="bibr" rid="ref10 ref18">18, 19, 10</xref>
        ]
using more precise classification information and custom
datasets. However, such approaches only work for
well-defined objects. Unknown structures, such as floating
debris or piers, cannot usually be detected using such
methods.
      </p>
      <p>
        The other branch of obstacle detection is semantic
segmentation. Several methods have adapted general
segmentation methods to the marine environment [
        <xref ref-type="bibr" rid="ref7">20,
7, 21</xref>
        ]. Obstacle detection can be performed using such
methods by determining regions that are partially or
completely surrounded by water.
      </p>
      <p>
        The method presented in this paper operates at a
higher level of reasoning and aims to use assumptions
that reasonably hold in water-bound environments. It
relies on existing but imperfect methods for obstacle
detection (in this paper we use Yolov7 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). This work contains two contributions:
• A method for improving the safety of the USV and
its environment by improving the estimation of
free passage corridors in front of the USV, even
with imperfect obstacle detectors.
• An evaluation method that evaluates the increase of
safety in that case.
      </p>
      <p>
3. Problem definition
In situations where we cannot reliably observe fully or
partially submerged obstacles using any of the sensors
mounted above the water, we use knowledge of
commonly occurring structures in marine environments to
improve the safety of a USV.
      </p>
      <p>
        In this paper, we present preliminary research results:
we focused on the problem of detecting boats or other
floating objects in situations where a person was detected
above the water surface, but the corresponding boat was not.
Such cases often occur when boats are of a
similar color to the surrounding water, partially submerged
due to maneuvering, or otherwise poorly visible due
to backlight or the distance between a smaller object and
the camera. The work was performed using RGB images,
because of the wide availability of pre-trained object
detectors that perform reasonably well without the need
for additional training.
      </p>
      <p>
        Since we are dealing with coastal and continental
water regions where smaller boats such as rowboats and
paddle boats are usually found, consistent detection of
such obstacles is necessary. Depending on the lighting
conditions and the size and color of the boats, detection with
conventional detectors applied to color images is not
always consistent. This inconsistency can be a hazard to
safe navigation, especially when maneuvering near other
boats.
      </p>
<p>This problem has the following interesting properties:
• Solid physical foundation. People cannot walk
or sit on the water. There must be some kind of
highly buoyant device present to support their
weight.
• No opportunity to introduce gross errors with
false detections. False positives only restrict the
possibilities for the USV to advance, and our
experiments were designed to check for that effect.
• No manual annotations are needed, since we can
obtain ground truth using the object detector
(Yolov7) and therefore obtain plenty of data to
train the higher-reasoning model.</p>
      <p>The method will later be extended to a wider range of problems, which are discussed in Section 7.2, but these represent edge cases and are thus subject to problems of data collection.</p>
    </sec>
    <sec id="sec-2">
      <title>4. Our method</title>
      <sec id="sec-2-1">
        <title>4.1. Compositional models</title>
<p>In computer vision, a composition refers to the
arrangement of visual elements in an image. These visual
elements are called parts and can be low-level primitives (e.g.
edges, corners) or high-level objects themselves (e.g. the cap,
the label, and the recognizably shaped bottom of a bottle of
soft drink), as shown in Fig. 2. Parts can be compositions
themselves, yielding a hierarchical compositional model.</p>
<p>The compositional model, as shown in Fig. 2, is not
particularly useful, as it is completely rigid. In
practice, the geometrical parameters of the parts are modelled
as random vectors. In Figure 3 we show a hierarchical
compositional model of a 3-part Coke bottle under the
assumption that the probability distribution of each part's
position relative to the center (origin) of the
composition is Gaussian:
X ∼ N(μ, Σ)   (1)
with mean μ and covariance matrix Σ. The parameters of the
Gaussian distribution are obtained by learning on a
sufficiently large set of training data, from which the vectors X
are extracted.</p>
<p>Our method is heavily influenced by the work of
Koporec et al. [22], which uses hierarchical compositional
models to detect objects' visible parts even when large
parts of the objects are occluded, and allows collection of
expert knowledge from a small number of targeted human
annotations. In our work we use a highly simplified
implementation of the Human-Centered Deep Compositional
(HCDC) model [22].</p>
        <p>Compositional models can be used in the following ways:
• Robust, explainable detection of partially
occluded objects, where the object (composition) is
detected even if not all its parts are visible.
• Explanation (hallucination) of the missing part.
This is the functionality we use in the presented
work.</p>
      </sec>
      <sec id="sec-2-1-1">
        <title>4.2. Model of a person on a boat</title>
        <p>The Human-Centered Deep Compositional (HCDC)
model [22] operates on parts that are themselves deep
detections (detections obtained by convolutional
neural network models, CNNs). This makes the model
explainable, as the parts are already categorized into
human-understandable categories.</p>
        <p>We follow this example and use the detections
provided by an obstacle detector pretrained on MS
COCO [23]. We only retained the pertinent detection
classes: person, boat and surfboard. Additionally, we
treated the classes boat and surfboard as the same
semantic entity (referred to as boat in the remainder of the
text), since both of those classes almost always appear
simultaneously with the class person. The compositional
model that we use is shown in Fig. 4.</p>
        <p>In our case, Eq. (1) changes, since we have two
separate Gaussian models for the upper-left and bottom-right
corners of the boat bounding box, one pair for each scale:
X_L ∼ N(μ_L, Σ_L),  X_R ∼ N(μ_R, Σ_R)   (2)
where the subscripts L and R denote the left-top and
right-bottom points of the boat bounding box, respectively,
and one such pair of models is kept for each scale index.
Therefore, the total parameter set of our 2D model consists
of two 2D Gaussian means and two 2D Gaussian covariance
matrices per scale.</p>
      </sec>
      <sec id="sec-2-1-2">
        <title>4.3. Training the compositional model</title>
        <p>Our training does not require any manual annotations.
Due to the good (but not perfect) performance of the
chosen detector (Yolov7 detects about 95% of boats and an
even higher percentage of persons), we use those cases where
both the boat and the person on it were detected to
establish a model that can reasonably predict the position
and size of a boat in the absence of detections.</p>
        <p>Although we assume a Gaussian model for the probability
distributions of X_L and X_R, we estimate each separate
distribution using the expectation maximization (EM)
algorithm with a 2-component Gaussian Mixture Model (GMM)
and retain the larger of the two components as either
μ_L or μ_R. Our preliminary testing has revealed that
the 2-component GMM results in a more accurate fit
of the Gaussian model to the data, collecting the outliers in
the significantly smaller component.</p>
      </sec>
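The EM-based outlier rejection described in Section 4.3 can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it fits a 2-component Gaussian mixture to 1D data with a hand-rolled EM loop and keeps the dominant component, whereas the paper's model works on 2D displacement vectors; all function names are our own.

```python
import math

def em_gmm2(xs, iters=100):
    """Fit a 2-component 1D Gaussian mixture with EM.

    Returns (weights, means, variances)."""
    mu = [min(xs), max(xs)]   # spread the initial means apart
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        resp = []
        for x in xs:
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in range(2)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate weights, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(1e-6, sum(r[k] * (x - mu[k]) ** 2
                                   for r, x in zip(resp, xs)) / nk)
    return w, mu, var

# toy data: ~95% inlier displacements around 10, ~5% outliers at 40
xs = [9.0, 10.0, 11.0] * 63 + [40.0] * 10
w, mu, var = em_gmm2(xs)
k = 0 if w[0] > w[1] else 1   # retain the larger component, as in the paper
print(mu[k])                  # close to the inlier mean of 10.0
```

The dominant component's mean then serves as the Gaussian parameter for the corner displacement, while the smaller component absorbs mismatched co-detections.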
      <sec id="sec-2-2">
        <title>4.4. Hallucination</title>
<p>To hallucinate the most likely bounding box of the
(undetected) boat, we examine the bounding box of the
detected person, calculate its centroid and diagonal d,
calculate the scale index and look up the relevant Gaussian
models for the upper-left and bottom-right corners obtained
during the training. The hallucinated bounding box points of
the boat are determined at the displacements at which the
two Gaussian densities have their maximum values, i.e., at the
means μ_L and μ_R. Note that these displacements
are relative to the person's centroid point.</p>
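As a concrete illustration of this lookup, the sketch below (our own simplification, not the authors' code) learns the mean displacement of the boat box corners relative to the person's centroid from co-detections, then hallucinates a boat box for a lone person detection. Since the maximum of a Gaussian density lies at its mean, a mean lookup implements the maximum-density step; the scale binning by the person's diagonal is omitted for brevity, and all names are hypothetical.

```python
def centroid(box):
    # box = (x1, y1, x2, y2)
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def fit_displacement_model(pairs):
    """pairs: list of (person_box, boat_box) co-detections (training data).

    Returns the mean displacement (dx1, dy1, dx2, dy2) of the boat box
    corners relative to the person's centroid."""
    n = len(pairs)
    sums = [0.0] * 4
    for person, boat in pairs:
        cx, cy = centroid(person)
        sums[0] += boat[0] - cx
        sums[1] += boat[1] - cy
        sums[2] += boat[2] - cx
        sums[3] += boat[3] - cy
    return [s / n for s in sums]

def hallucinate_boat(person_box, mu):
    # place the boat corners at the mean displacement from the centroid
    cx, cy = centroid(person_box)
    return (cx + mu[0], cy + mu[1], cx + mu[2], cy + mu[3])

# toy training pairs: the boat box is wider than, and below, the person box
pairs = [((10, 10, 20, 30), (0, 25, 30, 40)),
         ((50, 10, 60, 30), (40, 25, 70, 40))]
mu = fit_displacement_model(pairs)
print(hallucinate_boat((100, 10, 110, 30), mu))  # -> (90.0, 25.0, 120.0, 40.0)
```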
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. USV safety-focused evaluation</title>
      <p>
        To compare the performance of object detectors, a generic
approach of counting false positives and false negatives,
with respect to some minimum intersection over union
(IoU) value, is often used. However, when evaluating
detectors with an actual application in mind, it is
often the case that not all errors are equally important
or relevant. For example, the USV benchmark [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] defines a
so-called danger zone to evaluate the more relevant obstacles
separately. The problem that we are addressing in this
work is increasing the safety of USV navigation, in cases
where actual boats are not detected. The challenge is:
how do we measure the increase in safety?
      </p>
      <p>Note that a crucial safety issue here is that the USV
may navigate in areas that actually contain part of the
boat. Fig. 5 shows a situation with multiple detections
and corresponding hallucinations. The aim of the USV is
to proceed in the forward direction, but it has to avoid
obstacles. Therefore, it can proceed only through navigable
channels, marked with arrows in Fig. 5. To ensure safety,
navigable channels cannot contain any part of the boat
at any distance, and the problem can be compressed to a
one-dimensional representation along the horizontal
axis. However, if the hallucinations are too wide, there
may not be any navigable channel left in front of the
boat.</p>
      <p>Therefore, we define the following two metrics:
• One-dimensional IoU value (referred to as IoU-1D),
calculated from the projections of actual
(ground truth) bounding boxes and hallucinated
bounding boxes, both projected downwards onto
the horizontal axis (the evaluation line in Fig. 5). This
value should be as high as possible.
• One-dimensional coverage (referred to as Cov-1D)
of the horizontal axis (evaluation line) with the
projections of both ground truth bounding boxes
and hallucinated bounding boxes. If the coverage
of hallucinations becomes too high, then the USV
may not have any possibility of advancing, and
regardless of the increase in safety, this solution
is not good. Coverage is obtained by dividing the
number of pixels on the evaluation line covered by
projected bounding boxes by the width of the
evaluation line in pixels.</p>
      <p>This evaluation protocol does not assume or require
complex obstacle avoidance maneuvers, and is not
sensitive to vertical displacement of bounding boxes.</p>
      <p>
6. Experiments
We recorded several hours of video on the Ljubljanica
river (sessions denoted LJU1, LJU2, and LJU3) in
different weather conditions, on Lake Bled (denoted BLE1),
and on the Adriatic Sea (near the coast, in several areas
between Koper and Portorož), denoted ADR1. In each
case, we hired human workers who served as obstacles
in boats, kayaks, canoes and on paddleboards. The data
contains about 10 obstacles in the near vicinity of the
recording boat, captured in different configurations and
from different angles relative to the position of the sun (so
challenging backlit scenes were also captured). Videos
were recorded at 10 frames per second using a Stereolabs
ZED 3D stereo camera (https://www.stereolabs.com),
mounted between 1 and 1.5 meters
above the water surface (different watercraft were used
at different locations). In this experiment we only use the
left RGB images; the right RGB image and depth were
not used in any way.</p>
      <sec id="sec-3-1">
        <title>6.1. Analysis of dataset contents</title>
        <p>The training data was constructed by first obtaining
predictions for all the relevant classes using Yolov7. The
compositions were then constructed from cases where
there was overlap between detections of class person and
either of the classes boat or surfboard.</p>
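The construction step above can be sketched as follows. This is a minimal sketch under the assumption that a simple bounding-box overlap test suffices to pair detections; the function names are our own, not the authors' code:

```python
def overlaps(a, b):
    # boxes as (x1, y1, x2, y2); True if the rectangles intersect
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def split_detections(persons, boats):
    """Persons overlapping a boat/surfboard box become training
    compositions; persons with no overlap are the cases where a boat
    must be hallucinated."""
    pairs, lone = [], []
    for p in persons:
        hits = [b for b in boats if overlaps(p, b)]
        if hits:
            pairs.append((p, hits[0]))   # training composition
        else:
            lone.append(p)               # hallucination candidate
    return pairs, lone

persons = [(10, 10, 20, 30), (100, 10, 110, 30)]
boats = [(0, 25, 30, 40)]
pairs, lone = split_detections(persons, boats)
print(len(pairs), len(lone))  # 1 1
```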
<p>Analysis of the detections provides some insight into
the problem of "invisible" boats and paddle boards, as
shown in Table 1.</p>
<table-wrap id="tab1">
  <label>Table 1</label>
  <table>
    <thead>
      <tr><th>Session (dataset)</th><th>person only (%)</th><th>person+boat (%)</th></tr>
    </thead>
    <tbody>
      <tr><td>LJU1</td><td>0.04</td><td>0.96</td></tr>
      <tr><td>LJU2</td><td>0.05</td><td>0.95</td></tr>
      <tr><td>LJU3</td><td>0.05</td><td>0.95</td></tr>
      <tr><td>BLE1</td><td>0.03</td><td>0.97</td></tr>
      <tr><td>ADR1</td><td>0.05</td><td>0.95</td></tr>
    </tbody>
  </table>
</table-wrap>
      </sec>
      <sec id="sec-3-2">
        <title>6.2. Training</title>
<p>We decided to use session BLE1 for training of the
Gaussian distributions of the boat corner displacements, as
it featured boats of varying shapes and sizes. The training
time using precalculated Yolov7 detections was negligible.</p>
      </sec>
      <sec id="sec-3-3">
        <title>6.3. Testing</title>
<p>Free from requirements for manual annotation, we were
able to run the evaluation of our method on all images
from our dataset. For evaluation, we used only the
detections of people with corresponding boats. Boat
detections, obtained via Yolov7, were considered ground
truth, against which the hallucinations, obtained using
our compositional model, were tested. Person detections
without corresponding boats were not used, as these had
no usable ground truth. Table 2 shows the results.</p>
<p>Analysing the results, we can see that there is good
overlap between ground truth detections and
hallucinations, with IoU-1D ranging from 0.465 to 0.605 for the
same dataset on which the model was trained. Note that,
for two equally wide projections, an IoU-1D of 0.5 means
that about two thirds of each projection overlaps with the
other, while the remaining third at the edge is
non-overlapping.</p>
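Our reading of the two metrics can be made concrete with a short sketch (not the authors' evaluation code; the names are our own). Bounding boxes are projected onto the horizontal evaluation line and compared as intervals:

```python
def project(box):
    # box = (x1, y1, x2, y2) -> horizontal interval (x1, x2)
    return (box[0], box[2])

def iou_1d(box_a, box_b):
    """Intersection over union of two boxes projected onto the x axis."""
    (a1, a2), (b1, b2) = project(box_a), project(box_b)
    inter = max(0.0, min(a2, b2) - max(a1, b1))
    union = (a2 - a1) + (b2 - b1) - inter
    return inter / union if union > 0 else 0.0

def coverage_1d(boxes, line_width):
    """Fraction of the evaluation line covered by projected boxes
    (overlapping intervals are merged before summing)."""
    covered, end = 0.0, float("-inf")
    for x1, x2 in sorted(project(b) for b in boxes):
        x1 = max(x1, end)
        if x2 > x1:
            covered += x2 - x1
            end = x2
    return covered / line_width

# a hallucination shifted by a quarter of its width against the ground truth
gt = (100, 0, 200, 50)
hall = (125, 0, 225, 50)
print(iou_1d(gt, hall))               # 75 / 125 = 0.6
print(coverage_1d([gt, hall], 1000))  # 125 / 1000 = 0.125
```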
<p>Coverage of hallucinations is not as high as coverage
of detections, and, most surprisingly, coverage of pure
person detections (i.e., in the absence of any detected boats)
is not much lower than the coverage of hallucinations.
We examined the reason behind this and found that the
increase is not as high as expected due to obstacles which
are further away and have disproportionately wide
person detection bounding boxes, and due to differences
in the set of boats used for training and testing (note
the highest increase in Cov-1D from person detection
to hallucination when the training set BLE1 was tested).
Figure 6 shows an image where the result of our method
is poor.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>7. Discussion</title>
<p>This paper presents preliminary research on the use of
hallucinations, provided by compositional models, in
water-borne obstacle detection and avoidance. The
experimental design in this work has been subject to many
constraints, most notably the absence of proper ground
truth annotations. These issues will be addressed in
further work, towards a general framework for hallucinating
obstacles that are not directly observed by the sensors.</p>
<p>Since using an obstacle detector precludes us from
detecting unknown objects, combining its results with
either semantic segmentation, another method of
anomaly detection, or a different sensor modality (such as
LIDAR) might help in producing a more general hazard
detection system that performs hazard detection from
multimodal cues.</p>
      <sec id="sec-4-1">
        <title>7.1. Underwater sensors</title>
<p>The state of the art in experimental autonomous road
vehicles relies heavily on a multimodal sensor setup, with
sensors like LIDAR and RADAR [24, 25], which bear no
resemblance to human sensing. Therefore, an argument
could be made that instead of hallucinating the obstacles
and trying to emulate the skipper, one could detect the
hidden obstacles using a proper underwater sensor setup.</p>
<table-wrap id="tab2">
  <label>Table 2</label>
  <table>
    <thead>
      <tr><th>Session (dataset)</th><th>IoU-1D</th><th>Ground truth Cov-1D</th><th>Hallucination Cov-1D</th><th>Person detection Cov-1D</th></tr>
    </thead>
    <tbody>
      <tr><td>LJU1</td><td>0.465</td><td>0.13</td><td>0.074</td><td>0.067</td></tr>
    </tbody>
  </table>
</table-wrap>
        <p>In practice, this results in a fragile setup due to water
turbidity – USVs are expected to navigate safely even in
water that is dirty or muddy.</p>
        <p>Note also that a paddleboard, as shown in Fig. 1, is a
very thin object at the boundary between air and water,
which is not comparable to the situations encountered in
autonomous driving (on the road), so it is unlikely that
additional (underwater) sensors will reliably detect it. In
fact, some watercraft may be completely submerged at
times, as can be seen in Fig. 7, which shows a fast-moving
athlete in a kayak.</p>
      </sec>
      <sec id="sec-4-2">
        <title>7.2. Other examples of invisible hazards</title>
        <p>Missing detections of boats and paddleboards are
immediately available in our waterborne datasets. However,
there are other scenarios where such an approach would
be useful, but for which there is currently insufficient
data to train the models. The main reason for this is that
these scenarios are to some extent hazardous to the USV
and represent edge cases in USV deployment. In Figure 8,
we present a common scenario that we have encountered
several times, but for which we currently do not have
enough data to properly test, let alone train. Plant
debris is common in continental waters and usually safe to
traverse. Often it covers the entire navigable area (e.g.,
leaves in the fall), so avoiding it at all times is not an
option. However, debris may accumulate in shallow water
areas (it may not be debris, but aquatic plants sticking
out of the shallow water). So, if we encounter debris
farther from shore, it is not a cause for concern, as it is most
likely floating. However, if it is found near land features
(e.g., trees, mud), then it usually means that the area is
dangerous, shallow, and not navigable. To detect this
case, we might model the shallow, non-navigable area as
a composition of debris and other land-based features.</p>
        <p>As can be seen in the top right image in Fig. 8, it is
sometimes difficult to determine whether the situation is a
hazard or not. The labeling of such situations cannot
be done by (untrained) labelers, but must be defined by
experienced skippers working in cooperation with
computer vision engineers. These compositions and their
parameters must be defined by hand for a small number
of available cases. The HCDC approach [22] has shown
that this is indeed possible for common, well-known food
items. In this case, it will be used to insert concentrated
expert knowledge into the compositional hazard detection
model.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>8. Acknowledgments</title>
      <p>This work was financed by the Slovenian Research Agency (ARRS), research program [P2-0095], and research project [J2-2506].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] A. Dallolio, H. B. Bjerck, H. A. Urke, J. A. Alfredsen, A persistent sea-going platform for robotic fish telemetry using a wave-propelled USV: technical solution and proof-of-concept, Frontiers in Marine Science 9 (2022). URL: https://www.frontiersin.org/articles/10.3389/fmars.2022.857623. doi:10.3389/fmars.2022.857623.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] G. T. Raber, S. R. Schill, Reef Rover: a low-cost small autonomous unmanned surface vehicle (USV) for mapping and monitoring coral reefs, Drones 3 (2019). URL: https://www.mdpi.com/2504-446X/3/2/38. doi:10.3390/drones3020038.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] C.-Y. Wang, A. Bochkovskiy, H.-Y. M. Liao, YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, 2022. URL: https://arxiv.org/abs/2207.02696. doi:10.48550/ARXIV.2207.02696.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] D. K. Prasad, D. Rajan, L. Rachmawati, E. Rajabally, C. Quek, Video processing from electro-optical sensors for object detection and tracking in a maritime environment: a survey, IEEE Transactions on Intelligent Transportation Systems 18 (2017) 1993-2016.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>B.</given-names> <surname>Bovcon</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Muhovič</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Perš</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Kristan</surname></string-name>,
          <article-title>The MaSTr1325 dataset for training deep USV obstacle detection models</article-title>,
          <source>in: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</source>, IEEE,
          <year>2019</year>, pp.
          <fpage>3431</fpage>-<lpage>3438</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>B.</given-names> <surname>Bovcon</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Muhovič</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Vranac</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Mozetič</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Perš</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Kristan</surname></string-name>,
          <article-title>MODS - a USV-oriented object detection and obstacle segmentation benchmark</article-title>,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          (<year>2021</year>).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>L.</given-names> <surname>Steccanella</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Bloisi</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Castellini</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Farinelli</surname></string-name>,
          <article-title>Waterline and obstacle detection in images from low-cost autonomous boats for environmental monitoring</article-title>,
          <source>Robotics and Autonomous Systems</source>
          <volume>124</volume>
          (<year>2020</year>) 103346.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>A. J.</given-names> <surname>Sinisterra</surname></string-name>,
          <string-name><given-names>M. R.</given-names> <surname>Dhanak</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Von Ellenrieder</surname></string-name>,
          <article-title>Stereovision-based target tracking system for USV operations</article-title>,
          <source>Ocean Engineering</source>
          <volume>133</volume>
          (<year>2017</year>)
          <fpage>197</fpage>-<lpage>214</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>Y.</given-names> <surname>Cheng</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Jiang</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Zhu</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>,
          <article-title>Are we ready for unmanned surface vehicles in inland waterways? The USVInland multisensor dataset and benchmark</article-title>,
          <source>IEEE Robotics and Automation Letters</source>
          <volume>6</volume>
          (<year>2021</year>)
          <fpage>3964</fpage>-<lpage>3970</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><given-names>D.</given-names> <surname>Nunes</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Fortuna</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Damas</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Ventura</surname></string-name>,
          <article-title>Real-time vision based obstacle detection in maritime environments</article-title>,
          <source>in: 2022 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC)</source>, IEEE,
          <year>2022</year>, pp.
          <fpage>243</fpage>-<lpage>248</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>Y.</given-names> <surname>Kuwata</surname></string-name>,
          <string-name><given-names>M. T.</given-names> <surname>Wolf</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Zarzhitsky</surname></string-name>,
          <string-name><given-names>T. L.</given-names> <surname>Huntsberger</surname></string-name>,
          <article-title>Safe maritime autonomous navigation with COLREGS, using velocity obstacles</article-title>,
          <source>IEEE Journal of Oceanic Engineering</source>
          <volume>39</volume>
          (<year>2013</year>)
          <fpage>110</fpage>-<lpage>119</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>J.</given-names> <surname>Han</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Cho</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>N.-s.</given-names> <surname>Son</surname></string-name>,
          <string-name><given-names>S. Y.</given-names> <surname>Kim</surname></string-name>,
          <article-title>Autonomous collision detection and avoidance for ARAGON USV: development and field tests</article-title>,
          <source>Journal of Field Robotics</source>
          <volume>37</volume>
          (<year>2020</year>)
          <fpage>987</fpage>-<lpage>1002</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>W.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Anguelov</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Erhan</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Szegedy</surname></string-name>,
          <string-name><given-names>S. E.</given-names> <surname>Reed</surname></string-name>,
          <string-name><given-names>C.-Y.</given-names> <surname>Fu</surname></string-name>,
          <string-name><given-names>A. C.</given-names> <surname>Berg</surname></string-name>,
          <article-title>SSD: single shot multibox detector</article-title>,
          in: B. Leibe, J. Matas, N. Sebe, M. Welling (Eds.),
          <source>ECCV (1)</source>, volume
          <volume>9905</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2016</year>, pp.
          <fpage>21</fpage>-<lpage>37</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name><given-names>J.</given-names> <surname>Redmon</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Divvala</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Farhadi</surname></string-name>,
          <article-title>You only look once: unified, real-time object detection</article-title>,
          <year>2015</year>. URL: http://arxiv.org/abs/1506.02640. arXiv:1506.02640.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kristan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Kenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kovačič</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Perš</surname>
          </string-name>
          ,
          <article-title>Fast imagebased obstacle detection from unmanned surface vehicles</article-title>
          ,
          <source>IEEE Transactions on Cybernetics</source>
          <volume>46</volume>
          (
          <year>2015</year>
          )
          <fpage>641</fpage>
          -
          <lpage>654</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Ow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. T.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <article-title>A vision-based obstacle detection system for unmanned surface vehicle</article-title>
          ,
          <source>in: Robotics, Automation and Mechatronics (RAM), 2011 IEEE Conference on</source>, IEEE,
          <year>2011</year>
          , pp.
          <fpage>364</fpage>
          -
          <lpage>369</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bovcon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Perš</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kristan</surname>
          </string-name>
          , et al.,
          <article-title>Stereo obstacle detection for unmanned surface vehicles by imu-assisted semantic segmentation</article-title>
          ,
          <source>Robotics and Autonomous Systems</source>
          <volume>104</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name><given-names>J.</given-names> <surname>Yang</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>,
          <string-name><given-names>Q.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Ren</surname></string-name>,
          <article-title>Surface vehicle detection and tracking with deep learning and appearance feature</article-title>,
          <source>in: 2019 5th International Conference on Control, Automation and Robotics (ICCAR)</source>, IEEE,
          <year>2019</year>, pp.
          <fpage>276</fpage>-<lpage>280</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name><given-names>S.</given-names> <surname>Moosbauer</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Konig</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Jakel</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Teutsch</surname></string-name>,
          <article-title>A benchmark for deep learning based object detection in maritime environments</article-title>,
          <source>in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops</source>,
          <year>2019</year>, pp.
          <fpage>0</fpage>-<lpage>0</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name><given-names>H.</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Koo</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Kim</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Park</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Jo</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Myung</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Lee</surname></string-name>,
          <article-title>Vision-based real-time obstacle segmentation algorithm for autonomous surface vehicle</article-title>,
          <source>IEEE Access</source>
          <volume>7</volume>
          (<year>2019</year>)
          <fpage>179420</fpage>-<lpage>179428</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name><given-names>B.</given-names> <surname>Bovcon</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Kristan</surname></string-name>,
          <article-title>A water-obstacle separation and refinement network for unmanned surface vehicles</article-title>,
          <source>in: 2020 IEEE International Conference on Robotics and Automation (ICRA)</source>, IEEE,
          <year>2020</year>, pp.
          <fpage>9470</fpage>-<lpage>9476</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name><given-names>G.</given-names> <surname>Koporec</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Perš</surname></string-name>,
          <article-title>Human-centered deep compositional model for handling occlusions</article-title>,
          <year>2022</year>. 2nd revision in Pattern Recognition.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name><given-names>T.-Y.</given-names> <surname>Lin</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Maire</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Belongie</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Bourdev</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Hays</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Perona</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Ramanan</surname></string-name>,
          <string-name><given-names>C. L.</given-names> <surname>Zitnick</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Dollár</surname></string-name>,
          <article-title>Microsoft COCO: common objects in context</article-title>,
          <year>2014</year>. URL: http://arxiv.org/abs/1405.0312. arXiv:1405.0312.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name><given-names>J.</given-names> <surname>Peršić</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Marković</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Petrović</surname></string-name>,
          <article-title>Extrinsic 6DoF calibration of a radar-lidar-camera system enhanced by radar cross section estimates evaluation</article-title>,
          <source>Robotics and Autonomous Systems</source>
          <volume>114</volume>
          (<year>2019</year>)
          <fpage>217</fpage>-<lpage>230</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name><given-names>C.</given-names> <surname>Schöller</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Schnettler</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Krämmer</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Hinz</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Bakovic</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Güzet</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Knoll</surname></string-name>,
          <article-title>Targetless rotational auto-calibration of radar and camera for intelligent transportation systems</article-title>,
          <source>in: 2019 IEEE Intelligent Transportation Systems Conference (ITSC)</source>, IEEE,
          <year>2019</year>, pp.
          <fpage>3934</fpage>-<lpage>3941</lpage>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>