<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Addressing Generalization Failure in Deep Detection Models for Fishing Trawler Video Analytics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Birk Torpmann-Hagen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pål Halvorsen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Riegler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dag Johansen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SimulaMet</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>UiT The Arctic University of Norway</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Oslo</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>3</lpage>
      <abstract>
        <p>A problem with supervised machine learning algorithms is how to precisely predict outcome values for previously unseen data. In this paper, we evaluate conventional detection models trained on the Njord commercial fishing video surveillance dataset in this generalization context. Our results show that these models fail to generalize to a test-set consisting of samples with challenging lighting and weather conditions. To address this, a novel distributional-shift detector is introduced, exhibiting good performance and outperforming competing methods by a considerable margin.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Out-of-distribution (OOD) generalizability is an often overlooked problem in deep learning [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ].
Conventional methodologies assert that cross-validation or hold-out set evaluation is a suitable
method of approximating a model’s performance in deployment conditions [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], but multiple
case-studies have shown that this is far from true. Deep Neural Networks (DNNs) tend to exhibit
significant performance drops when deployed on data that is OOD from the training data, even
in a manner imperceptible to a human observer [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref5 ref6">1, 2, 3, 5, 6</xref>
        ]. This is known as generalization
failure. It has for instance been shown that medical imaging systems fail to generalize to samples
taken from different centers, demographics, or imaging equipment [
        <xref ref-type="bibr" rid="ref1 ref2 ref7">1, 2, 7</xref>
        ]. Similar behaviour
has been demonstrated in Natural Language Processing models [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and in the context of large,
general-purpose datasets such as ImageNet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        This is further complicated by the lack of transparency intrinsic to DNNs. Their predictive
process is obscure, their confidence scores are often misleading, and generalization failure typically
manifests in ways that are difficult to detect by inspection of the input data alone [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. As a result,
conventionally implemented DNNs lack the robustness and trustworthiness typically required in
particularly sensitive or critical domains, for example as a component of an automatic anonymization
pipeline or of video analytics aboard fishing trawlers, as considered in this competition [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        This quest-for-insight paper explores the generalizability of the deep learning pipeline
in this domain. In particular, it considers the following questions:
      </p>
      <list list-type="order">
        <list-item>
          <p>To what extent does generalization failure play a role for the Njord dataset [<xref ref-type="bibr" rid="ref10">10</xref>]?</p>
        </list-item>
        <list-item>
          <p>Assuming generalization failure occurs, can the distributional shifts that induce it be
detected as they arise, without requiring labels for evaluation?</p>
        </list-item>
      </list>
      <p>
        We observe that generalization failure does indeed occur in this domain, with all tested
models exhibiting significant performance degradation when evaluated on OOD data. To
address this, we introduce Nearest-Neighbour Distributional Shift Detection (NNDSD), a
novel distributional-shift detector capable of estimating when a distributional shift, and thus
generalization failure, is likely to occur during deployment.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Shen et al. survey the space of generalizable methods, including domain-adaptation and novel
learning schemes [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and conclude that there are still a number of unsolved challenges involved
therein. Ye et al. demonstrate that a majority of said methods perform worse than conventional
deep learning outside of specific conditions [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Rabanser, Günnemann, and Lipton perform
an empirical study of methods of detecting dataset shift [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], in particular methods that
perform statistical tests on various encodings of the data, such as VAE, PCA, and
classifier-based encodings. Huang, Geng, and Li investigate the use of gradients for distributional-shift
detection in classification, with state-of-the-art results [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Liang, Li, and Srikant implement
a shift-detector exploiting the discrepancy in softmax scores between In-Distribution (InD) and
OOD data after gradient-based perturbation [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. None of these works consider
distributional-shift detection in the context of a practical dataset, however, nor for the object-detection task.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Nearest Neighbour Distributional Shift Detection</title>
      <p>
        Nearest-Neighbour Distributional Shift Detection (NNDSD) is based on the work by Rabanser,
Günnemann, and Lipton [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], where distributional shifts are identified by performing statistical
tests on various forms of encodings of the data. In our application of NNDSD, we use one of
the best reported performing configurations, consisting of a trained Variational Autoencoder
(VAE) with Kolmogorov-Smirnov testing adjusted with Bonferroni correction.
      </p>
      <p>We observe that this configuration alone is not able to distinguish between InD and OOD images. This
can be understood by visually inspecting the encoding space, for instance through PCA, as
shown in Figure 1. Although the validation set is largely contained
within the bounds of the training distribution, it is evidently sampled from a limited region of
that distribution. Because subsequent frames of the video feed used to obtain data samples are highly
similar, this sampling bias is unavoidable. This form of sampling bias does not typically
induce generalization failure, but it can cause a two-sample test to reject the null hypothesis even for InD data.</p>
      <p>To address this, two steps are added to the procedure. Firstly (i), the encodings are transformed
to a two-dimensional space through PCA. Beyond facilitating visualization and decreasing
computing costs, this also ensures there is sufficient variability along each dimension to perform
viable statistical tests. Secondly (ii), the tests are performed with a stricter null hypothesis.
Whereas regular testing asserts that both populations are identically distributed, NNDSD instead
asserts that the data to test is distributed identically to any region within the training distribution.
In more practical terms, this involves testing the transformed deployment encodings against
their nearest neighbours in the bank of transformed training encodings. This should, in principle,
reduce or eliminate the effect of sampling bias on the tests.</p>
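      <p>The two added steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: it assumes the VAE encodings are already available as NumPy arrays, uses a single Euclidean nearest neighbour per deployment sample, and runs one Kolmogorov-Smirnov test per PCA dimension with Bonferroni correction. All function names (fit_pca_2d, project, nearest_neighbours, nndsd_shift_detected) are hypothetical.</p>
      <preformat>
```python
import numpy as np
from scipy.stats import ks_2samp


def fit_pca_2d(train_enc):
    """Return the mean and the top-2 principal axes of the training encodings."""
    mean = train_enc.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(train_enc - mean, full_matrices=False)
    return mean, vt[:2]


def project(enc, mean, axes):
    """Project encodings onto the two fitted principal axes."""
    return (enc - mean) @ axes.T


def nearest_neighbours(queries, bank):
    """For each query point, return its closest bank point (brute force)."""
    d2 = ((queries[:, None, :] - bank[None, :, :]) ** 2).sum(axis=-1)
    return bank[d2.argmin(axis=1)]


def nndsd_shift_detected(train_enc, deploy_enc, alpha=0.05):
    """Return True if the deployment encodings are flagged as shifted."""
    # Step (i): reduce both sets to two dimensions with PCA fitted on training data.
    mean, axes = fit_pca_2d(train_enc)
    train_2d = project(train_enc, mean, axes)
    deploy_2d = project(deploy_enc, mean, axes)
    # Step (ii): compare against nearest neighbours, not the whole training bank.
    reference = nearest_neighbours(deploy_2d, train_2d)
    p_values = [
        ks_2samp(deploy_2d[:, d], reference[:, d]).pvalue for d in range(2)
    ]
    # Bonferroni correction: compare the smallest p-value to alpha / number of tests.
    return bool(alpha / len(p_values) > min(p_values))
```
      </preformat>
      <p>Under these assumptions, a deployment batch taken from a narrow region of the training bank is matched to nearly identical training points and should not be flagged, whereas a batch lying far outside the bank is compared against the closest edge of the training distribution and should be rejected.</p>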
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <p>Two experiments were performed: a simple evaluation of the generalizability of the YoloV5
pipeline across model sizes, and an evaluation of NNDSD on three different folds of the Njord
dataset. We demonstrate that the models fail to generalize when deployed on data with lighting
and weather conditions not present in the training set, but that this failure can be successfully
detected with NNDSD given sufficient sample sizes.</p>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <p>To determine the generalizability of a given model, it is necessary to evaluate it on OOD samples.
Whereas other domains may have several independently curated datasets to facilitate such
evaluation, the Njord dataset is the only publicly known one of its kind. As a result, it is necessary
to partition the dataset such that the test-set can be considered OOD to the training and validation
set. To this end, the dataset was inspected for samples that had particularly foggy, dark, or
otherwise challenging lighting and weather conditions. These samples were then reserved for
the test-set, referred to as the OOD-set, with the remaining data being split into training and
validation sets. The original test set as provided by the task organizers was also included in the
evaluation. Training samples were extracted from the videos at a rate of one image per 25 frames.</p>
        <p>
          For the generalizability experiment, ten models were trained for each size (extra small, small,
medium, and large) on the InD partition of the dataset, using the default hyperparameters as
provided by the Ultralytics YoloV5 library [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. These hyperparameters can be found on GitHub.
        </p>
        <p>
          To evaluate the efficacy of the distributional-shift detector, we followed the methodology
described in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. The VAE was trained with no augmentations on the training set, and the
resulting shift detector was evaluated at a range of different sample sizes on the OOD-set, the
test-set, and the InD validation set. The detection accuracy was estimated by randomly selecting
samples of the given size from the test-set 1000 times, with the deployment data being considered
OOD if testing yielded p&lt;0.05 after Bonferroni correction, and InD otherwise.
        </p>
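        <p>The resampling procedure described above can be sketched as follows. This is an illustration only: it assumes sampling without replacement within each trial (the text does not specify this), and it treats the shift detector as an arbitrary callable that returns True when it flags a sample as OOD. The name detection_accuracy and its parameters are hypothetical.</p>
        <preformat>
```python
import random


def detection_accuracy(pool, pool_is_ood, detector, sample_size,
                       n_trials=1000, seed=0):
    """Fraction of random subsamples of `pool` that the detector labels correctly.

    pool        -- list of items (for example, encodings) from one data partition
    pool_is_ood -- ground-truth label of that partition (True for OOD)
    detector    -- callable taking a list of items and returning True for OOD
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        # Assumption: each trial draws its sample without replacement.
        sample = rng.sample(pool, sample_size)
        if bool(detector(sample)) == pool_is_ood:
            correct += 1
    return correct / n_trials
```
        </preformat>
        <p>Running this once per partition and per sample size would yield one accuracy value per cell of a results table; the detector passed in would wrap the Bonferroni-corrected test at the 0.05 level.</p>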
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Baseline Generalizability</title>
        <p>[Figure: x-axis "model" with categories Extra Small, Small, Medium, and Large.]</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Distributional Shift Detection</title>
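        <table-wrap>
          <label>Table 1</label>
          <caption>
            <p>Shift detection accuracy per fold for NNDSD</p>
          </caption>
          <table>
            <thead>
              <tr><th>Samples</th><th>InD-val</th><th>OOD</th><th>Test</th><th>Total acc.</th></tr>
            </thead>
            <tbody>
              <tr><td>100</td><td /><td /><td /><td /></tr>
            </tbody>
          </table>
        </table-wrap>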
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Concluding Remarks</title>
      <p>The results from Section 4.2 confirm that generalization failure is indeed a factor in the fishing
trawler surveillance video domain. Each of the tested models exhibited considerable performance
degradation when evaluated on the OOD partition. This reaffirms the findings elsewhere in
the literature that generalization failure is ubiquitous in deep learning, and highlights the need
for research towards increasing the generalizability of DNNs. NNDSD demonstrates significant
performance potential and may have considerable utility in practical deployment.</p>
      <p>This utility is hampered somewhat by the fairly large sample-size requirements. In practical
scenarios, distributional shifts may not last long enough for a sufficient number of samples to
be collected. Further work is required towards reducing the sample size requirement to more
reasonable levels.</p>
      <p>Another limitation of this work is the method by which the shift-detectors were evaluated.
Throughout this paper, the OOD-detection problem was treated as a classification task. In
reality, it is more likely to be a regression-type problem, i.e. there are degrees of severity
for distributional shift, as evidenced by the difference in performance on the OOD set and the
test set. For sufficient proof of performance, NNDSD needs to be evaluated on multiple test-sets
with varying degrees of distributional shift. A more sophisticated system could also leverage
the p-values directly to estimate the severity of the shift by analyzing the correlation between
the p-values and the performance drops of the system prior to deployment.</p>
      <p>Overall, our work confirms the conjecture that special consideration for generalizability
needs to be made when designing deep learning systems, in particular in domains characterized
by high degrees of complexity, dynamicity and potential for distributional shifts, such as
onboard fishing trawlers. Although no existing method endows
DNNs with suitable levels of generalizability for practical deployment in performance-critical
domains, detecting generalization failure through NNDSD or similar methods may be a sufficient
workaround. Such detection could give deep learning systems an increased degree of trustworthiness and
transparency, and thus reduce the risks associated with generalization failure.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Geirhos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Jacobsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Michaelis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zemel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Brendel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bethge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Wichmann</surname>
          </string-name>
          ,
          <article-title>Shortcut learning in deep neural networks</article-title>
          ,
          <source>Nature Machine Intelligence</source>
          <volume>2</volume>
          (
          <year>2020</year>
          )
          <fpage>665</fpage>
          -
          <lpage>673</lpage>
          . URL: http://dx.doi.org/10.1038/s42256-020-00257-z. doi:
          <pub-id pub-id-type="doi">10.1038/s42256-020-00257-z</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>A.</given-names> <surname>D'Amour</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Heller</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Moldovan</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Adlam</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Alipanahi</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Beutel</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Chen</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Deaton</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Eisenstein</surname></string-name>,
          <string-name><given-names>M. D.</given-names> <surname>Hoffman</surname></string-name>,
          <string-name><given-names>F.</given-names> <surname>Hormozdiari</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Houlsby</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Hou</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Jerfel</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Karthikesalingam</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Lucic</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Ma</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>McLean</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Mincu</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Mitani</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Montanari</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Nado</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Natarajan</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Nielson</surname></string-name>,
          <string-name><given-names>T. F.</given-names> <surname>Osborne</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Raman</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Ramasamy</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Sayres</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Schrouff</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Seneviratne</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Sequeira</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Suresh</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Veitch</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Vladymyrov</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Webster</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Yadlowsky</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Yun</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Zhai</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Sculley</surname></string-name>,
          <article-title>Underspecification presents challenges for credibility in modern machine learning</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/2011.03395. doi:
          <pub-id pub-id-type="doi">10.48550/ARXIV.2011.03395</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <article-title>Towards out-of-distribution generalization: A survey</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2108.13624. doi:
          <pub-id pub-id-type="doi">10.48550/ARXIV.2108.13624</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          , Deep Learning, MIT Press,
          <year>2016</year>
          . URL: http://www.deeplearningbook.org.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hendrycks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dietterich</surname>
          </string-name>
          ,
          <article-title>Benchmarking neural network robustness to common corruptions and perturbations</article-title>
          ,
          <year>2019</year>
          . arXiv:1903.12261.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>OoD-Bench: Quantifying and understanding two dimensions of out-of-distribution generalization</article-title>
          ,
          <year>2021</year>
          . URL: http://arxiv-export-lb.library.cornell.edu/abs/2106.03721. arXiv:2106.03721.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ghatwary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Isik-Polat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Polat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Galdran</surname>
          </string-name>
          ,
          <string-name><given-names>M.-A. G.</given-names> <surname>Ballester</surname></string-name>,
          <string-name><given-names>V.</given-names> <surname>Thambawita</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Hicks</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Poudel</surname></string-name>,
          <string-name><given-names>S.-W.</given-names> <surname>Lee</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Jin</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Gan</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Yu</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Yan</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Yeo</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Lee</surname></string-name>,
          <string-name><given-names>N. K.</given-names> <surname>Tomar</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Haithmi</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Ahmed</surname></string-name>,
          <string-name><given-names>M. A.</given-names> <surname>Riegler</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Daul</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Halvorsen</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Rittscher</surname></string-name>,
          <string-name><given-names>O. E.</given-names> <surname>Salem</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Lamarque</surname></string-name>,
          <string-name><given-names>R.</given-names> <surname>Cannizzaro</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Realdon</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>de Lange</surname></string-name>,
          <string-name><given-names>J. E.</given-names> <surname>East</surname></string-name>,
          <article-title>Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2202.12031. doi:
          <pub-id pub-id-type="doi">10.48550/ARXIV.2202.12031</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jacovi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marasović</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <article-title>Formalizing trust in artificial intelligence: Prerequisites, causes and goals of human trust in AI</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/2010.07487. doi:
          <pub-id pub-id-type="doi">10.48550/ARXIV.2010.07487</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.-A. S.</given-names>
            <surname>Nordmo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Ovesen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. D.</given-names>
            <surname>Johansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Johansen</surname>
          </string-name>
          ,
          <article-title>Njordvid: A fishing trawler video analytics task</article-title>
          , in: MediaEval'22,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.-A. S.</given-names>
            <surname>Nordmo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Ovesen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Juliussen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. D.</given-names>
            <surname>Johansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Johansen</surname>
          </string-name>
          ,
          <article-title>Njord: A fishing trawler dataset</article-title>
          ,
          in:
          <source>Proceedings of the 13th ACM Multimedia Systems Conference, MMSys '22</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2022</year>
          , pp.
          <fpage>197</fpage>
          –
          <lpage>202</lpage>
          . URL: https://doi.org/10.1145/3524273.3532886. doi: 10.1145/3524273.3532886.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rabanser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Günnemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. C.</given-names>
            <surname>Lipton</surname>
          </string-name>
          ,
          <article-title>Failing loudly: An empirical study of methods for detecting dataset shift</article-title>
          ,
          <year>2018</year>
          . URL: https://arxiv.org/abs/1810.11953. doi: 10.48550/ARXIV.1810.11953.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>On the importance of gradients for detecting distributional shifts in the wild</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2110.00218. doi: 10.48550/ARXIV.2110.00218.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Srikant</surname>
          </string-name>
          ,
          <article-title>Enhancing the reliability of out-of-distribution image detection in neural networks</article-title>
          ,
          <year>2017</year>
          . URL: https://arxiv.org/abs/1706.02690. doi: 10.48550/ARXIV.1706.02690.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Jocher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stoken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Borovec</surname>
          </string-name>
          , NanoCode012, ChristopherSTAN,
          <string-name>
            <given-names>L.</given-names>
            <surname>Changyu</surname>
          </string-name>
          , Laughing, tkianai,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , lorenzomammana, yxNONG, AlexWang1900,
          <string-name>
            <given-names>L.</given-names>
            <surname>Diaconu</surname>
          </string-name>
          , Marc, wanghaoyang0106, ml5ah, Doug,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ingham</surname>
          </string-name>
          , Frederik, Guilhen, Hatovix,
          <string-name>
            <given-names>J.</given-names>
            <surname>Poznanski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yu</surname>
          </string-name>
          , changyu98,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Akhtar</surname>
          </string-name>
          , PetrDvoracek,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rai</surname>
          </string-name>
          ,
          <article-title>ultralytics/yolov5: v3.1 - Bug Fixes and Performance Improvements</article-title>
          ,
          <year>2020</year>
          . URL: https://doi.org/10.5281/zenodo.4154370. doi: 10.5281/zenodo.4154370.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>