<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Building Context Cases: Adding Contextual Information to Objects in Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lawrence Gates</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indiana University - Luddy School</institution>
          ,
          <addr-line>700 N Woodlawn Ave, Bloomington, IN, 47408</addr-line>
          ,
          <country country="US">U.S.A</country>
        </aff>
      </contrib-group>
      <fpage>246</fpage>
      <lpage>250</lpage>
      <abstract>
        <p>Case-Based Reasoning's (CBR) ability to handle complex data makes it well suited for adding contextual information to images. This work explores the use of hierarchical case structures to build contextual information about a given image. Using image tools that identify objects, the system can store this information both concretely and abstractly in the case base. The CBR system stores the relationships of past contexts to solve future contextual problems. The primary objective of this research is to establish that cases captured from prior detection episodes can provide contextual information that increases the accuracy of object detection in images.</p>
      </abstract>
      <kwd-group>
        <kwd>Case-Based Reasoning</kwd>
        <kwd>Object Detection</kwd>
        <kwd>Image Segmentation</kwd>
        <kwd>Case Hierarchy</kwd>
        <kwd>Knowledge Engineering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Case-Based Reasoning (CBR) (e.g., [1]) is well known for its strength in enabling reasoning in domains for which rule-based analysis is difficult. This is achieved by identifying key problem features and using them to retrieve relevant prior cases to apply to the new situation. Because object detection may be done at various levels of granularity, it is inherently hierarchical, motivating the use of hierarchical CBR, which uses a case structure that divides cases into more manageable subparts organized in an abstraction hierarchy [2]. Limited work has been done on CBR and image processing [3], but integrating CBR with deep learning for object detection has been shown to be beneficial [4]. The primary motivation for capturing the contextual information in an image through hierarchical case structures is to exploit combined information from multiple sources, including object recognition systems based on deep learning. This work will contribute to CBR by investigating its application to machine vision through structural case representation.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Research Plan</title>
      <p>This work investigates the application of CBR to machine vision. The primary objective of
this research is to establish that cases captured from prior detection episodes can be used to provide
contextual information that increases the accuracy of object detection in images. In addition to its contributions
to object detection in machine vision, the research will contribute by (1) developing new methods for
hierarchical structural case representation, (2) developing knowledge-based similarity assessment
criteria for images, and (3) developing new methods for similarity assessment in hierarchical contexts
with image data.</p>
      <p>For this work, a detection episode can involve single or multiple image operations, such as providing
detection labels and pixel coordinates in the image. A variety of image operations can be
performed to determine the contents of an image. At this stage of the research, the current image operations
are object detection and image segmentation, with additional applicable operations to be added in the future.
Object detection and image segmentation provide different data. For example, image segmentation can
calculate the area of a segment rather than the area of a rectangular bounding box. Eventually, metadata
from images (e.g., location, angle) will be incorporated as well to provide context.</p>
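      <p>The difference can be illustrated with a small sketch: a segmentation mask gives the true pixel area of an object, while a detection bounding box only encloses it. The mask and box formats below are illustrative assumptions, not the project's actual data structures.</p>

```python
def box_area(box):
    """Area of an axis-aligned bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def mask_area(mask):
    """Pixel area of a binary segmentation mask (rows of 0/1 values)."""
    return sum(sum(row) for row in mask)

# An L-shaped object: segmentation counts 7 pixels, while its
# 3x3 bounding box covers 9 pixels.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(mask_area(mask))          # 7
print(box_area((0, 0, 3, 3)))   # 9
```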
      <sec id="sec-2-1">
        <title>2.1. Research Objectives</title>
        <p>
          Given the main objective described previously, several sub-objectives exist:
1. Developing new methods for hierarchical structural case representation. Traditionally, a single case structure in CBR holds all the features (cf. [
          <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
          ]). Image data is rich with
features, but this information cannot always be effectively captured in a flat structure or a single
case. The context cases representing detection episodes will require hierarchical representation.
We use a hierarchical representation for context in an image to enable reasoning
about scene components at different levels of granularity.
        </p>
        <p>
          A hierarchical CBR system is built of abstract and concrete cases in a tree-like structure [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The
detection episodes will be stored in the concrete cases. However, the detection episodes will have
to be restructured, as a single type of concrete case cannot effectively contain the features. There
might be a need for multiple concrete cases as sub-cases, depending on the available features. Cases
involving only image segmentation might contain more information (e.g., area, extent) than those from object
detection, since object detection defines a bounding box but not the area of the object within it.
Within the CBR approach, ontologies will be used to determine relationships
between objects.
2. Developing knowledge-based similarity assessment criteria for images. In a naturally
occurring situation, certain objects may or may not commonly appear next to other objects.
Object sizes can be used to disambiguate different objects in an image. This is the knowledge
we want to store and retrieve. The similarity assessment will rely not only on the features,
but also on real-life connections and encoded inferences. Knowledge about the scene will be intensive,
initially starting off with hand-crafted knowledge and possibly eventually moving to mining a
large language model for relations or information about the applicability of those relations.
3. Developing new methods for similarity assessment in hierarchical contexts with image
data. As mentioned previously, hierarchical case representations contain both abstract and
concrete cases. Similarity assessment for hierarchical CBR requires traversing the tree structure and finding
the similarity between the problem case and the stored cases. The levels of abstraction influence
the similarity measurement. With multiple types of concrete cases, the similarity
metrics will have to contain appropriate knowledge to work with each variety of concrete case.
Besides extracting information from the image, external knowledge can be applied in general. Objects
themselves have specific properties that can be encoded. By encoding these into the system, they are
easily applied to every relevant case and used in similarity assessment. One such property would be an
object's function: a vehicle (e.g., truck, car, van) can transport another object. A truck makes transporting easy to
observe (an object is in the truck), but in the case of a car, the detector might see a person
(object) inside the car's bounding box if the point of view gives a side view through the vehicle
window. Trucks, cars, and vans are distinct objects, but at a higher level of abstraction they are all vehicles.
        </p>
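        <p>As an illustration of the hierarchy described above, the following sketch stores concrete detection-episode cases as leaves beneath abstract cases; the class and field names are hypothetical, not the system's actual representation.</p>

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    """A node in the case hierarchy: abstract cases sit above the
    concrete detection-episode cases they generalize."""
    label: str
    features: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def leaves(case):
    """Collect the concrete cases beneath an abstract case."""
    if not case.children:
        return [case]
    out = []
    for child in case.children:
        out.extend(leaves(child))
    return out

# "vehicle" is an abstract case; truck and car are concrete sub-cases.
vehicle = Case("vehicle", children=[
    Case("truck", {"area": 5200.0}),
    Case("car", {"area": 2100.0}),
])
print([c.label for c in leaves(vehicle)])  # ['truck', 'car']
```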
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Approach</title>
        <p>
          Previous research on contextual detection relies more heavily on probability. Barnea and Ben-Shahar [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] use the probability of an object occurring based on the locations of objects that are present or absent. Their goal is to calculate a new confidence for each detection at a given location based on the
probability of a given object appearing next to another (e.g., a keyboard appearing near a monitor). The probabilistic methods of Perko
and Leonardis [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] add contextual information to an image by utilizing a pre-defined
radius around a source object and the distances from the source object to surrounding objects (spatial
object co-occurrence). The authors aimed to have the system represent visual context and
combine it with object detection, utilizing context priors, feature maps, and local-appearance-based
object detection.
        </p>
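        <p>In the same spirit as these co-occurrence approaches, a context-based confidence adjustment can be sketched as a simple linear blend between a detector's confidence and the support from neighboring labels. The weight table and blending rule below are illustrative assumptions, not the cited authors' actual models.</p>

```python
# Hypothetical co-occurrence weights: how strongly one label's presence
# supports another's (values made up for illustration).
COOCCUR = {
    ("keyboard", "monitor"): 0.9,
    ("car", "road"): 0.8,
}

def adjusted_confidence(label, confidence, neighbor_labels, alpha=0.2):
    """Nudge a detector's confidence toward the co-occurrence evidence
    from neighboring detections (a simple linear blend)."""
    support = max(
        (COOCCUR.get((label, n), COOCCUR.get((n, label), 0.0))
         for n in neighbor_labels),
        default=0.0,
    )
    return min(1.0, (1 - alpha) * confidence + alpha * support)

# A low-confidence "keyboard" next to a "monitor" gets a boost
# (0.8 * 0.45 + 0.2 * 0.9, about 0.54).
print(adjusted_confidence("keyboard", 0.45, ["monitor", "cup"]))
```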
        <p>
          The focus of this research is the construction of the hierarchical case structure and its integration
with the knowledge from detection episodes and encoded knowledge. Since the focus is on
the development of CBR methods to assist with combining information, the image processing tools it
uses are “off-the-shelf” pre-trained models obtained from HuggingFace.co (e.g., [
          <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
          ]). By using off-the-shelf models, one can have
a system in place that easily substitutes in state-of-the-art models or models trained on specific domains.
        </p>
        <p>
          In order to build the hierarchical cases, we have to understand what information is available from
imaging tools. Understanding the difference between object detection and image segmentation was
the first step; without this information, fuller cases could not be built. These cases were initially built
from models that perform object detection and image segmentation. The definition being used for
image segmentation is assigning every pixel in the image to a class, leaving no pixel unlabelled [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
The difference between object detection and image segmentation is that a bounding box does
not provide the exact shape of the contained object, whereas the pixel-map contours can be used to
calculate various properties (e.g., area, extent). The models being used were trained by their creators on the
MS COCO dataset [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
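        <p>The contour-derived properties mentioned above (area, extent, aspect ratio) can be computed directly from contour points. The sketch below uses the shoelace formula and assumes a simple polygonal contour of (x, y) points; it is an illustration, not the project's actual pipeline.</p>

```python
def polygon_area(contour):
    """Shoelace formula for the area of a contour given as (x, y) points."""
    n = len(contour)
    s = 0.0
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def shape_properties(contour):
    """Area, extent (area / bounding-box area), and aspect ratio
    (width / height) of a contour, as described for segmentation output."""
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    area = polygon_area(contour)
    return {"area": area, "extent": area / (w * h), "aspect_ratio": w / h}

# A right triangle covers half of its 4x2 bounding box.
props = shape_properties([(0, 0), (4, 0), (4, 2)])
print(props)  # {'area': 4.0, 'extent': 0.5, 'aspect_ratio': 2.0}
```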
        <p>
          In the MS COCO dataset, object detection is used to identify “things” in an image, whereas image
segmentation identifies “stuff”. “Things” are objects that have “a specific size
and shape”, such as cars or faces [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. “Stuff” is material “defined by a homogeneous
or repetitive pattern ... but has no specific or distinctive spatial extent or shade” [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Examples of
“stuff” are grass, buildings, or trees, which are better captured by image segmentation than by object
detection. “Things” and “stuff” are not mutually exclusive, and they can work in tandem:
combined, they provide essential context in the image. For example, vehicles
(“things”) can be found on a road (“stuff”).
        </p>
        <p>
          After collecting the information from the images, the hierarchical case structure needs to be
constructed. It will be developed from the bottom up, starting with the
concrete cases. Construction of the abstractions can be accomplished through an object-oriented design
approach [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. That approach does not transfer directly to the current state of the research, as it addressed planning
for a robot completing a task, whereas the work here focuses on identification. As part of the
abstraction, external knowledge can be encoded. The encoding would capture relationships between
objects, or reasons why objects would not appear near each other in an image. For example, a detection labeled as
a cell phone is implausible when it appears the same size as a car in an image taken from a
drone's view.
        </p>
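        <p>A size-plausibility rule of the kind just described could, for instance, be encoded as follows; the typical real-world footprints and the tolerance are made-up illustrative values, not knowledge from the actual system.</p>

```python
# Hand-crafted knowledge: rough real-world footprints in square metres
# (illustrative values only).
TYPICAL_AREA_M2 = {"car": 8.0, "cell phone": 0.01}

def plausible_pair(label_a, area_a, label_b, area_b, tolerance=10.0):
    """Flag detection pairs whose relative pixel areas contradict the
    relative real-world sizes of their labels, e.g. a 'cell phone'
    as large as a 'car' in the same drone image."""
    expected = TYPICAL_AREA_M2[label_a] / TYPICAL_AREA_M2[label_b]
    observed = area_a / area_b
    ratio = observed / expected
    return 1 / tolerance <= ratio <= tolerance

# A "cell phone" nearly the size of a "car" is implausible;
# one a thousandth of the car's area is fine.
print(plausible_pair("cell phone", 900.0, "car", 1000.0))  # False
print(plausible_pair("cell phone", 1.0, "car", 1000.0))    # True
```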
      <p>With the case hierarchy and external knowledge, the system will have the context needed to leverage
the relationships between objects and knowledge-based vision to improve detection. Based
on relevant cases in the case hierarchy, as well as on external knowledge, the approach adjusts the
certainty values for predicted objects. To evaluate this approach, we will perform an ablation study by
removing some or all of various components: detectors used, detected object relationships, features, and
knowledge. Our plan for this stage is to use a single object detection model and a single image segmentation
model; later iterations of the research will add further models to incorporate into the case
information.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Progress Summary</title>
      <p>
The task domain for the project is object identification in images collected from autonomous drones.
The images are taken from a drone looking down in the dataset VisDrone [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>As stated earlier, in order to build the cases, we needed to know what information we could get
from off-the-shelf models (the models used were from HuggingFace.co). Figure 1 shows two images with bounding boxes from image segmentation
and object detection, respectively. These images show a given source detection and the nine nearest
detections. Image segmentation output was converted from selected pixels to an overall bounding box in order
to maximize space and allow easy comparison with the bounding boxes from object detection.</p>
      <p>Figure 1: (a) Image Segmentation; (b) Object Detection.</p>
      <sec id="sec-3-1">
        <p>The case structure can now be assembled, as the image segmentation data is available. The object
detection model being used provides a location in the image, a label, and a confidence score. Image
segmentation provides a mask representing the location in the image as contours; this is converted
to bounding box(es), a label, and a confidence score. Before the contours are converted to bounding
box(es), the following properties can be calculated: area, extent, and aspect ratio. At the moment, extent
and aspect ratio do not have a specific use, but they are available for future work.</p>
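        <p>One possible shape for a concrete case combining these two kinds of model output is sketched below; the class and field names are hypothetical, not the system's actual representation.</p>

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConcreteCase:
    """One detection within an episode. Object detection fills label,
    box, and confidence; segmentation additionally supplies area,
    extent, and aspect ratio before the mask is reduced to a box."""
    source: str                 # "object_detection" or "segmentation"
    label: str
    box: tuple                  # (x1, y1, x2, y2)
    confidence: float
    area: Optional[float] = None
    extent: Optional[float] = None
    aspect_ratio: Optional[float] = None

det = ConcreteCase("object_detection", "car", (10, 20, 60, 50), 0.91)
seg = ConcreteCase("segmentation", "road", (0, 40, 120, 80), 0.87,
                   area=3600.0, extent=0.75, aspect_ratio=3.0)
print(det.area is None, seg.area)  # True 3600.0
```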
        <p>
          After successfully implementing the retrieval of information from object detection and image
segmentation models, implementation of the hierarchical case structure is the next step. With the information
from these models, concrete cases can be constructed. Once the idea of a concrete case is developed,
abstraction levels can be determined. The abstraction can be transferred into an object-oriented language
in the form of inheritance [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>
          In order to evaluate the effectiveness of the hierarchical case structure, multiple evaluation methods
will be applied, including an ablation study [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. For the ablation study, the components
that can be systematically removed are features from the cases, level(s) of abstraction from the
hierarchy, relational connections, and detection episode information (i.e., using only image segmentation or only
object detection). This should help show the strength of each piece in the overall structure. Additionally,
since the models are “off-the-shelf”, the system can be compared with different models. Such comparisons
would show that the system does not depend on a particular model but uses it as a
tool. Ultimately, that could show that a different domain can be substituted for the drone view,
such as dash-cam footage or animal identification in a forest.
        </p>
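        <p>The ablation study described above enumerates configurations with components removed. A minimal sketch (the component names paraphrase the list above):</p>

```python
from itertools import combinations

COMPONENTS = ["case_features", "abstraction_levels",
              "relational_connections", "episode_information"]

def ablation_configs(components):
    """Every configuration obtained by removing a subset of components,
    always keeping at least one component active."""
    configs = []
    for k in range(len(components)):
        for removed in combinations(components, k):
            configs.append([c for c in components if c not in removed])
    return configs

configs = ablation_configs(COMPONENTS)
print(len(configs))  # 15: every non-empty subset of the 4 components
```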
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>The work completed in Section 3 supports the goal of integrating contextual information to assess
proposed object detections and justify those assessments. The project is currently at the stage of beginning
to implement the hierarchical case structure. Once that has been completed, the next
step will be incorporating external knowledge. The external knowledge will be useful for finding
relationships between the cases and improving the similarity methods.</p>
      <p>
        Following the use of external knowledge, making the hierarchical case representation approach
explainable will be the next logical avenue. The structure of cases is ripe for explanation of detection
decisions. Cases have been found to be a useful form in presenting explanations, as explored in a
human-subjects study [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. With the context of what is happening in an image, the system’s explanation
would highlight why a detection is correct or incorrect. Visual Question Answering can benefit from
this approach, as it can provide relationships between items in an image [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The ability to explain
why a detection was identified as a given label will demonstrate the power of the contextual knowledge
in this approach.
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Acknowledgements</title>
      <p>This work was funded by the US Department of Defense (Contract W52P1J2093009).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>López de Mántaras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McSherry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Leake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Smyth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Craw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Faltings</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Forbus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Keane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aamodt</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Watson</surname>
          </string-name>
          , Retrieval, reuse, revision, and retention in CBR,
          <source>Knowledge Engineering Review</source>
          <volume>20</volume>
          (
          <year>2005</year>
          )
          <fpage>215</fpage>
          -
          <lpage>240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Smyth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Keane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          ,
          <article-title>Hierarchical case-based reasoning integrating case-based and decompositional problem-solving techniques for plant-control software design</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>13</volume>
          (
          <year>2001</year>
          )
          <fpage>793</fpage>
          -
          <lpage>812</lpage>
          . doi:10.1109/69.956101.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Perner</surname>
          </string-name>
          ,
          <article-title>Why case-based reasoning is attractive for image interpretation</article-title>
          , in: D. W. Aha, I. Watson (Eds.),
          <source>Case-Based Reasoning Research and Development</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2001</year>
          , pp.
          <fpage>27</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Turner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. W.</given-names>
            <surname>Floyd</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. M. Gupta</surname>
          </string-name>
          , D. W. Aha,
          <article-title>Novel object discovery using case-based reasoning and convolutional neural networks</article-title>
          , in: M. T.
          <string-name>
            <surname>Cox</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Funk</surname>
          </string-name>
          , S. Begum (Eds.),
          <source>Case-Based Reasoning Research and Development</source>
          , Springer International Publishing, Cham,
          <year>2018</year>
          , pp.
          <fpage>399</fpage>
          -
          <lpage>414</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          , W. Wilke,
          <article-title>On the role of abstraction in case-based reasoning</article-title>
          , in: I.
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          Faltings (Eds.),
          <source>Advances in Case-Based Reasoning</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>1996</year>
          , pp.
          <fpage>28</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Barnea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ben-Shahar</surname>
          </string-name>
          ,
          <article-title>Contextual object detection with a few relevant neighbors</article-title>
          , in: C. V.
          <string-name>
            <surname>Jawahar</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Mori</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          Schindler (Eds.),
          <source>Computer Vision - ACCV 2018</source>
          , Springer International Publishing, Cham,
          <year>2019</year>
          , pp.
          <fpage>480</fpage>
          -
          <lpage>495</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Perko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Leonardis</surname>
          </string-name>
          ,
          <article-title>A framework for visual-context-aware object detection in still images</article-title>
          ,
          <source>Computer Vision and Image Understanding</source>
          <volume>114</volume>
          (
          <year>2010</year>
          )
          <fpage>700</fpage>
          -
          <lpage>711</lpage>
          . doi:10.1016/j.cviu.2010.03.005. Special Issue on Multi-Camera and Multi-Modal Sensor Fusion.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          , G. Synnaeve,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kirillov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zagoruyko</surname>
          </string-name>
          ,
          <article-title>End-to-end object detection with transformers</article-title>
          , CoRR abs/2005.12872 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2005.12872. arXiv:2005.12872.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Cheng</surname>
          </string-name>
          , A.
          <string-name>
            <surname>Schwing</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kirillov</surname>
          </string-name>
          ,
          <article-title>Per-pixel classification is not all you need for semantic segmentation</article-title>
          , in: M.
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Beygelzimer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Dauphin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>J. W.</given-names>
          </string-name>
          <string-name>
            <surname>Vaughan</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>34</volume>
          ,
          Curran Associates, Inc.,
          <year>2021</year>
          , pp.
          <fpage>17864</fpage>
          -
          <lpage>17875</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>T.-Y. Lin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Maire</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Belongie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Hays</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Perona</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Ramanan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Dollár</surname>
            ,
            <given-names>C. L.</given-names>
          </string-name>
          <string-name>
            <surname>Zitnick</surname>
          </string-name>
          ,
          <article-title>Microsoft COCO: Common objects in context</article-title>
          , in: D.
          <string-name>
            <surname>Fleet</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Pajdla</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Schiele</surname>
          </string-name>
          , T. Tuytelaars (Eds.),
          <source>Computer Vision - ECCV 2014</source>
          , Springer International Publishing, Cham,
          <year>2014</year>
          , pp.
          <fpage>740</fpage>
          -
          <lpage>755</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Heitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Koller</surname>
          </string-name>
          ,
          <article-title>Learning spatial context: Using stuf to find things</article-title>
          , in: D.
          <string-name>
            <surname>Forsyth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Torr</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Zisserman (Eds.),
          <source>Computer Vision - ECCV 2008</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2008</year>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Bian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <article-title>Detection and tracking meet drones challenge</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>44</volume>
          (
          <year>2021</year>
          )
          <fpage>7380</fpage>
          -
          <lpage>7399</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Howe</surname>
          </string-name>
          ,
          <article-title>How evaluation guides AI research: The message still counts more than the medium</article-title>
          ,
          <source>AI Magazine</source>
          <volume>9</volume>
          (
          <year>1988</year>
          )
          <fpage>35</fpage>
          . URL: https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/952. doi: 10.1609/aimag.v9i4.952.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Leake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wilkerson</surname>
          </string-name>
          ,
          <article-title>Cases are king: A user study of case presentation to explain CBR decisions</article-title>
          ,
          <source>in: Case-Based Reasoning Research and Development: 31st International Conference Proceedings</source>
          , Springer-Verlag, Berlin, Heidelberg,
          <year>2023</year>
          , pp.
          <fpage>153</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Caro-Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wijekoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Diaz-Agudo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Recio-Garcia</surname>
          </string-name>
          ,
          <article-title>The current and future role of visual question answering in eXplainable artificial intelligence</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>172</fpage>
          -
          <lpage>183</lpage>
          . URL: https://rgu-repository.worktribe.com/output/2048580.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>