<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Rule-based Reasoning for Semantic Image Segmentation and Interpretation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Petr Berka</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thanos Athanasiadis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yannis Avrithis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>P. Berka is with the Dept. of Information and Knowledge Engineering, University of Economics, Prague. Th. Athanasiadis and Y. Avrithis are with Image, Video and Multimedia Systems Laboratory, National Technical University of Athens</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we propose the application of rule-based reasoning to knowledge-assisted image segmentation and object detection. We propose a region merging approach based on fuzzy labeling rather than on visual descriptors, in which reasoning is used to evaluate the dissimilarity between adjacent regions according to rules applied to local information.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Index Terms</title>
      <p>Knowledge-assisted analysis, rule-based reasoning, semantic region merging.</p>
    </sec>
    <sec id="sec-2">
      <title>I. INTRODUCTION</title>
      <p>
        Automatic segmentation of images is a very challenging task
in computer vision and one of the most crucial steps toward
image understanding. Although great effort has been devoted to
designing generic, robust and efficient segmentation algorithms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], human visual perception still outperforms any
state-of-the-art computer algorithm. One main reason for this
is that human vision also relies on high-level prior knowledge
about the semantic meaning of the objects that compose the image.
      </p>
      <p>Knowledge-assisted analysis can be defined as a tightly
coupled and constant interaction between low-level image
analysis and higher-level knowledge. In this paper we propose
an algorithm that performs segmentation and detection of simple
objects simultaneously. Starting from traditional graph-based
segmentation, the proposed technique continues with fuzzy region
labeling and semantic region merging. More specifically, the
latter relies on rule-based reasoning, using knowledge about the
possible labels of the regions that are candidates for merging
and of their direct neighbors, to improve the initial
segmentation and labeling.</p>
      <p>
        To perform rule-based reasoning, we use the expert system
NEST [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], developed at the University of Economics, Prague.
This system follows the compositional approach to inference
introduced in the mid-1970s by the early expert systems
MYCIN and PROSPECTOR. The rules used by NEST have
the form:
      </p>
      <p>IF <italic>condition</italic> THEN <italic>conclusion</italic> (<italic>w</italic>)</p>
      <p>where <italic>condition</italic> is a conjunction of propositions, <italic>conclusion</italic>
is a single proposition and the weight <italic>w</italic> expresses the uncertainty
of the conclusion if the condition is true. During consultation,
all rules for which the condition is true with some positive
degree (weight) are activated and their contributions are used
to compute the weights of goals.</p>
    </sec>
    <sec id="sec-2b">
      <title>II. INITIAL SEGMENTATION AND REGION LABELING</title>
      <p>
        Our intention is to operate on a higher level of information
where image regions are linked to possible labels rather than
only to their visual features. To represent an image
we adopt the Attributed Relational Graph (ARG) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. An ARG
is a graph structure that holds a region-based representation of
the image, where the set of vertices <italic>V</italic> corresponds to image
regions and the set of edges <italic>E</italic> to links between adjacent
regions. The ARG is constructed from an initial color
RSST segmentation that produces a few tens of regions.
Each vertex of the graph holds the Dominant Color, Region
Shape and Homogeneous Texture MPEG-7 visual descriptors
extracted from the corresponding region.
      </p>
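      <p>The ARG described above can be sketched as a plain adjacency structure. The minimal Python sketch below builds vertices and edges from a hypothetical 2-D label map produced by an initial segmentation, using 4-connectivity to decide adjacency. The function name build_arg and the input format are our assumptions, and the MPEG-7 descriptors are replaced by a region-size placeholder attribute.</p>
      <preformat>
```python
from collections import defaultdict


def build_arg(label_map):
    """Build a minimal ARG-like structure from a 2-D label map (sketch).

    label_map: list of rows, each a list of integer region ids
    (hypothetical input format).
    Returns (vertices, edges): vertices maps region id -> attribute dict,
    edges is a set of frozensets {a, b} for 4-connected adjacent regions.
    """
    vertices = defaultdict(lambda: {"size": 0})
    edges = set()
    rows, cols = len(label_map), len(label_map[0])
    for r in range(rows):
        for c in range(cols):
            a = label_map[r][c]
            # Placeholder attribute; MPEG-7 descriptors would be stored here.
            vertices[a]["size"] += 1
            # Look only right and down so each pixel pair is visited once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr >= rows or cc >= cols:
                    continue
                b = label_map[rr][cc]
                if a != b:
                    edges.add(frozenset((a, b)))
    return dict(vertices), edges


label_map = [
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
]
vertices, edges = build_arg(label_map)
```
      </preformat>
      <p>Storing edges as frozensets makes adjacency symmetric by construction, which matches the undirected links between adjacent regions in the ARG.</p>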
      <p>
        Based on these features and the adjacency information
provided by the ARG, the goal is to assign to each region <italic>a</italic> a
fuzzy set of labels <italic>L<sub>a</sub></italic>. To achieve this (for more details
please refer to our previous work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), we compute a matching
distance value between each of these regions and each
of the prototype instances of all concepts in the domain
ontology. This process results in an initial fuzzy labeling of
the regions with concepts from the knowledge base, i.e. for
region <italic>a</italic> we have the fuzzy set
<inline-formula><tex-math>L_a = \sum_{k=1}^{K} c_k / w_k</tex-math></inline-formula>, where <italic>K</italic>
is the cardinality of the (crisp) set of all concepts <inline-formula><tex-math>C = \{c_k\}</tex-math></inline-formula>
in the knowledge base and <inline-formula><tex-math>w_k = \mu_a(c_k)</tex-math></inline-formula> is the degree of
membership of element <italic>c<sub>k</sub></italic> in the fuzzy set <italic>L<sub>a</sub></italic>.
      </p>
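      <p>The matching procedure itself is described in [4]. The sketch below only illustrates the shape of the resulting fuzzy set L_a as a mapping from concepts to membership degrees. Purely for illustration, it assumes that each membership degree is exp(-d), where d is the matching distance to the closest prototype of that concept; this mapping and the helper fuzzy_labels are hypothetical.</p>
      <preformat>
```python
import math


def fuzzy_labels(distances):
    """Map matching distances to a fuzzy label set L_a (sketch).

    distances: dict concept -> list of distances between region a and each
    prototype instance of that concept (hypothetical input).
    Assumption: w_k = exp(-min distance), so a perfect match gives 1.0.
    """
    return {concept: math.exp(-min(ds)) for concept, ds in distances.items()}


L_a = fuzzy_labels({"sky": [0.1, 0.4], "sea": [0.7], "sand": [2.0, 3.0]})
# The concept with the highest membership degree in L_a.
dominant = max(L_a, key=L_a.get)
```
      </preformat>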
    </sec>
    <sec id="sec-3">
      <title>III. RULE-BASED IMAGE ANALYSIS</title>
      <p>This section presents the integration of a rule-based
reasoning system into an image segmentation algorithm. Building
on the foundations described in the previous section, we
introduce a novel segmentation algorithm that relies on fuzzy
region labeling and rules to address the problem of image
oversegmentation.</p>
      <sec id="sec-3-1">
        <title>A. Semantic region merging</title>
        <p>Recursive Shortest Spanning Tree, or simply RSST, is a
bottom-up segmentation algorithm that starts at the pixel
level and iteratively merges neighboring regions according to a
distance value until certain termination criteria are satisfied.
This distance is calculated from color and texture
characteristics, which are independent of the region’s size. In every
step, the two regions with the smallest distance are merged; the visual
characteristics of the new region are extracted and all distances
are updated accordingly.</p>
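        <p>A minimal sketch of the classical RSST merging loop follows. It operates on an already built region adjacency graph rather than at the pixel level, uses the Euclidean distance between mean colors as the region distance (texture is omitted), and stops when the closest adjacent pair is farther apart than a fixed stop distance. These are simplifying assumptions, and the function name rsst is ours.</p>
        <preformat>
```python
import math


def rsst(regions, edges, stop_distance):
    """Greedy RSST-style merging on a region adjacency graph (sketch).

    regions: dict id -> {"color": (r, g, b), "size": int}
    edges: set of frozenset({a, b}) for adjacent regions
    Assumption: distance = Euclidean distance between mean colors only.
    """
    regions = {k: dict(v) for k, v in regions.items()}
    edges = set(edges)

    def dist(a, b):
        return math.dist(regions[a]["color"], regions[b]["color"])

    while edges:
        best = min(edges, key=lambda e: dist(*e))
        if dist(*best) > stop_distance:
            break  # termination criterion reached
        a, b = sorted(best)  # keep the smaller id, absorb b into a
        ra, rb = regions[a], regions.pop(b)
        total = ra["size"] + rb["size"]
        # Re-extract the merged region's feature: size-weighted mean color.
        ra["color"] = tuple(
            (ra["size"] * x + rb["size"] * y) / total
            for x, y in zip(ra["color"], rb["color"])
        )
        ra["size"] = total
        # Redirect b's edges to a; the merged edge collapses and is dropped.
        new_edges = set()
        for e in edges:
            e = frozenset(a if v == b else v for v in e)
            if len(e) == 2:
                new_edges.add(e)
        edges = new_edges
    return regions, edges


regions = {
    0: {"color": (10, 10, 10), "size": 4},
    1: {"color": (12, 10, 10), "size": 4},
    2: {"color": (200, 200, 200), "size": 2},
}
edges = {frozenset((0, 1)), frozenset((1, 2))}
merged, remaining = rsst(regions, edges, stop_distance=20)
```
        </preformat>
        <p>Here regions 0 and 1 are merged because their colors are close, while region 2 stays separate because its distance exceeds the stop distance.</p>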
        <p>
          We introduce here a modified version of RSST, called
Semantic RSST (S-RSST), that aims to reduce the oversegmentation
of the usual results by incorporating region labeling into
the segmentation process [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. In this approach the distance
between two adjacent regions <italic>a</italic> and <italic>b</italic> (vertices <italic>v<sub>a</sub></italic> and <italic>v<sub>b</sub></italic> in
the graph) is calculated using NEST, as described in Section III-B,
and this dissimilarity value is assigned as the weight
of the corresponding graph edge <italic>e<sub>ab</sub></italic>.
        </p>
        <p>Let us now examine one iteration of the
S-RSST algorithm in detail. First, the edge <italic>e<sub>ab</sub></italic> with the smallest weight
is selected, and regions <italic>a</italic> and <italic>b</italic> are merged. Vertex <italic>v<sub>b</sub></italic> is
removed completely from the ARG, whereas <italic>v<sub>a</sub></italic> is updated
appropriately. This update procedure consists of the following
two actions:
<list list-type="order">
<list-item><p>Re-evaluation of the membership degrees of the label fuzzy set as an average weighted by the sizes of the two regions.</p></list-item>
<list-item><p>Re-adjustment of the ARG edges by removing edge <italic>e<sub>ab</sub></italic> and re-evaluating the weights of the affected edges by invoking NEST.</p></list-item>
</list></p>
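        <p>The two update actions of one S-RSST iteration can be sketched as follows. The dissimilarity callback stands in for the NEST consultation; toy_dissimilarity (one minus the largest membership shared by both regions) is a hypothetical placeholder, not the rule base described in Section III-B.</p>
        <preformat>
```python
def srsst_iteration(labels, sizes, weights, dissimilarity):
    """One S-RSST iteration (sketch); mutates labels, sizes and weights.

    labels: dict region -> {concept: membership degree}
    sizes: dict region -> pixel count
    weights: dict frozenset({a, b}) -> edge weight (dissimilarity)
    dissimilarity: callable(labels, a, b) -> float, standing in for NEST
    """
    edge = min(weights, key=weights.get)  # edge e_ab with the least weight
    a, b = sorted(edge)
    sa, sb = sizes[a], sizes.pop(b)  # vertex v_b is removed
    la, lb = labels[a], labels.pop(b)
    # 1) Size-weighted average of membership degrees (missing concept = 0).
    labels[a] = {
        c: (sa * la.get(c, 0.0) + sb * lb.get(c, 0.0)) / (sa + sb)
        for c in set(la) | set(lb)
    }
    sizes[a] = sa + sb
    # 2) Remove e_ab, redirect b's edges to a, re-evaluate affected edges.
    del weights[edge]
    for e in list(weights):
        if a in e or b in e:
            del weights[e]
            new_e = frozenset(a if v == b else v for v in e)
            if len(new_e) == 2:
                weights[new_e] = dissimilarity(labels, *sorted(new_e))
    return a, b


def toy_dissimilarity(labels, a, b):
    # Hypothetical stand-in for NEST: 1 minus the largest shared membership.
    la, lb = labels[a], labels[b]
    shared = [min(la[c], lb[c]) for c in la if c in lb]
    return 1.0 - max(shared, default=0.0)


labels = {0: {"sky": 0.9, "sea": 0.2}, 1: {"sky": 0.8, "sea": 0.3}, 2: {"sand": 0.9}}
sizes = {0: 100, 1: 300, 2: 50}
weights = {frozenset(e): toy_dissimilarity(labels, *e) for e in [(0, 1), (1, 2), (0, 2)]}
merged = srsst_iteration(labels, sizes, weights, toy_dissimilarity)
```
        </preformat>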
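        <p>This procedure continues until the weight of the lightest edge <italic>e</italic><sup>*</sup>
in the ARG exceeds a threshold, i.e. <italic>w</italic>(<italic>e</italic><sup>*</sup>) &gt; <italic>T<sub>w</sub></italic>.
This threshold is calculated at the beginning of the algorithm,
based on the histogram of all edge weights in <italic>E</italic>.</p>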
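        <p>The text does not specify how <italic>T<sub>w</sub></italic> is read off the histogram. As one hypothetical choice, the sketch below takes the upper edge of the first histogram bin at which the cumulative share of edge weights reaches a given fraction; the function name and the parameters bins and fraction are assumptions.</p>
        <preformat>
```python
def threshold_from_histogram(weights, bins=10, fraction=0.5):
    """Hypothetical T_w: upper edge of the first histogram bin at which
    the cumulative share of edge weights reaches `fraction`."""
    lo, hi = min(weights), max(weights)
    width = (hi - lo) / bins or 1.0  # guard against all-equal weights
    counts = [0] * bins
    for w in weights:
        # The maximum weight falls into the last bin.
        counts[min(int((w - lo) / width), bins - 1)] += 1
    cumulative = 0
    for i, n in enumerate(counts):
        cumulative += n
        if cumulative >= fraction * len(weights):
            return lo + (i + 1) * width
    return hi


T_w = threshold_from_histogram([0.0, 0.25, 0.25, 0.5, 0.75, 1.0], bins=4, fraction=0.5)
```
        </preformat>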
      </sec>
      <sec id="sec-3-2">
        <title>B. Estimation of regions dissimilarity</title>
        <p>The expert system NEST serves in knowledge-assisted analysis
as an estimator of the dissimilarity of two adjacent regions
<italic>a</italic> and <italic>b</italic> in the image. The input consists of the fuzzy sets
<italic>L<sub>a</sub></italic> and <italic>L<sub>b</sub></italic> of the two regions, with membership functions μ<sub>a</sub>
and μ<sub>b</sub> respectively, as well as the fuzzy sets of their direct
neighbors. More specifically, we consider as direct neighbors
only the four dominant directional neighbors of regions <italic>a</italic> and
<italic>b</italic> (north, south, west, east). With this input, NEST is able to
reason about the (dis)similarity between two adjacent regions,
according to rules applied to local information about what each
region might represent given its neighborhood. The rule base
is structured into two layers. The first layer determines the
dominant label(s) of the regions according to the initial
labeling of the regions and their neighbors; the confidence of the initial
labeling (expressed in these rules as “high”, “medium” or “low”)
is derived from the initial fuzzy labels. This layer contains 60
rules, one for each combination of a label, its confidence, and
the region or one of its neighbors. The second layer increases (decreases)
the similarity of regions <italic>a</italic> and <italic>b</italic> if they share (do not share)
the same dominant label. This layer contains 48 rules, one
for each combination of possible dominant labels of the
two regions. All rules have been created empirically by a
knowledge engineer. Example rules for the first and second
layer are as follows:</p>
        <p>IF A conf sky(medium) AND A conf sea(medium) AND
A conf sand(low) AND leftofA conf sky(high) AND
rightofA conf sky(high) THEN A dominant(sky) (0.8)</p>
        <p>IF A dominant(sky) AND B dominant(sky,sea)
THEN A B similar (0.7)</p>
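        <p>The first-layer example rule above can be encoded as a set of (region, label, confidence) conditions. This sketch assumes a crisp mapping from membership degrees to the linguistic values high, medium and low, with thresholds 0.7 and 0.4 of our own choosing, and fires a rule only when all of its conditions hold. NEST itself evaluates conditions with graded degrees, so this is a simplification; second-layer rules could be encoded the same way.</p>
        <preformat>
```python
def confidence(mu):
    """Hypothetical crisp mapping of a membership degree to the
    linguistic confidence used in the first rule layer."""
    if mu >= 0.7:
        return "high"
    if mu >= 0.4:
        return "medium"
    return "low"


def facts_for(prefix, labels):
    # e.g. prefix "A", {"sky": 0.5} -> {("A", "sky", "medium")}
    return {(prefix, concept, confidence(mu)) for concept, mu in labels.items()}


# First-layer example rule from the text: (conditions, conclusion, weight).
RULE_1 = (
    frozenset({
        ("A", "sky", "medium"), ("A", "sea", "medium"), ("A", "sand", "low"),
        ("leftofA", "sky", "high"), ("rightofA", "sky", "high"),
    }),
    ("A", "dominant", "sky"),
    0.8,
)


def fire(rule, facts):
    """Return (conclusion, weight) if every condition holds, else None."""
    conditions, conclusion, weight = rule
    if conditions.issubset(facts):
        return conclusion, weight
    return None


facts = (
    facts_for("A", {"sky": 0.5, "sea": 0.45, "sand": 0.1})
    | facts_for("leftofA", {"sky": 0.9})
    | facts_for("rightofA", {"sky": 0.8})
)
result = fire(RULE_1, facts)
```
        </preformat>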
        <p>The inferred weight of the (dis)similarity proposition is used
as the edge weight that drives segmentation based on semantic
information and resolves oversegmentation problems.</p>
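        <p>The contributions of the activated rules are combined into the weight of the similarity goal by NEST's compositional inference. The sketch below assumes weights in [-1, 1], a contribution equal to the rule weight times the degree of its condition, and the PROSPECTOR-style combination x ⊕ y = (x + y)/(1 + xy). Treat this choice of combination function as an assumption rather than the exact NEST configuration used here.</p>
        <preformat>
```python
from functools import reduce


def combine(x, y):
    """Assumed compositional combination of two weights in [-1, 1].
    Undefined when x * y == -1 (fully contradicting certainties)."""
    return (x + y) / (1 + x * y)


def goal_weight(activations):
    """activations: list of (rule_weight, condition_degree) pairs.
    Only rules whose condition holds with a positive degree contribute,
    each with rule_weight * condition_degree."""
    contributions = [w * d for w, d in activations if d > 0]
    return reduce(combine, contributions, 0.0)


# Two rules supporting "A B similar" and one (negative weight) opposing it.
similar = goal_weight([(0.7, 1.0), (0.5, 1.0), (-0.3, 1.0)])
```
        </preformat>
        <p>The combined weight (about 0.803) stays below 1 and is pulled down by the conflicting rule. Mapping it to an edge dissimilarity, e.g. (1 - w)/2, would be a further hypothetical choice.</p>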
      </sec>
    </sec>
    <sec id="sec-4">
      <title>IV. FIRST EXPERIMENTS</title>
      <p>We illustrate the rule-based reasoning step with the
following simple example. Let 0, 2, 3, 7 and 8 be five regions of
an exemplar image of a beach scene, where regions 0, 2, 3 and 8 are
adjacent to region 7. Table I lists the fuzzy set associated with each
region, i.e. the degrees of membership of each concept. The correct
concepts for each region, according to the ground truth, are
highlighted in bold.</p>
      <p>From this input, the rule base infers Sea and Person as the
dominant labels for region 0, Sea and Sky for 2, Person and Sky for 3, Sea
and Sky for 7, and Sea for 8. This results in the following
ordering of region pairs according to their similarity: 7-2 (most
similar), 7-8, 7-0 (less similar), 7-3 (most dissimilar); 47 rules
were activated during this inference. Thus regions 7
and 2 are merged in this iteration of the S-RSST algorithm.</p>
    </sec>
    <sec id="sec-5">
      <title>V. CONCLUSIONS</title>
      <p>The methodology presented in this paper aims at
improving image segmentation by relying on an initial labeling of regions
and not only on visual features. This hybrid algorithm employs
rule-based reasoning in a region merging process, calculating
the semantic distance between two adjacent
regions from local knowledge. First experiments have
shown the feasibility of our approach; nevertheless, the current
rule base must be thoroughly tested. In future work, we
will also incorporate domain knowledge (e.g. sea cannot
be above sky) to improve the whole process.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENT</title>
      <p>This research was supported by the European Commission
under contract FP6-027026 K-SPACE. Thanos Athanasiadis is
funded by the Greek Secretariat of Research and Technology
(PENED Ontomedia 03 ED 475).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Salembier</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Marques</surname>
          </string-name>
          , “
          <article-title>Region-based representations of image and video - segmentation tools for multimedia services</article-title>
          ,”
          <source>IEEE Trans. on Circuits and Systems for Video Technology</source>
          , vol.
          <volume>9</volume>
          , no.
          <issue>8</issue>
          , Dec.
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Berka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Las</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Svatek</surname>
          </string-name>
          , “
          <article-title>NEST: re-engineering the compositional approach to rule-based inference</article-title>
          ,”
          <source>Neural Network World</source>
          , pp.
          <fpage>367</fpage>
          -
          <lpage>379</lpage>
          , May
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Berretti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Del Bimbo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Vicario</surname>
          </string-name>
          , “
          <article-title>Efficient matching and indexing of graph models in content-based retrieval</article-title>
          ,”
          <source>IEEE Trans. on Circuits and Systems for Video Technology</source>
          , vol.
          <volume>11</volume>
          , no.
          <issue>12</issue>
          , pp.
          <fpage>1089</fpage>
          -
          <lpage>1105</lpage>
          , Dec.
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Th.</given-names>
            <surname>Athanasiadis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tzouvaras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Petridis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Precioso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Avrithis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          , “
          <article-title>Using a multimedia ontology infrastructure for semantic annotation of multimedia content</article-title>
          ,” in
          <source>Proceedings of 5th International Workshop on Knowledge Markup and Semantic Annotation (SemAnnot '05)</source>
          , Galway, Ireland,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Th.</given-names>
            <surname>Athanasiadis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Avrithis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Kollias</surname>
          </string-name>
          , “
          <article-title>A semantic region growing approach in image segmentation and annotation</article-title>
          ,” in
          <source>Proceedings of 1st International Workshop on Semantic Web Annotations for Multimedia (SWAMM '06)</source>
          , Edinburgh, Scotland,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>