Rule-based Reasoning for Semantic Image Segmentation and Interpretation Petr Berka, Thanos Athanasiadis and Yannis Avrithis Abstract— In this paper, we propose the application of rule- II. I NITIAL S EGMENTATION AND R EGION L ABELING based reasoning for knowledge assisted image segmentation and object detection. A region merging approach is proposed based Our intention is to operate on a higher level of information on fuzzy labeling and not on visual descriptors, while reasoning where image regions are linked to possible labels rather than is used in evaluation of dissimilarity between adjacent regions only to their visual features. For the representation of an image according to rules applied on local information. we adopt the Attributed Relational Graph (ARG) [3]. An ARG Index Terms— Knowledge-assisted analysis, rule-based reason- is a graph structure that holds a region-based representation of ing, semantic region merging. the image, where the set of vertices V corresponds to image regions and the set of edges E to links between adjacent regions. The ARG is constructed based on an initial color I. I NTRODUCTION RSST segmentation that produces a few tens of regions. Each vertex of the graph holds the Dominant Color, Region A UTOMATIC segmentation of images is a very challeng- ing task in computer vision and one of the most crucial steps toward image understanding. Although a great effort Shape and Homogeneous Texture MPEG-7 visual descriptors extracted for this specific region. has been consumed in designing generic, robust and efficient Based on these features and the adjacency information segmentation algorithms [1], still human vision perception out- provided by the ARG, the target is to assign to each region a a performs any state-of-the-art computer algorithm. One main fuzzy set of labels La . In order to achieve this (for more details reason for this is that human vision is based also in high level please refer to previous work in [4]), we compute a matching prior knowledge about the semantic meaning of the objects distance value between each one of these regions and each that compose the image. one of the prototype instances of all concepts in the domain ontology. This process results to an initial fuzzy labeling of Knowledge assisted analysis can be defined as a tightly the regions with concepts from the knowledge base, i.e. for coupled and constant interaction between low level image region a we have the fuzzy set La = k ck /wk , where k analysis and higher level knowledge. In this paper we propose is the cardinality of the (crisp) set of all concepts C = {ck } an algorithm that involves simultaneously both segmentation in the knowledge base and wk = μa (ck ) is the degree of and detection of simple objects. Starting from traditional membership of element ck in the fuzzy set La . graph-based segmentation, the proposed technique continues with fuzzy region labeling and semantic region merging. More III. RULE -BASED I MAGE A NALYSIS specifically the latter is based on rule-based reasoning, using knowledge about the possible labels of the candidate for This section presents the integration of a rule based rea- merging regions and of their direct neighbors, to improve the soning system into an image segmentation algorithm. Based initial segmentation and labeling. on the foundations described in the previous section, we To perform rule-based reasoning, we use the expert system introduce a novel segmentation algorithm that relies on fuzzy NEST [2], developed at the University of Economics, Prague. region labeling and rules to solve the problem of image This system follows the idea of compositional approach to oversegmentation. inference introduced in mid 70s by the early expert systems MYCIN and PROSPECTOR. The rules used by NEST are in A. Semantic region merging the form: Recursive Shortest Spanning Tree, or simply RSST, is a IF condition THEN conclusion (w) bottom-up segmentation algorithm that begins from the pixel level and iteratively merges neighbor regions according to a where condition is a conjunction of propositions, conclusion distance value until certain termination criteria are satisfied. is a single proposition and weight w expresses the uncertainty This distance is calculated based on color and texture char- of the conclusion if the condition is true. During consultation, acteristics, which are independent of the area’s size. In every all rules for which the condition is true with some positive step the two regions with the least distance are merged; visual degree (weight) are activated and their contributions are used characteristics of the new region are extracted and all distances to compute the weights of goals. are updated accordingly. We introduce here a modified version of RSST, called P. Berka is with the Dept. of Information and Knowledge Engineering, Semantic RSST (S-RSST) that aims to improve the usual University of Economics, Prague. Th. Athanasiadis and Y. Avrithis are with Image, Video and Multimedia oversegmentation results by incorporating region labeling in Systems Laboratory, National Technical University of Athens. the segmentation process [5]. In this approach the distance between two adjacent regions a and b (vertices va and vb in an exemplar image of a beach scene, where 0, 2, 3, 8, are the graph) is calculated using NEST, in a fashion described adjacent to 7. Table I illustrates the associated fuzzy set of each later on, and this dissimilarity value is assigned as the weight region, i.e. the degrees of membership of each concept. Correct of the respective graph’s edge eab . concepts for each region, according to the ground truth, are Let us now examine in detail one iteration of the S- highlighted with bold letters. RSST algorithm. Firstly, the edge eab with the least weight TABLE I is selected, then regions a and b are merged. Vertex vb is I NITIAL FUZZY LABELING removed completely from the ARG, whereas va is updated appropriately. This update procedure consists of the following Concepts two actions: Regions Sky Sea Cliff Plant Sand Person 1) Re-evaluation of the degrees of membership of the labels 0 0.66 0.82 0.47 0.25 0.58 0.75 fuzzy set in a weighted average (w.r.t. the regions’ size) 2 0.74 0.77 0.44 0.34 0.69 0.62 fashion. 3 0.70 0.52 0.34 0.40 0.61 0.73 2) Re-adjustment of the ARG edges by removing edge 7 0.78 0.79 0.38 0.35 0.55 0.39 8 0.66 0.79 0.64 0.27 0.59 0.37 eab and re-evaluating the weight of the affected edges invoking NEST. This procedure continues until the edge e∗ with the least From this input, the rule base infers as dominant labels Sea weight in the ARG is bigger than a threshold: w(e∗ ) > Tw . and Person for 0, Sea and Sky for 2, Person and Sky for 3, Sea This threshold is calculated in the beginning of the algorithm, and Sky for 7, and Sea for 8. This will result in the following based on the histogram of all weights in E. ordering of region pairs according to their similarity: 7-2 (most similar), 7-8, 7-0 (less similar), 7-3 (most dissimilar); 47 rules B. Estimation of regions dissimilarity have been activated during this inference. Thus the regions 7 Expert system NEST serves in knowledge-assisted analysis and 2 will be merged in this iteration of the S-RSST algorithm. as an estimator for the dissimilarity of two adjacent regions a and b in the image. The input consists of the fuzzy sets V. C ONCLUSIONS La and Lb of the two regions, with membership functions μA The methodology presented in this paper aimed in improv- and μB respectively as well as the fuzzy set of their direct ing image segmentation based on initial labeling of regions neighbors. More specifically, we consider as direct neighbors and not only on visual features. This hybrid algorithm involves only the four dominant directional neighbors of regions a and ruled-based reasoning in a region merging process, by means b (north, south, west, east). With this input, NEST is able to of calculating the semantic distance between two adjacent reason about the (dis)similarity between two adjacent regions, regions based on local knowledge. First experiments have according to rules applied on local information of what each shown the feasibility of our approach, nevertheless the current region might represent given its neighborhood. The rule base rule base must be thoroughly tested. In our future work, we is structured into two layers. The first layer determines the will incorporate also some domain knowledge (e.g. sea cannot dominant label(s) of the regions according to the initial label- be above sky) to improve the whole process. ing of regions and their neighbors; the confidence of the initial labeling (expressed in these rules as ”high”, ”medium”, ”low”) ACKNOWLEDGMENT is derived from the initial fuzzy labels. This layer contains 60 rules; a rule for each combination of label, of its confidence, of This research was supported by the European Commission the region and its neighbor. The next layer increases/decreases under contract FP6-027026 K-SPACE. Thanos Athanasiadis is the similarity of regions a and b if they share/don’t share funded by the Greek Secretariat of Research and Technology the same dominant label. This layer contains 48 rules; a (PENED Ontomedia 03 ED 475). rule for each combination of possible dominant labels of the two regions. All rules have been created empirically, by a R EFERENCES knowledge engineer. Example rules for the first and second [1] P. Salembier and F. Marques, “Region-based representations of image layer look like follows: and video - segmentation tools for multimedia services,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 9, no. 8, December 1999. IF A conf sky(medium) AND A conf sea(medium) AND [2] P. Berka, V. Las, and V. Svatek, “Nest: re-engineering the compositional A conf sand(low) AND leftofA conf sky(high) AND approach to rule-based inference,” Neural Network World, pp. 367–379, rightofA conf sky(high) THEN A dominant(sky) (0.8) May 2004. [3] S. Berretti, A. D. Bimbo, and E. Vicario, “Efficient matching and indexing IF A dominant(sky) AND B dominant(sky,sea) of graph models in content-based retrieval,” IEEE Trans. on Circuits and THEN A B similar (0.7) Systems for Video Technology, vol. 11, no. 12, pp. 1089–1105, Dec. 2001. [4] Th. Athanasiadis, V. Tzouvaras, K. Petridis, F. Precioso, Y. Avrithis, Inferred weight of the (dis)similarity proposition is used and Y. Kompatsiaris, “Using a multimedia ontology infrastructure for to drive segmentation based on semantic information and to semantic annotation of multimedia content,” in Proceedgins of 5th In- resolve oversegmentation problems. ternational Workshop on Knowledge Markup and Semantic Annotation (SemAnnot ’05), Galway, Ireland, 2005. IV. F IRST E XPERIMENTS [5] Th. Athanasiadis, Y. Avrithis, and S. Kollias, “A semantic region growing approach in image segmentation and annotation,” in Proceedgins of 1st We will illustrate the rule-based reasoning step on a fol- International Workshop on Semantic Web Annotations for Multimedia lowing simple example. Let 0, 2, 3, 7, 8 be five regions of (SWAMM ’06), Edinburgh, Scotland, 2006.