Using Context and a Genetic Algorithm for Knowledge-Assisted Image Analysis

Stamatia Dasiopoulou, George Th. Papadopoulos, Phivos Mylonas, Yannis Avrithis, and Ioannis Kompatsiaris

S. Dasiopoulou and G.Th. Papadopoulos are with the Information Processing Laboratory, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Greece, and the Informatics and Telematics Institute, Centre for Research and Technology Hellas (email: dasiop@iti.gr; papad@iti.gr). I. Kompatsiaris is with the Informatics and Telematics Institute, Centre for Research and Technology Hellas (email: ikom@iti.gr). P. Mylonas and Y. Avrithis are with the Image, Video and Multimedia Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Greece (email: fmylonas@image.ntua.gr; iavr@image.ntua.gr).

Abstract: In this poster, we present an approach to contextualized semantic image annotation as an optimization problem. Ontologies are used to capture general and contextual knowledge of the domain considered, and a genetic algorithm is applied to realize the final annotation. Experiments with images from the beach vacation domain demonstrate the performance of the proposed approach and illustrate the added value of utilizing contextual information.

Index Terms: knowledge-assisted image analysis, ontologies, context modelling, genetic algorithm.

I. INTRODUCTION

Given the continuously increasing information flow, providing tools and methodologies for (semi-)automatically extracting content descriptions at a conceptual level is a key factor for enabling users to effectively access the information needed. Incorporating knowledge has been acknowledged as probably the only viable solution to overcome the limitations resulting from attempts to imitate the way users assess similarity by focusing on the definition of suitable descriptors that could be automatically extracted and of appropriate metrics. Going through the relevant literature, one can distinguish two main streams in the reported knowledge-driven approaches with respect to the adopted knowledge acquisition and processing strategy: implicit, realized by machine learning methods, and explicit, realized by model-based approaches. The former provide relatively powerful methods for discovering complex and hidden relationships between image data and corresponding conceptual descriptions. Model-based analysis approaches, on the other hand, make use of explicitly defined prior knowledge, attempting to provide a coherent domain model to support symbolic inference; as a result, issues are raised with respect to the entailed complexity and the completeness of knowledge acquisition and construction.

In order to benefit from the advantages of each category and overcome their individual limitations, we propose an approach coupling explicit prior knowledge and a genetic algorithm for domain-specific semantic image analysis. Ontologies, being the leading edge technology for knowledge sharing and reuse and providing well-defined inferences, have been selected for representation. The employed knowledge considers both high- and low-level information in order to provide the means for driving the semantic analysis and annotation. High-level knowledge includes the domain concepts of interest, their relations and contextual knowledge in terms of fuzzy ontological relations. Low-level knowledge, on the other hand, consists of the low-level visual and structural descriptions required for the actual analysis process.

II. OVERALL ARCHITECTURE

The overall architecture of the proposed framework is illustrated in Fig. 1. To represent the required knowledge components, the ontology infrastructure introduced in [2] has been employed and appropriately extended to provide support for contextual knowledge modelling. Analysis starts with segmentation, and subsequently low-level descriptors and spatial relations are extracted for the generated image segments. An extension of the Recursive Shortest Spanning Tree (RSST) algorithm has been used for segmenting the image [1], while descriptor extraction is based on the guidelines given by the MPEG-7 eXperimentation Model (XM). The image preprocessing stage completes with the extraction of spatial relations between adjacent image segments, where fuzzy definitions have been employed [4]. Once the low-level descriptors are available, an initial set of hypotheses is generated for each image segment based on the distance between each segment's extracted descriptors and the domain concepts' prototypical descriptors that are included in the knowledge base. Thereby, a set of plausible annotations with corresponding degrees of confidence is produced for each segment, which is subsequently refined based on the provided fuzzy contextual knowledge. The refined hypotheses sets, along with the segments' spatial relations, are eventually passed to the genetic algorithm, which decides the optimal semantic interpretation based on the provided domain knowledge.

Fig. 1. Overall architecture. [Block diagram; component labels: Segmentation, Low-Level Descriptors, Descriptors Matching, Spatial Relations Module, Hypothesis Sets, Context Analysis, Refined Hypothesis Sets, Genetic Algorithm, Semantic Annotation; Ontology Infrastructure: DOLCE, Multimedia Ontologies, Domain Ontologies, Context Ontology.]

III. CONTEXT MODELLING AND ANALYSIS

A fuzzified ontology, defined on top of the domain one, has been built to represent the modelled contextual knowledge. To provide a sufficiently descriptive context model, we selected a set of diverse relations from the ones included in the MPEG-7 specification (Table I) and extended their definition to support uncertainty. More specifically, the defined fuzzified contextual domain knowledge consists of the set

$$O_F = \{C, \{r_{c_i,c_j}\}\}, \quad i, j = 1..n, \; i \neq j, \qquad (1)$$

where $O_F$ forms a domain-specific "fuzzified" ontology, $C$ is the set of all possible concepts it describes, $r_{c_i,c_j} = F(R_{c_i,c_j}) : C \times C \to [0, 1]$, $i, j \in N$, denotes a fuzzy ontological relation between two concepts $c_i, c_j$, and $R_{c_i,c_j} : C \times C \to \{0, 1\}$ is a crisp semantic relation between the two concepts. For the representation the RDF language has been selected, and reification was used in order to achieve the desired expressiveness. The refinement of the initial hypotheses' degrees is performed based on the readjustment algorithm presented in [3], where the notion of overall context relevance of a concept to the root element is employed to tackle cases where a concept is related to multiple ones.

TABLE I
FUZZY SEMANTIC RELATIONS

Abbreviation    Name
P               PartOf
Sp              SpecializationOf
Pr              PropertyOf
Ct              ContextOf
Loc             Location
Ins             InstrumentOf
Pat             PatientOf

IV. GENETIC ALGORITHM

Under the proposed approach, each chromosome represents a possible solution. To determine the degree to which each interpretation is plausible, the employed fitness function has been defined as follows:

$$f(C) = \lambda \times FS_{norm} + (1 - \lambda) \times SC_{norm}, \qquad (2)$$

where $C$ denotes a particular chromosome, $FS_{norm}$ refers to the degree of low-level descriptor similarity, and $SC_{norm}$ stands for the degree of consistency with respect to the provided spatial domain knowledge. The variable $\lambda$ is introduced to adjust the degree to which visual similarity and spatial consistency should affect the final outcome. $FS_{norm}$ and $SC_{norm}$ are computed as follows:

$$FS_{norm} = \frac{\sum_{i=0}^{N-1} I_M(g_i) - I_{min}}{I_{max} - I_{min}}, \qquad (3)$$

where $I_{min}$ is the sum of the minimum degrees of confidence assigned in each region's hypotheses set and $I_{max}$ is the sum of the corresponding maximum degrees of confidence, and

$$SC_{norm} = \frac{SC + 1}{2} \quad \text{and} \quad SC = \frac{\sum_{r=1}^{M} I_{s_r}(g_i, g_j)}{M}, \qquad (4)$$

where $M$ denotes the number of relations in the constraints that had to be examined.

V. EXPERIMENTAL RESULTS AND CONCLUSIONS

The proposed semantic image analysis framework was tested on the beach vacations domain on the following concepts: Sea, Sky, Sand, Plant, Cliff and Person. The employed descriptors are Scalable Color, Homogeneous Texture, Edge Histogram and Region Shape, while the implemented fuzzy spatial relations include the eight directional relations, e.g., left, above, above-left. Indicative results are given in Fig. 2, illustrating the added value of utilizing contextual information and the performance of the genetic algorithm.

Fig. 2. Exemplar results for the beach domain, depicting the original image, and the GA application without and with context utilization.

ACKNOWLEDGMENT

The work presented in this paper was partially supported by the European Commission under contracts FP6-001765 aceMedia and FP6-027026 K-Space.

REFERENCES

[1] T. Adamek, N. O'Connor, N. Murphy, Region-based Segmentation of Images Using Syntactic Visual Features. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Montreux, Switzerland, 2005.
[2] S. Bloehdorn, K. Petridis, C. Saathoff, N. Simou, V. Tzouvaras, Y. Avrithis, I. Kompatsiaris, S. Staab, M. G. Strintzis, Semantic Annotation of Images and Videos for Multimedia Analysis. 2nd European Semantic Web Conference (ESWC), Heraklion, Greece, May 2005.
[3] Ph. Mylonas, Th. Athanasiadis and Y. Avrithis, Improving image analysis using a contextual approach. International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Seoul, April 2006.
[4] P. Panagi, S. Dasiopoulou, G.Th. Papadopoulos, I. Kompatsiaris and M.G. Strintzis, A Genetic Algorithm Approach to Ontology-Driven Semantic Image Analysis. IEE International Conference on Visual Information Engineering (VIE), Bangalore, India, Sept 2006.
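
As a rough illustration of the fitness function of Section IV (Eqs. (2)-(4)), the following Python sketch evaluates a single chromosome. It is not the authors' implementation. Several points are assumptions made only for this example: N is read as the number of regions and I_M(g_i) as the degree of confidence of the concept that the chromosome assigns to region g_i; the spatial consistency degrees I_{s_r} are taken as precomputed values in [-1, 1] (which is what the (SC + 1)/2 normalization suggests); and the function names, the dictionary-based hypotheses sets and the handling of degenerate cases are hypothetical.

```python
from typing import Dict, Sequence


def fs_norm(hypotheses: Sequence[Dict[str, float]], chromosome: Sequence[str]) -> float:
    """Eq. (3): normalized visual similarity of the labels chosen by a chromosome.

    hypotheses[i] maps each candidate concept of region i to its degree of
    confidence; chromosome[i] is the concept selected for region i (assumed encoding).
    """
    if len(hypotheses) != len(chromosome):
        raise ValueError("one label per region is required")
    chosen = sum(h[label] for h, label in zip(hypotheses, chromosome))
    i_min = sum(min(h.values()) for h in hypotheses)  # sum of per-region minima
    i_max = sum(max(h.values()) for h in hypotheses)  # sum of per-region maxima
    if i_max == i_min:
        return 1.0  # assumption: all labellings are equally similar
    return (chosen - i_min) / (i_max - i_min)


def sc_norm(relation_degrees: Sequence[float]) -> float:
    """Eq. (4): map the mean consistency degree SC from [-1, 1] to [0, 1]."""
    if not relation_degrees:
        return 0.5  # assumption: no constraints examined means SC = 0
    sc = sum(relation_degrees) / len(relation_degrees)
    return (sc + 1.0) / 2.0


def fitness(hypotheses: Sequence[Dict[str, float]],
            chromosome: Sequence[str],
            relation_degrees: Sequence[float],
            lam: float = 0.5) -> float:
    """Eq. (2): weighted sum of visual similarity and spatial consistency."""
    return lam * fs_norm(hypotheses, chromosome) + (1.0 - lam) * sc_norm(relation_degrees)


if __name__ == "__main__":
    # Two regions with hypothetical confidence degrees.
    hyp = [{"Sea": 0.8, "Sky": 0.6}, {"Sky": 0.9, "Sand": 0.3}]
    # Hypothetical consistency degrees of the examined spatial constraints.
    degrees = [1.0, 0.0]
    print(fitness(hyp, ["Sea", "Sky"], degrees, lam=0.5))  # prints 0.875
```

With lam close to 1 the score is dominated by visual similarity, and with lam close to 0 by spatial consistency, which mirrors the role of λ in Eq. (2).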