LABELLING IMAGE REGIONS USING SPATIAL PROTOTYPES Carsten Saathoff, Marcin Grzegorzek, and Steffen Staab University of Koblenz {saathoff,marcin,staab}@uni-koblenz.de ABSTRACT 2.1. Constraint acquisition In this paper we present an approach for introducing spatial Spatial constraint templates constitute the background knowl- context into image region labelling. We combine low-level edge in our approach. We acquire these templates from so- classification with spatial reasoning based on explicitly repre- called spatial prototypes, which are manually labelled images. sented spatial arrangements of labels. We formalise the prob- We mine the prototypes using support and confidence as se- lem using Linear Programming, and provide an evaluation on lection criteria, and come up with a set of templates represent- a set of 923 images. ing typical spatial arrangements. For each label l we have to determine in what spatial re- lation to other labels it might be found. Therefore, for each 1. INTRODUCTION spatial relation type t, we consider the relation set Rt↓l , which contains the relations of type t from images depicting l. We Exploiting solely low-level features for automatic image re- l,l0 then define Rt↓l to be the set of relations between segments gion labelling often leads to unsatisfactory results, and recent ∗,l 0 studies [1] show the importance of contextual and spatial in- s, s0 depicting l and l0 , respectively, and finally Rt↓l to denote formation. In this paper, we propose an approach based on all relations between an arbitrary region and a region depict- [2] that integrates a wavelet-based low-level classification [3] ing l0 . The confidence of a label arrangement is then defined l,l0 0 |Rt↓l | |Rl,l | and spatial reasoning based on Linear Programming. as γt (l, l0 ) = ∗,l0 , and the support as σt (l, l0 ) = |Rtt↓l | . |Rt↓l | During the training phase we train the classifiers and ac- Finally, we define the template Tt for the spatial relation quire our background knowledge. In the classification phase, type t as Tt (l, l0 ) = 1 iff σt (l, l0 ) > thσ and γt (l, l0 ) > thγ , each image is first segmented by an automatic segmentation and Tt (l, l0 ) = 0 otherwise. For absolute spatial relations we algorithm. The low-level classification produces for each re- define support, confidence, and the template accordingly. gion si and each supported label lj a probability θi (lj ). Then relative (e.g. above-of, left-of ) and absolute (e.g. above-all) spatial relations are extracted, and are processed by the spa- 2.2. Spatial reasoning with linear programming tial reasoning together with the probabilities. The output is a We will show in the following how to formalize image la- final labelling that is optimal with respect to both our spatial belling with spatial constraints as a linear program. We con- background knowledge and the probabilities. sider Binary Integer Programs, which have the form We provide results of a number of experiments showing that our approach provides comparable performance with low maximize Z = cT x numbers of training examples. Due to length constraints, we subject to Ax =b (1) will not detail the low-level processing at all. x ∈ {0, 1} Goal of the solving process is to find a set of assignments to 2. SPATIAL REASONING BASED ON the integer variables in x with a maximum evaluation score Z CONSTRAINTS that satisfy all the constraints. In order to represent the image labelling problem as a lin- The goal of the spatial reasoning step is to exploit background ear program, we create a set of linear constraints from each knowledge about the typical spatial arrangements of objects spatial relation in the image, and determine the objective co- in images in order to improve the labelling accuracy com- efficients based on the hypotheses sets and the constraint tem- pared to pure local, low-level feature-based approaches. We plates. Let Oi ⊆ R be the set of outgoing relations for re- will first discuss the acquisition of constraint templates from gion si ∈ S, i.e. Oi = {r ∈ R|∃s ∈ S, s 6= si : r = a set of spatial prototypes, and then describe the formalisation (si , s)}, and Ei ⊆ R the set of incoming spatial relations, of the problem as a Linear Program. i.e. Ei = {r ∈ R|∃s ∈ S, s 6= si : r = (s, si )}. Then, for each possible pair of label assignments to the regions, we 3. EXPERIMENTS AND RESULTS create a variable cko itj , representing the possible assignment of lk to si and lo to sj with respect to the relation r with type We evaluated the approach on a set of 923 images depicting t ∈ T . Each cko ko itj is an integer variable and citj = 1 repre- outdoor scenes. We used the labels building, foliage, moun- sents the assignments si = lk and sj = lo , while cko tain, person, road, sailing-boat, sand, sea, sky, snow. In our itj = 0 means that these assignments are not made. Since every such evaluation we used the spatial relations above-of, below-of, variable represents exactly one assignment of labels to the left-of and right-of, the absolute spatial relations above-all involved regions, and only one label might be assigned to and below-all, and we used the thresholds σ = 0.001 and a region in the final solution, we have to add this restric- γ = 0.2 for both relative and absolute spatial relations. We tion as linear constraints. The constraints are compared the performance of the low-level classification with P P formalised ko as the spatial reasoning on different training set sizes and mea- ∀r ∈ R : r = (si , sj ) ∈ R → lk ∈L c lo ∈L itj = 1. These constraints assure that there is only one pair of labels sured precision (p), recall (r) and the classification rate (c). assigned to a pair of regions per spatial relation, but it still Further we computed the F-Measure (f). In Table 1 the aver- k0 o0 age for each of these measures is given. there could be two variables ckoitj and cit0 j 0 both being set to 1, 0 which would result in both k and k assigned to si . Low-Level BIP set size p r f c p r f c Since our solution requires that there is only one label 50 .63 .65 .57 .60 .77 .75 .73 .75 assigned to a region, we have to add constraints that “link” 100 .70 .67 .65 .69 .78 .77 .75 .80 the variables accordingly. This can be accomplished by link- 150 .67 .63 .61 .66 .74 .71 .70 .75 ing pairs of relations, and start by defining the constraints for 200 .69 .65 .63 .67 .80 .75 .76 .80 the outgoing relations. We arbitrarily take one base relation 250 .69 .64 .60 .66 .78 .73 .72 .77 rO ∈ Oi and then create constraints for all r ∈ Oi \ rO . 300 .68 .63 .61 .66 .82 .77 .78 .82 350 .63 .68 .61 .66 .80 .75 .76 .80 Let rO = (si , sj ) with type tO , and r = (si , sj 0 ) with type 400 .68 .63 .61 .66 .80 .75 .75 .79 t be the twoP relations to be linked. Then, the constraints are ko ko0 P ∀lk ∈ L : lo ∈L citO j − lo ∈L itj 0 = 0. The first sum 0 c Table 1. Overall results for the two approaches. can either take the value 0 if lk is not assigned to si by the relation r, or one if it is assigned, and basically the same ap- The best overall classification rate is achieved with the bi- plies for the second sum. Since both are subtracted and the nary integer programming approach on the data set with 300 whole expression has to evaluate to 0, either both equal 1 or training images. However, with only 100 training examples both equal 0 and subsequently, if one of the relations assigns we achieve nearly the same performance, indicating that 100 lk to si , the others have to do the same. The constraints for training examples are a good size for training a well perform- the incoming relations are defined accordingly, where rE is ing classifier using our approach. the base relation. Finally we have to link the outgoing to the incoming re- 4. CONCLUSIONS lations. Since the same label assignment is already enforced In this paper we have introduced a novel spatial reasoning within those two types of relations, we only have to link rO approach based on an explicit model of spatial context. Our P rE , ko and using the P following set of constraints: ∀lk ∈ L : o0 k results show a good classification rate compared to results in c lo ∈L itO j − c lo0 ∈L j 0 tE i = 0 Absolute relations are for- the literature, while requiring only a low number of training malized and linked accordingly. data. Eventually, let tr and ta refer to the type of the relative relation r and the absolute relation a, respectively, then the 5. REFERENCES objective function is defined as [1] M. Grzegorzek and E. Izquierdo, “Statistical 3d object classification and localization with context modeling,” in X X X min(θi (lk ), θj (lo )) ∗ Ttr (lk , lo ) ∗ cko 15th European Signal Processing Conference, 2007. itr j + r=(si ,sj ) lk ∈L lo ∈L [2] Carsten Saathoff and Steffen Staab, “Exploiting spatial X X θi (lk ) ∗ Tta (lk ) ∗ ckita . (2) context in image region labelling using fuzzy constraint a=si lk ∈L reasoning,” in Proc. of WIAMIS, 2008. [3] M. Grzegorzek and H. Niemann, “Statistical object recognition including color modeling,” in 2nd Interna- This function rewards label assignments that satisfy the back- tional Conference on Image Analysis and Recognition, ground knowledge and that involve labels with a high confi- 2005. dence score provided during the classification step.