Using f-SHIN to represent objects: an aid to visual grasping

Nicola Vitucci, Mario Arrigoni Neri, and Giuseppina Gini

Politecnico di Milano - Dipartimento di Elettronica e Informazione
Via Ponzio 34/5, 20133 Milano, Italy
{vitucci,arrigoni,gini}@elet.polimi.it

Abstract. Description Logics (DLs) are nowadays used to tackle a variety of problems. When dealing with numerical data coming from the real world, however, traditional logics lose useful information that more expressive logics can exploit. Fuzzy extensions of traditional DLs, being able to represent vague concepts, are well suited to reasoning on such data. In this paper we present an architecture for automatically building and querying a fuzzy ontology that represents objects in terms of their component parts. Our approach mainly aims to address the problem of visual grasping, which is of wide interest in robotics.

1 Introduction

The decomposition of an object into parts has been recognized as an important problem in artificial intelligence: it is considered both a human-like way of reasoning about objects [1] and a good way to reduce complexity in tasks like object recognition [14]. Apart from the actual image decomposition phase, a major issue is the semantic description of the extracted features and their mutual relationships. Because real-world data are vague, some tolerance should be taken into account when formally representing the structure of an object; this is a reason to take advantage of novel tools such as fuzzy DLs [11]. Fuzzy DLs extend crisp DLs by adding imprecision and vagueness to the reasoning process, thus giving degrees of truth in place of binary yes/no answers. Although the available fuzzy reasoners are not yet as powerful as their crisp counterparts, some interesting applications can be found.
One of them lies in robotics, where a symbolic representation of objects can improve the grasping capabilities of a robot through semantic information about both the type of grasp and the structure of the object to be grasped. To the best of our knowledge, semantic part decomposition is still an open problem and there are no tools available to automatically create a fuzzy ontology from raw concepts. The use of ontologies for object recognition has been investigated in works such as [4,5,6], but none of them makes explicit use of fuzzy reasoning, except for the creation of (crisp) descriptors such as Very high to be used in the classical way; furthermore, they rely on a prior phase of semantic annotation by domain experts, while we focus on the automatic generation of simple concepts, which are sufficient for our purposes.

There are some recent works in which fuzzy DLs are thoroughly used to reason on multimedia information (see [7,8,10,12]), but little advantage is taken of the expressiveness given by cardinality restrictions (when available). Generally speaking, this is because, for scene understanding purposes, it is sufficient to know whether a kind of object is present or not (see [15,16]). For object recognition purposes, on the other hand, it is often necessary to count the instances of each kind of recognized component.

In this paper, we show why we use f-SHIN [9] as the underlying DL for addressing this problem, then we describe an architecture for the automatic building of a (crisp) ontology and its use for object recognition via fuzzy ABox reasoning services; finally, in the last section, we make some considerations and propose some future work. The architecture we propose here is still far from complete, yet we were able to obtain some interesting results.

2 The f-SHIN logic

The f-SHIN logic is the fuzzy extension of the SHIN logic [9].
The main improvement of this extension with respect to its crisp version is the possibility to use assertions like Concept(p)[≥ 0.7], meaning that the individual p belongs to the concept Concept with a degree of at least 0.7, or role(p,q)[≤ 0.3], meaning that the individuals p and q participate in the role role with a degree of at most 0.3. The greatest lower bound (GLB) [11] is used to know “how much” an individual can be considered to belong to a certain class. A complete description of the f-SHIN logic can be found in [9].

A reasoner called FiRE¹ exists for the f-SHIN logic, while other reasoners such as fuzzyDL² are based on the fuzzy extension of the SHIF logic. We chose FiRE as the reasoner, independently of the supported reasoning services, because of the high expressivity of the underlying f-SHIN logic, which supports cardinality restrictions; on the other hand, such a choice requires some functional blocks to be added to carry out operations like the definition of concepts in terms of membership functions.

¹ http://www.image.ece.ntua.gr/~nsimou/FiRE/
² http://gaia.isti.cnr.it/~straccia/software/fuzzyDL/fuzzyDL.html

3 Architecture

As anticipated in the previous section, due to the limitations of the reasoner, the whole architecture is complex and requires some functional elements to be split among different modules (e.g. the reasoner used on the definitions ontology is different from the one used on the objects ontology). The whole architecture of the system is depicted in Fig. 1.

[Fig. 1: The general architecture of the system. Modules shown: external ontology, image analysis, objects ontology building, object recognition, fuzzy reasoner.]

The “high level” information, which reflects the kind of knowledge to be extracted from the image, is encoded in the external ontology; the image analysis and the numerical calculations are performed with MATLAB™, while the intermediate steps are performed either in MATLAB™ or in Java™. The FiRE reasoner is standalone, thus some steps still have to be carried out by hand. As an example, we will model a fork in terms of its parts, using the images shown in Fig. 2.

3.1 External ontology

The external ontology, also called the “definitions ontology”, specifies the kinds of membership functions to be used, the kinds of features to be extracted from the objects found in the images (e.g. elongation, eccentricity, parallelism with respect to other objects and so on) and the meanings of concepts like LongObj and SmallObj in terms of membership functions. Taking the ontology described in [2] as an example, we built a meta-ontology (based on the crisp logic SHOIN(D) with datatypes) in which the features to be extracted from the image are subclasses of the meta-class GeometricConcept and the kinds of membership functions to use are subclasses of the meta-class MembFunc. The ontology presented in [2] makes use of some “concrete” concepts like TrapezoidalConcreteFuzzyConcept and TriangularConcreteFuzzyConcept, each one having several properties defined as hasParameterX (where X stands for A, B, K1 etc.)
depending on the parameters needed by the considered membership function; an individual tra1 represents a trapezoidal membership function with given parameters. In our ontology, a concept like “a long object” is modeled as an individual longObject of meta-class Length which has, as its membership function, another individual longMF of a subclass of MembFunc, with the function parameters given as datatype properties.

By means of the Jena Ontology API³ and the Pellet reasoner⁴, information such as the kind and the parameters of a membership function representing a concept related to the image is extracted to feed the image analysis module; thus, a SPARQL query like:

    SELECT * WHERE {
      ?x rdfs:subClassOf :GeometryConcept .
      ?y rdf:type ?x .
      ?y :hasMembershipFunction ?z .
      ?z rdf:type ?w .
      ?w rdfs:subClassOf :MembershipFunction .
      FILTER (?w != :MembershipFunction) .
      ?z :hasParameter1 ?k1 .
      ?z :hasParameter2 ?k2 .
      OPTIONAL {?z :hasParameter3 ?k3} .
      OPTIONAL {?z :hasParameter4 ?k4}
    }

is used to extract the individuals representing the actual fuzzy geometry concepts (e.g. LongObject) used in the objects ontology and their related membership function data (e.g. a sigmoidal function with two parameters). The ontology is built by a domain expert to reflect the physical characteristics of the robot, so that, for example, an object can be considered “long” with respect to the maximum aperture of the robot hand. Although a system of measurement has to be established, for now we use only pixel measures.

³ http://jena.sourceforge.net/
⁴ http://clarkparsia.com/pellet/

[Fig. 2: Steps of the image analysis phase. (a) Original image; (b) image after segmentation; (c) image after edge dilation and part decomposition (with three parts out of six highlighted).]

3.2 Image analysis

In this phase, the original image is converted into a binary image after thresholding and edge detection performed with the Canny method [17] (Fig.
2b); the resulting edges are dilated, then the parts having an area over a threshold are selected (Fig. 2c). This segmentation and decomposition phase is not robust, so the usefulness of fuzzy relationships can be shown more clearly.

After the first phase, some features, such as the area and the length of the major axis of the ellipse having the same normalized second central moments as the selected region, are extracted from each part found (see Tab. 1 for some examples of extracted values); then, some quantitative characteristics are computed. For example, the measure of parallelness π, given α and β as the angles between the major axes of the two objects and the x axis of the image, is defined as π = |cos(α − β)|, while the distance between two parts is defined as the minimum distance between their convex hulls.

Using the definitions from the external ontology, for every part we calculate the degree of membership of each feature in its related membership functions. For example, for the feature “length” (i.e. the length of the part's major axis), the truth values for the functions “LongObj”, “MediumLengthObj” and “ShortObj” are calculated; if MediumLengthObj is associated with a generalized bell curve membership function with parameters a = 240, b = 2.5, c = 600 and the length of the major axis of the considered object is 456.61 pixels, the object will belong to the class MediumLengthObj with a truth degree µ = 0.93.
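As a quick check of the two formulas above, the following Python sketch recomputes the parallelness measure and the truth degree of the example. It is only an illustration under stated assumptions: the paper does not spell out the generalized bell formula, so we assume the common form µ(x) = 1 / (1 + |(x − c)/a|^(2b)) (the form used, for instance, by MATLAB's gbellmf), and the function names are hypothetical.

```python
import math


def parallelness(alpha: float, beta: float) -> float:
    """Parallelness of two parts, pi = |cos(alpha - beta)|.

    alpha and beta are the angles (in radians) between each part's
    major axis and the x axis of the image.
    """
    return abs(math.cos(alpha - beta))


def generalized_bell(x: float, a: float, b: float, c: float) -> float:
    """Generalized bell membership function (assumed standard form):
    mu(x) = 1 / (1 + |(x - c) / a| ** (2 * b)).
    """
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))


# Example from the text: MediumLengthObj with a = 240, b = 2.5, c = 600,
# evaluated on a major axis length of 456.61 pixels.
mu = generalized_bell(456.61, a=240.0, b=2.5, c=600.0)
print(round(mu, 2))  # 0.93
```

With these parameters the assumed formula reproduces the truth degree reported in the text, which suggests (but does not prove) that this is the form used in the system.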
Table 1: Examples of features extracted from the image

(a) Measures of parallelness between every pair of parts

      p1    p2    p3    p4    p5    p6
p1   1.00  0.97  0.97  0.97  0.97  0.97
p2   0.97  1.00  0.89  0.90  0.88  0.89
p3   0.97  0.89  1.00  0.99  0.99  1.00
p4   0.97  0.90  0.99  1.00  0.99  0.99
p5   0.97  0.88  0.99  0.99  1.00  0.99
p6   0.97  0.89  1.00  0.99  0.99  1.00

(b) Other features (area and lengths are in pixels)

      Area   Eccentricity  Major axis  Minor axis  Axes ratio
p1   14860   0.99          456.61      47.33       0.10
p2   12351   0.95          288.23      88.41       0.30
p3    2194   0.98          151.78      23.11       0.15
p4     500   0.99           93.09       8.81       0.09
p5    2617   0.98          181.07      25.79       0.14
p6     771   0.99          141.47       9.39       0.06

3.3 Objects ontology building

Using the results from the previous phase, and taking as a working hypothesis that all the parts found belong to the same object (i.e. there is just one object in the scene), for each part only the membership functions which give the highest truth value for each feature are selected; for example, if a part has a truth degree over a threshold for the membership function “MediumLengthObj”, the concept MediumLengthObj is added to the concept representing that part in the fuzzy ontology. At the end, we obtain a concept like (for the sake of simplicity we list only some concepts and roles):

    ObjClass1 ≡ MediumLengthObj ⊓ SmallObj ⊓ ≥5 parall ⊓ ≥1 near ⊓ ...

where ObjClass1 is the newly created concept related to the part under consideration. A new fuzzy concept is created only if the part currently being analyzed does not belong to any existing concept, i.e. there is no concept that fully describes the part (this can be verified via the fuzzy reasoner). Since FiRE does not let us write fuzzy TBox axioms, the degrees of truth are discarded in this phase. When there are no parts left, a role for each concept is created.
For example, from the class ObjClass1 the role hasObjClass1 is created, so that the class Fork can be defined using the previously found number of parts per class:

    Fork ≡ ≥1 hasObjClass1 ⊓ ≥4 hasObjClass2 ⊓ ≥1 hasObjClass3

This is because the f-SHIN logic lacks qualified cardinality restrictions, so a general hasPart role cannot be used. We use a “typographical” operation, yet the problem of role creation has been addressed in [3]. For the sake of completeness, domain and range role axioms should be added to qualify the newly introduced roles, but the reasoner we use does not fully support them yet.

3.4 Object recognition

Once the objects ontology TBox has been built, it is possible to determine whether an object, after it has been decomposed into parts, belongs to a class or not (i.e. how much it can be considered to belong to that class with respect to a certain threshold); the image analysis steps are the same as in the ontology building phase. Once all the pertaining concepts and roles have been written in the ABox for every part, fuzzy reasoning is performed to find the GLB of each part belonging to a certain class; then, role assertions like hasObjClass1 are created with the same values as the GLBs found and, at the end, the GLB of the main object is calculated.

This procedure can be applied to determine whether a specific kind of grasp can be performed on the selected object. For example, given the concept defined as (for the sake of simplicity using no roles):

    GraspableByPinch ≡ MediumLengthObj ⊓ HighlyEccentricalObj

representing objects that are graspable by a pinch grip, we can find which part of the object (if any) can be grasped this way via a subsumption check.

4 Conclusions

In this paper we have presented a possible architecture for the generation and use of a fuzzy ontology for object recognition by decomposing objects into parts.
We take advantage of fuzzy cardinality restrictions which, to the best of our knowledge, have not been fully exploited in current fuzzy DL applications (e.g. multimedia retrieval). Our results are preliminary and prone to errors, partly due to limitations in the modules in use (e.g. the fuzzy reasoner is still experimental), partly due to the approximations induced by the use of a SHIN logic, whereas at least qualified cardinality restrictions would be needed.

As future work, since a more powerful fuzzy DL seems to be needed for object modeling purposes, we plan to work on a more powerful reasoner and on a better integration between classical and fuzzy knowledge bases; furthermore, as we plan to use the system as an aid to the grasping task, we will add physical information (which can be obtained via different sensors, e.g. haptic devices) and further information on the grasping types along with their quality measurements.

References

1. Biederman, I.: Recognition-by-components: A theory of human image understanding. Psychological Review 94(2) (1987) 115–117
2. Bobillo, F., Straccia, U.: An OWL Ontology for Fuzzy OWL 2. Proceedings of the 18th International Symposium on Methodologies for Intelligent Systems (2009)
3. Haarslev, V., Lutz, C., Möller, R.: Foundations of spatioterminological reasoning with description logics. Proceedings of Sixth International Conference on Principles of Knowledge Representation and Reasoning (1998) 112–123
4. Hudelot, C.: Towards a cognitive vision platform for semantic image interpretation; Application to the recognition of biological organisms. PhD thesis. University of Nice Sophia Antipolis (2005)
5. Maillot, N.: Ontology based object learning and recognition. PhD thesis. University of Nice Sophia Antipolis (2005)
6. Hudelot, C., Atif, J., Bloch, I.: Fuzzy spatial relation ontology for image interpretation. Fuzzy Sets and Systems 159(15) (2008) 1929–1951
7. Stoilos, G., Stamou, G., Pan, J.Z., Simou, N., Tzouvaras, V.: Reasoning with the fuzzy description logic f-SHIN: Theory, practice and applications. In P.C.G. da Costa et al. (eds): Uncertainty Reasoning for the Semantic Web I (2008) 262–281
8. Simou, N., Athanasiadis, T., Tzouvaras, V., Kollias, S.: Multimedia reasoning with f-SHIN. Second International Workshop on Semantic Media Adaptation and Personalization (2007) 44–49
9. Stoilos, G., Stamou, G., Tzouvaras, V., Pan, J.Z., Horrocks, I.: The fuzzy description logic f-SHIN. International Workshop on Uncertainty Reasoning For the Semantic Web (2005)
10. Mylonas, P., Simou, N., Tzouvaras, V., Avrithis, Y.: Towards semantic multimedia indexing by classification and reasoning on textual metadata. Knowledge Acquisition from Multimedia Content Workshop (2007)
11. Lukasiewicz, T., Straccia, U.: Managing uncertainty and vagueness in Description Logics for the Semantic Web. Journal of Web Semantics 6(4) (2008) 291–308
12. Straccia, U.: Towards Spatial Reasoning in Fuzzy Description Logics. Proc. of the 2009 IEEE International Conference on Fuzzy Systems (2009)
13. Suh, I. H., Lim, G. H., Hwang, W., Suh, H., Choi, J.-H., Park, Y.-T.: Ontology-based multi-layered robot knowledge framework (OMRKF) for robot intelligence. IEEE Int. Conf. on Intelligent Robots and Systems (2007) 429–436
14. Wan, L.: Parts-based 2D shape decomposition by convex hull. IEEE International Conference on Shape Modeling and Applications (2009) 89–95
15. Dasiopoulou, S., Kompatsiaris, I., Strintzis, M.G.: Applying Fuzzy DLs in the extraction of image semantics. Journal of Data Semantics 14 (2009) 105–132
16. Meghini, C., Sebastiani, F., Straccia, U.: A model of multimedia information retrieval. Journal of ACM 48(5) (2001) 909–970
17. Canny, J.: A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8(6) (1986) 679–698