Semantic Annotation of Image Collections

Laura Hollink¹, Guus Schreiber¹, Jan Wielemaker², Bob Wielinga²

¹ Free University Amsterdam, Computer Science. E-mail: {laurah,schreiber}@cs.vu.nl
² University of Amsterdam, Social Science Informatics. E-mail: {jan,wielinga}@swi.psy.uva.nl

ABSTRACT

In this paper we discuss a tool for semantic annotation and search in a collection of art images. Multiple existing ontologies are used to support this process, including the Art and Architecture Thesaurus, WordNet, ULAN and Iconclass. We discuss knowledge-engineering aspects such as the annotation structure and links between the ontologies. The annotation and search process is illustrated with an application scenario.

1. INTRODUCTION AND APPROACH

In this paper we show how ontologies can be used to support annotation and search in image collections. Many such collections currently exist and users are increasingly faced with problems of finding a suitable (set of) image(s) for a particular purpose. Each collection usually has its own (semi-)structured indexing scheme that typically supports a keyword-type search. However, finding the right image is often still problematic.

Figure 1 shows the general architecture we used within this study. We used four ontologies (AAT, WordNet, ULAN, Iconclass), which were represented in RDF Schema [1]. The resulting RDF Schema files are read into the tool with the help of the SWI-Prolog RDF parser [19, 10]. The tool subsequently generates a user interface for annotation and search based on the RDF Schema specification. The tool supports loading images and image collections, creating annotations, storing annotations in an RDF file, and two types of image search facilities.

The ontologies, the annotation template and their interrelations are discussed in Section 2. The annotation and query process is discussed in Section 3 in the form of an application scenario. Section 4 discusses related work. Finally, Section 5 provides a general discussion on the approach taken.

This work is a sequel to earlier work on semantic annotation and search of a collection of photographs of apes [13]. In the earlier study the emphasis was mainly on the subject matter of the image. For art images both the image subject and the art-historic features, such as artist and style, are important. This requires the use of additional ontologies (AAT, ULAN) and poses research questions with respect to the links between ontologies (see Section 2.4).

2. ONTOLOGIES, ANNOTATION TEMPLATE AND THEIR INTERRELATIONS

2.1 Ontologies

For this study we used four thesauri, which are relevant for the art-image domain:

1. The Art and Architecture Thesaurus (AAT) [11] is a large thesaurus containing some 125,000 terms relevant for the art domain. The terms are organized in a single hierarchy.
2. WordNet [8] is a general lexical database in which nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. WordNet concepts (i.e. “synsets”) are typically used to describe the content of the image. In this study we used WordNet version 1.5, limited to hyponym relations.
3. Iconclass [16, 15] is an iconographic classification system, providing a hierarchically organized set of concepts for describing the content of visual resources. We used a subset of Iconclass.
4. The Union List of Artist Names (ULAN) [2] contains information about around 220,000 artists. The information includes name variants and some limited biographical information (dates, locations, artist type). A subset of 30,000 artists, representing painters, is incorporated in the tool.

AAT, WordNet, Iconclass and ULAN were all translated into the RDF Schema notation. For example, WordNet was represented in the following fashion:

• WordNet concepts (“synsets”, which have a numerical identifier) were represented as RDFS classes;
• word forms of concepts were represented as RDFS labels of the corresponding class;
• hyponym relations were represented as RDFS subclass relations;
• glossary entries of concepts were represented as RDFS comments.

In another paper [20] we discuss how we can use WordNet 1.6 as represented by Melnik and Decker¹. In a prior publication [21] one can find a discussion on issues arising when representing AAT in RDF Schema.

¹ See http://www.semanticweb.org/library/#wordnet

Figure 1: Overview of the approach in this study. The RDF Schema specifications of the ontologies, of the ontology links and of the annotation template are parsed by the SWI-Prolog RDF parser into the tool. The tool generates an annotation and search interface from these specifications. This interface is used to annotate and query images. The annotations are stored in an RDF file.

Source                                        Triples
WordNet 1.5 (limited to hyponym relations)    280,558
Art and Architecture Thesaurus                179,410
Iconclass (partial)                            15,452
ULAN (limited to painters)                    100,607
Total                                         576,027

Table 1: Number of RDF triples in the four ontologies.

Table 1 shows the number of RDF triples in the tool for each of the thesauri. The infrastructure of our current tool can handle this set of 576,000 triples efficiently, but it is expected to break down when the triple base becomes significantly larger. Based on our experiences in this work we have recently constructed a revised infrastructure that should be able to handle up to 40,000,000 triples [20].

2.2 Annotation template

For annotation and search purposes the tool provides the user with a description template derived from the VRA 3.0 Core Categories [17]. The VRA template is defined as a specialization of the Dublin Core set of metadata elements, tailored to the needs of art images. The VRA Core Categories follow the “dumb-down” principle, i.e., a tool can interpret the VRA data elements as Dublin Core data elements.²

² An unofficial OWL specification of the VRA elements, including links to Dublin Core, can be found at http://www.cs.vu.nl/~guus/public/vra.owl

For visualization purposes, the 17 VRA data elements were grouped into three sets:

Production-related descriptors: title, creator, date, style/period, technique, culture and relation.
Physical descriptors: materials.medium, materials.support, measurements, type and record type.
Administrative descriptors: location, collection ID, source and rights.

Two VRA data elements are not included in the template: description and subject. Both are used to describe the content of the image. As we were interested in providing a more structured content description we used an adapted version of the “sentence structure” proposed by Tam [14] as a means of structuring image-subject descriptions. The subject of the image is described with a collection of statements of the form “agent action object recipient”. Each statement should have at least an agent (e.g. a portrait) or an object (e.g. a still life). The terms used in the sentences are selected from terms in the various thesauri. Multiple sentences may be used to describe a single painting.

For example, the painting by Chagall in Figure 2, in which Chagall kisses his wife and gets flowers from her, can be described with the following two statements (source of the term parenthesized):

    Agent: "Chagall, Marc" (ULAN)
    Action: "kiss" (WordNet)
    Recipient: "wives" (AAT)

    Agent: "woman" (WordNet)
    Action: "give" (WordNet)
    Object: "flower" (WordNet)
    Recipient: "Chagall, Marc" (ULAN)

Figure 2: Painting of Chagall.

The scheme was developed for a previous experiment [13]. It avoids the problems of parsing natural language descriptions, while maintaining some of the naturalness³ and richness. Note that the use of such concepts to describe the image allows one to do semantic matching during search. For example, one can find this picture when searching for a picture using a synonym or hypernym (e.g., “touch” instead of “kiss”). The application scenario in Section 3 gives an example of the use of this template.

³ The naturalness is limited; see the term “wives” in the first statement. This is because AAT uses the plural form for concepts.

In addition, one can describe the “setting”, i.e., characteristics of the scene as a whole. We use three slots to describe the setting: event, place and time. These three slots are also filled with terms from the thesauri. For example, the painting by Chagall can be described with the event birthday celebration (concept from Iconclass) and the location artist’s workplace (concept from WordNet).

The tool also provides a free text field, where information can be stored that does not fit into one of the slots, or is not present in any of the ontologies.

2.3 Linking the annotation template to the ontologies

Where possible, a slot in the annotation template is bound to one or more relevant subtrees of the ontologies. For example, the VRA slot style/period is bound to two subtrees in AAT containing the appropriate style and period concepts. The following VRA data elements are currently linked to parts of AAT: technique, style/period, type, record type, material.support, material.medium and culture. One VRA data element is linked to ULAN, namely creator.

The slots of the subject-matter description are also linked to subtrees of the ontologies. WordNet provides many general concepts for subject-matter description. AAT also provides some concepts useful for this purpose. There is some overlap here between AAT and WordNet. In the next subsection we come back to this issue. Iconclass is particularly useful for describing scenes as a whole (cf. the birthday celebration example earlier). ULAN contains specific persons, which are typically used to annotate images in which artists themselves are depicted (e.g., a self-portrait). We are currently also considering including a geographical terminology base, such as the Thesaurus of Geographic Names (TGN)⁴, to be able to describe specific locations in a semantically meaningful way.

⁴ http://www.getty.edu/research/tools/vocabulary/tgn/

2.4 Links between ontologies

The four ontologies contain many terms that are in some way related. For example, WordNet contains the concept wife, which is in fact equal to the AAT concept wives (AAT uses the plural form as the preferred one). One could consider designing a new ontology by merging them. However, to make the Semantic Web work, we will need to reuse existing ontologies rather than redoing them. Thus, we decided to use the ontologies “as-is” and create separate corpora of ontology links. We added three types of ontology links. Equivalence relations and subclass relations are often mentioned in the literature as useful link primitives (e.g. [9]). In addition, we added links specific to the art-image domain.

2.4.1 Equivalence links

We added equivalence relations between terms appearing in multiple ontologies that refer to the same concept. For example, the artistic movements branch in WordNet is linked to the equivalent styles and periods subtree in AAT. Similarly, the WordNet concept wife is linked to the AAT concept wives.

As RDF Schema does not provide an equivalence relation⁵, we had to introduce our own special-purpose property for this. In forthcoming versions of the tool this relation will be replaced by the OWL language construct owl:equivalentClass [18].

⁵ The revised version of RDF Schema allows cycles of subclass relations. This means that one can now represent equivalence of A and B by stating that A is a subclass of B and that B is a subclass of A.

2.4.2 Subclass links

When differences in the structure of the ontologies are large (a common feature), equivalence relations are sometimes only possible at the lowest, most specific branches of the hierarchies. We use the RDFS subclass relation to create links at a higher level in the hierarchies. Consider the example in Figure 3, which shows two subtrees of AAT and WordNet respectively. One can see that the term artist in WordNet does not refer to the same concept as artist in AAT, since some subconcepts of artist in WordNet, such as musician, are not subconcepts of artists in AAT, which contains only people in the visual arts. To link WordNet to AAT we need to create a subclass link: AAT artist is a subclass of WordNet artist.

Figure 3: Subtrees of AAT (above) and WordNet (below) in which the concept artist appears. The figures are snapshots of the RDF browser of our tool.

2.4.3 Domain-specific links

In addition to equivalence and subclass links, we also use domain-specific relations. For example, by linking painting techniques to materials, we were able to derive the value of the technique slot from the values of the material.support and material.medium slots. Similarly, a link between artists in ULAN and painting styles in AAT made it possible to suggest to the user the value of the style/period slot once the creator was known. In this way, Picasso is linked to cubism, Matisse is linked to Fauve, Van Gogh to impressionism, and so on. This relation is many-to-many: an artist may belong to multiple styles.

Other derivations are possible, but are not yet supported by the tool. ULAN contains information about the country of origin of the artists. This means that the VRA slot culture could in principle be derived from the slot creator. The type of a painting can sometimes be derived from descriptions of the content. If the only description of a painting is an agent, the painting is probably a portrait. If the agent is equal to the creator, we are looking at a self-portrait. The suggested values act as default values and can be overridden by the annotator.

2.5 Using the links

Equivalence and subclass relations increase the recall of the tool. They make it possible to retrieve images annotated with concepts from one ontology while searching with concepts from another ontology. Domain-specific relations are especially useful for annotation. Values in the annotation that are suggested by the tool reduce the time and effort spent by the human annotator. Domain-specific relations can also be used to improve search. For example, if a user is searching for Fauvist paintings, the tool can retrieve paintings by Matisse, Derain and De Vlaminck, all Fauvist painters. Domain-specific relations like artist-style have to be interpreted by the annotation and search algorithms in a domain-specific fashion. A more general mechanism for handling this would require a rule language.

3. AN APPLICATION SCENARIO

3.1 Annotating art-historic features

Figure 4 shows a screenshot of the annotation interface. In this scenario the user is annotating an image representing the painting by Chagall of Figure 2. The figure shows the tab for production-related VRA data elements. The four elements with a “binoculars” icon are linked to subtrees in the ontologies, i.e., AAT and ULAN. For example, if we click on the “binoculars” for style/period, the window shown in Figure 5 pops up, showing the place in the hierarchy of the concept Surrealist. We see that it is a concept from AAT. The top-level concepts of the AAT subtrees from which we can select a value for style/period are shown with an underlined bold font (i.e., [...] and [...]).

Figure 4: Screenshot of the annotation interface. The figure shows one tab with VRA data elements for describing the image, here the production-related descriptors. The slots associated with a “binoculars” button are linked to one or more subparts of the underlying ontologies, which provide the concepts for this part of the annotation.

Figure 5: Browser window for values of style/period. The concept Surrealist has been selected as a value for this slot. The top-level concepts of the AAT subtrees from which we can select a value for style/period are shown with an underlined bold font (i.e., [...] and [...]).

3.2 Using existing annotations

The collection of art paintings that was used for this study was accompanied by short semistructured textual annotations. For example, this is the text accompanying the Chagall painting:

    Chagall, Marc
    Birthday
    1915
    Oil on cardboard
    31 3/4 x 39 1/4 in.
    The museum of Modern Art, New York

We included in the tool a parsing facility, implemented as a special-purpose set of definite-clause grammar rules. This facility is able to create a partial annotation from these texts. For the image in Figure 4 the following VRA slot values could be derived directly from the text: title, creator, date, materials.support, materials.medium, measurements, location and ID. For the style/period slot a value is suggested based on the slot value for creator. The same is done for technique, the value for which can be derived from the two material slots. In Figure 4 all values except for culture are derived automatically from the existing annotation.

3.3 Annotating image content

Figure 6 shows the annotation of the content of the painting called “Portrait of Derain” by Maurice de Vlaminck. The template on the right-hand side implements the subject template as described in Section 2.2. The content has been tersely described with the following terms:

    Agent: "Derain, Andre" (ULAN)
    Action: "smoke" (WordNet)
    Object: "pipes(smoking equipment)" (AAT)

Figure 6: Description of the content of the painting “Portrait of Derain”.

As with the art-historic features, the slots are linked to one or more subparts of the underlying ontologies, which provide the concepts for this part of the annotation. For example, if we click on the binoculars icon for action, the window shown in Figure 7 pops up, showing the place in the hierarchy of the concept smoke. We see that it is a concept from WordNet.

Figure 7: Browser window for the concept smoke.

The user interface provides some support for finding the [right concept: the user can type a few] characters of a term and then invoke a completion mechanism (by typing a space). This will provide a popup list of concepts matching the input string. In the browser window, more advanced concept search options can be selected, including substrings and use of synonyms. One synonym of smoke is provided, namely smoking. The ontology makes it easier for people to select the correct concept. For example, seeing the specialization puffing of the concept smoke, the user might decide to use this term.

For annotation purposes the ontologies serve two purposes. Firstly, the user is immediately provided with the right context for finding an adequate index term. This ensures quicker and more precise indexing. Secondly, the hierarchical presentation of concepts helps to disambiguate terms. When the user types in the term “pipe” as the object in a content-description template, the tool will indicate that this is an ambiguous term. In the user interface the term itself gets a green color to indicate this and the status bar near the bottom shows the number of hits in the ontologies. If one clicks on the binoculars button, the tool will provide the user with a choice of concepts from the ontologies that are associated with this term. Figure 8 shows three of the concepts associated with pipe, namely conduits, hangings and smoking equipment. From the placement of the terms in the respective hierarchies, it is usually immediately clear to the indexer which meaning of the term is the intended one. Term disambiguation is a frequent occurrence in this type of application.

Figure 8: Browser window for the pipe concepts.

The ontologies provide a wide range of concepts for the subject-matter descriptions. Although the choice of concepts depends on the indexer, and although the quality of an annotation is subjective, there are some general guidelines for good annotations. An annotation is most effective if the annotator chooses concepts that are as specific as possible. Experiments [5] have shown that users describe images in terms of the agents and objects that are depicted. An annotator should therefore focus on agent and object descriptions.

3.4 Searching for an image

The tool provides two types of semantic search. With the first search option the user can search for concepts at an arbitrary place in the image annotation. Figure 9 shows an example of this. Suppose the user wants to search for images associated with the concept Aphrodite. Because the ontologies contain an equivalence relation between Venus (as a Roman deity, not the planet nor the tennis player) and Aphrodite, the search tool is able to retrieve images for which there is no syntactic match. For example, if we look at the annotation of the first hit in the right-hand part of Figure 9, we find “Venus” in the title (“Birth of Venus” by Botticelli) and in the subject-matter description (Venus (a Roman deity) standing seashell). The word “Venus” in the title can only be used for syntactic matches (we do not have an ontology for titles), but the concept in the subject description can be used for semantic matches, thus satisfying the “Aphrodite” query.

Figure 9: Example of concept search. The query “Aphrodite” will retrieve all images for which we can derive a semantic match with the concept Aphrodite. This includes all images annotated with the concept Venus (as a Roman deity). Only a small fragment of the search results is depicted.

General concept search retrieves images which match the query in some part of the annotation. The second search option allows the user to exploit the annotation template for search purposes. An example of this is shown in Figure 10. Here, the user is searching for images in which the slot culture matches Netherlandish. This query retrieves all images with a semantic match for this slot. This includes images of Dutch and Flemish paintings, as these are subconcepts of Netherlandish.

Figure 10: Search using the annotation template. The query “Netherlandish” for the slot culture retrieves all images with a semantic match for this slot. This includes images of Dutch and Flemish paintings, as these are subconcepts of Netherlandish.

4. RELATED WORK

The architecture shown in Figure 1 is in the same spirit as the one described by Lafon and Bos [7]. The main difference lies in the fact that we place more emphasis on the nature of the ontologies. Koivunen and Swick [6] discuss an architecture for semantic annotation, but mainly from the perspective of shared collaboration. CREAM [3] also provides an architecture for semantic annotation including both manual and semi-automatic techniques. The present work differs from the latter two approaches through its focus on images (which creates special problems, such as annotating the image content) and the practical work on integrating multiple existing ontologies. The work of Hyvönen and colleagues [4] combines ontology-based image retrieval with view-based and topic-based retrieval and is probably closest to the present work. So far, they have not reported many details on the ontologies being used.

5. DISCUSSION

This paper gives some indication of how a semantic web for images might work. Semantic annotation allows us to make use of concept search instead of keyword search. It also paves the way for more advanced search strategies. For example, users can specialize or generalise a query with the help of the concept hierarchy when too many or too few hits are found.

In a previous study on a collection of ape photographs [13] we did some qualitative analysis on the added value with respect to keyword search. The provisional conclusion was that for some queries (e.g., “ape”) keyword search does reasonably well, but for other slightly different queries (e.g., “great ape”) the results are suddenly poor. This is exactly where semantic annotation could help.

In another prior study [12] we reported on a small experiment concerning the usability of the annotation tool. Although our approach relies to some extent on manual annotation, it should be possible to generate partial semantic annotations from existing annotations (which vary from free text to structured database entries). The application scenario in Section 3 shows an example of this. However, the example is based on a special-purpose parser. Systematic use of NLP techniques should be considered here. Also, content-based image analysis techniques could be used to derive image features, such as the location and color of objects.

Our experiences with RDF Schema were generally positive. We made heavy use of the metamodelling facilities of RDF Schema (which allow one to treat classes as instances of other classes) for defining and manipulating the metamodels of the different thesauri. In our experience this feature is particularly needed in cases where one has to work with existing representations of large ontologies. This is a typical feature for a semantic web: one has to work with existing ontologies to get anywhere, even if one disagrees with some of the design principles of the ontology.

For our purposes RDF Schema has some limitations in expressivity. We especially needed a notion of property cardinality and of equivalence between resources (classes, instances, properties). For this reason we plan to move in the near future to OWL, the Web Ontology Language currently under development at W3C [18].

Acknowledgments

This work was supported by the IOP Project “Interactive disclosure of Multimedia Information and Knowledge” and the ICES-KIS project “Multimedia Information Analysis”, both funded by the Dutch Ministry of Economic Affairs. We gratefully acknowledge the contributions of Marcel Worring, Giang Nguyen and Maurice de Mare.

6. REFERENCES

[1] D. Brickley and R. V. Guha. Resource description framework (RDF) schema specification 1.0. Candidate recommendation, W3C Consortium, 27 March 2000. See: http://www.w3.org.
[2] The Getty Foundation. ULAN: Union List of Artist Names. http://www.getty.edu/research/tools/vocabulary/ulan/, 2000.
[3] S. Handschuh and S. Staab. Annotation of the shallow and the deep web. In S. Handschuh and S. Staab, editors, Annotation for the Semantic Web, volume 96 of Frontiers in Artificial Intelligence and Applications, pages 25–45. IOS Press, Amsterdam, 2003.
[4] E. Hyvönen, S. Kettula, V. Raatikka, S. Saarela, and K. Viljanen. Finnish museums on the semantic web. In Proceedings of WWW2003, Budapest, poster papers, 2003.
[5] C. Jörgensen. Indexing images: Testing an image description template. In ASIS 1996 Annual Conference Proceedings, 1996.
[6] M-R. Koivunen and R. R. Swick. Collaboration through annotation on the semantic web. In S. Handschuh and S. Staab, editors, Annotation for the Semantic Web, volume 96 of Frontiers in Artificial Intelligence and Applications, pages 46–60. IOS Press, Amsterdam, 2003.
[7] Y. Lafon and B. Bos. Describing and retrieving photographs using RDF and HTTP. Note, W3C Consortium, 28 September 2000. URL: http://www.w3.org/TR/2000/NOTE-photo-rdf-20000928.
[8] G. Miller. WordNet: A lexical database for English. Comm. ACM, 38(11), November 1995.
[9] I. Niles and A. Pease. Linking lexicons and ontologies: Mapping WordNet to the suggested upper merged ontology. In Proceedings of the 2003 International Conference on Information and Knowledge Engineering (IKE 03), Las Vegas, Nevada, June 23-26 2003.
[10] Bijan Parsia. RDF applications with Prolog. http://www.xml.com/pub/a/2001/07/25/prologrdf.html, 2001.
[11] T. Peterson. Introduction to the Art and Architecture Thesaurus. Oxford University Press, 1994. See also: http://www.getty.edu/research/tools/vocabulary/aat/.
[12] A. Th. Schreiber, I. I. Blok, D. Carlier, W. P. C. van Gent, J. Hokstam, and U. Roos. A mini-experiment in semantic annotation. In I. Horrocks and J. Hendler, editors, The Semantic Web - ISWC 2002, number 2342 in Lecture Notes in Computer Science, pages 404–408, Berlin, 2002. Springer-Verlag. ISSN 0302-9743.
[13] A. Th. Schreiber, B. Dubbeldam, J. Wielemaker, and B. J. Wielinga. Ontology-based photo annotation. IEEE Intelligent Systems, 16(3):66–74, May/June 2001.
[14] A. M. Tam and C. H. C. Leung. Structured natural-language description for semantic content retrieval. Journal of the American Society for Information Science, to appear.
[15] J. van den Berg. Subject retrieval in pictorial information systems. In Electronic Filing, Registration, and Communication of Visual Historical Data. Abstracts for Round Table no 34 of the 18th International Congress of Historical Sciences. Copenhagen, pages 21–28, 1995. http://www.iconclass.nl.
[16] H. van der Waal. ICONCLASS: An iconographic classification system. Technical report, Royal Dutch Academy of Sciences (KNAW), 1985.
[17] Visual Resources Association Standards Committee. VRA Core Categories, Version 3.0. Technical report, Visual Resources Association, July 2000. URL: http://www.vraweb.org/vracore3.htm.
[18] Web Ontology Working Group. OWL Web Ontology Language Overview. W3C Candidate Recommendation, World Wide Web Consortium, 18 August 2003. Latest version: http://www.w3.org/TR/owl-features/.
[19] J. Wielemaker. SWI-Prolog RDF Parser. SWI, University of Amsterdam, 2000. URL: http://www.swi-prolog.org/packages/rdf2pl.html.
[20] J. Wielemaker, A. Th. Schreiber, and B. J. Wielinga. Prolog-based infrastructure for RDF: performance and scalability. In Proceedings ISWC’03, 2003.
[21] B. J. Wielinga, A. Th. Schreiber, J. Wielemaker, and J. A. C. Sandberg. From thesaurus to ontology. In Y. Gil, M. Musen, and J. Shavlik, editors, Proceedings 1st International Conference on Knowledge Capture, Victoria, Canada, pages 194–201, New York, 21-23 October 2001. ACM Press.
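
Illustrative sketch. The way equivalence and subclass links (Sections 2.4 and 2.5) let a concept query retrieve images with no syntactic match (Section 3.4) can be illustrated with a small Python sketch. This is not the tool's implementation, which is built on SWI-Prolog; the class `LinkedOntologies`, the function `search`, the image file names and the flat concept strings are all hypothetical names chosen for illustration. The sketch assumes that a query matches an annotation if the annotated concept is reachable from the query concept through equivalence links (in either direction) and subclass links (downward only).

```python
from collections import defaultdict


class LinkedOntologies:
    """Toy concept graph; every concept name used below is illustrative."""

    def __init__(self):
        self.subclasses = defaultdict(set)   # parent concept -> direct subclasses
        self.equivalents = defaultdict(set)  # symmetric equivalence links

    def add_subclass(self, child, parent):
        self.subclasses[parent].add(child)

    def add_equivalence(self, a, b):
        self.equivalents[a].add(b)
        self.equivalents[b].add(a)

    def expand(self, concept):
        """Return the concept plus everything reachable from it via
        equivalence links and downward subclass links."""
        seen, stack = set(), [concept]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.equivalents[current])
            stack.extend(self.subclasses[current])
        return seen


def search(ontologies, annotations, query):
    """Return the images annotated with any concept that matches the query."""
    matches = ontologies.expand(query)
    return sorted(image for image, concepts in annotations.items()
                  if concepts & matches)


onto = LinkedOntologies()
onto.add_equivalence("Aphrodite", "Venus (Roman deity)")
onto.add_subclass("Dutch", "Netherlandish")
onto.add_subclass("Flemish", "Netherlandish")

# Hypothetical annotations: image file -> set of annotated concepts.
annotations = {
    "birth_of_venus.jpg": {"Venus (Roman deity)"},
    "dutch_landscape.jpg": {"Dutch"},
    "flemish_portrait.jpg": {"Flemish"},
    "portrait_of_derain.jpg": {"smoke"},
}

print(search(onto, annotations, "Aphrodite"))
print(search(onto, annotations, "Netherlandish"))
```

Under these assumptions the “Aphrodite” query finds the image annotated with Venus, and the “Netherlandish” query finds the Dutch and Flemish images. Because subclass links are followed downward only, a query for “Dutch” would not return images annotated with the more general concept Netherlandish.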