=Paper=
{{Paper
|id=None
|storemode=property
|title=Ontology-based Descriptions of Image Collections
|pdfUrl=https://ceur-ws.org/Vol-804/13_LANMR11.pdf
|volume=Vol-804
|dblpUrl=https://dblp.org/rec/conf/lanmr/MedinaSCB11
}}
==Ontology-based Descriptions of Image Collections==
<pdf width="1500px">https://ceur-ws.org/Vol-804/13_LANMR11.pdf</pdf>
<pre>
Ontology-based descriptions of image collections

Ma. Auxilio Medina1 , J. Alfredo Sánchez2 , J. de la Calleja1 , Antonio Benitez1
                         1
                         Universidad Politécnica de Puebla
                       Tercer Carril del Ejido Serrano S/N
                         Juan C. Bonilla, Puebla, México
                      2
                        Universidad de las Américas Puebla
                    Ex-Hacienda Santa Catarina Mártir S/N
                      San Andrés Cholula, Puebla, México
              {mmedina,abenitez,jdelacalleja}@uppuebla.edu.mx
                         j.alfredo.sanchez@gmail.com


      Abstract. Thesaurus have been widely used to provide structured vo-
      cabularies to describe books, however, they only provide part of the re-
      quired knowledge for semantic web contexts. This paper proposes the
      use of ontologies and standard metadata to model semantic descrip-
      tions of image collections that refers to book covers. Ontologies represent
      keyword-based organizations. The classes and the hierarchical relation-
      ships of the ontology allow users to query images by topic with common
      search engines. The greenBookC collection is used as a test bed. This
      collection is published and maintained in Greenstone, a suite of software
      for digital libraries.

      Keywords: digital libraries, ontologies, semantic descriptions


1   Introduction

Users of digital libraries employ keyword based search engines, browsing mecha-
nisms or recommendation systems to ﬁnd relevant information. In these tools,
words are considered as “units of meaning”. However, there is a gap between the
contents and meanings in multimedia data [4].
    Diﬀerent approaches exist to organize multimedia collections that vary from
processing low level features to associate concepts to media. The results of these
approaches often require knowledge of experts or specialized users. Most of the
users of digital libraries interact with image collections through search engines
that allow them to query images by title, date, author, format, publisher or with
a feature associated with the ﬁle that stores the image. These descriptors are
associated with image content.
    Frequently, image collections are also described using a set of terms of a
keyword-based organization system or with natural language descriptions. In or-
der to extend the implicit syntactic level of these descriptions, semantic descrip-
tions with predeﬁned structures and control vocabularies are considered richer
structures to store background knowledge [5].


                                          133
    Our interest is focused on describe images that refers to book covers. Tradi-
tionally, thesaurus have been widely used to provide structured vocabularies to
describe books, however, they only provide part of the required knowledge for
semantic web contexts. In semantic digital libraries (SDLs), materials, tools and
meanings are addressed to oﬀer beneﬁts such as: anyone can use it, knowledge
is accessible from the SDL, resources are available with the modality anytime
anywhere, there are friendly and multi-modal interfaces with multiple connected
devises [2].
    This paper explores an alternative to associate background knowledge to
image descriptions that refers to book covers. The alternative makes use of on-
tologies and metadata standards. The classes and the hierarchical relationships
of ontologies can be exploited by keyword-based search engines. A collection of
images is built as a test bed. The collection is published in Greenstone, a suite
of software for digital libraries.
    The paper is organized as follows. Section 2 presents brief descriptions of
software tools that allow users to construct image collections in the semantic
web. Section 3 explains the ontologies and metadata management. Section 4
describes the test bed collection. Section 5 presents preliminary results. Finally,
Section 6 includes conclusions and suggests future directions of our work.


2   Related work

Open freely distributed software exists around the world to construct image
collections for educational, government and commercial institutions. This section
describes some of the most common tools used in semantic web applications and
depicts some representative collections.
    DSpace1 allow users to create repositories of digital content in multiple for-
mats. This is a widely popular tool that supports the customization of interfaces
to ﬁt diﬀerent user needs [6]. “The Spanish Image collection” is an example of a
collection constructed in DSpace. The collection is formed by 1196 JPEG ﬁles2 .
The images can be query with descriptors such as author, title or date; a list
of categories is used to organize images. The description of images at DSpace is
syntax-based.
    Fedora commons repository software3 is a general purpose, open-source digi-
tal object repository system under the Apache License. This is a centered plat-
form that enables storage, access and management of digital content, although
does not support indexing, discovery and delivery application mechanisms [3].
The modules oﬀer interoperability and extensibility of data. Image collections
can be constructed upon Fedora such as the “Maryland Map Collection”. This
collection is formed by JPEG ﬁles that depicts Maryland, the Chesapeake Bay,
1
  Dspace home page is available at http://www.dspace.org/
2
  The Spanish Image Collection is available at:
  http://dspace.nitle.org/handle/10090/1267
3
  Fedora home page is available at: http://fedora-commons.org/. This work is licensed
  under a Creative Commons Attribution-Share Alike 3.0 Unported License


                                        134
and the surrounding region from 1590 to the present. The interface visualizes
images and metadata. The Libraries’ Catalog and the ArchivesUM are used to
organize this collection.
    Greenstone4 is a suite of software to build and construct collection of seman-
tic digital libraries. This is produced by the New Zealand Digital Library Project
at the University of Waikato, and developed and distributed in cooperation with
UNESCO and the Human Info NGO. This is an open-source, multilingual soft-
ware issued under the terms of the GNU General Public License. Greenstone
supports a variety of formats that represents text, images, audio and video [6].
The “Historic Campus Architecture Project”5 developed by the Council of Inde-
pendent Colleges (CIC) in the United States, is an example of an image collection
in Greenstone. In the project, 5000 images of hard copy photographs of 4 x 6
and 8 X 10 inches were transformed into JPEG ﬁles. Metadata is used as the
descriptors to explore the collection. Metadata is stored in XML ﬁles.
    After analyzing these software tools, we choose Greenstone to implement
the ontology-based alternative to describe image collections. The features of
Greenstone related with the support of semantic descriptions of images are the
following ones [6]:

 – The digital library server runs in diﬀerent platforms
 – Full text is searchable by default
 – Servers and harvesters data compliance with the Open Archives Initiative
   Protocol for Metadata Harvesting (OAI-PMH)
 – Export and ingest data from DSpace
 – Collections can be updated anytime without disturbing users
 – Dublin Core (DC) is the default metadata format when a new collection is
   constructed


3     Building semantic descriptions

Although thesaurus have been a typical classiﬁcation system of digital libraries,
they are not able to accomplish the information requirements of the semantic
web. Ontologies have larger representation power than thesaurus [2]. An adapta-
tion of the steps proposed [5] was done to construct a lightweight ontology used
to build semantic descriptions.
    The steps to construct semantic descriptions based on metadata and ontolo-
gies that refers to book covers are the following ones:

1. Construction of a description template for book covers. This template an-
   swer questions such as: what kind of information does users want to record
   for a particular image, how users query an image collection of book covers,
   what format is appropriate to store images, how can images be organized,
4
    Greenstone is available at: http://www.greenstone.org/
5
    The History Campus Arquitecture Project is available at
    http://puka.cs.waikato.ac.nz/cgi-bin/cic/library


                                         135
    what metadata standard is useful to support semantic descriptions, will the
    descriptions be used locally or a mechanism of information sharing must be
    considered

 2. Linking images to ontology classes in such a way that classes can be used as
    values for a metadata standard. Hierarchical relationships should be used as
    a query expansion mechanism

 3. Describing additional domain and background knowledge in metadata stan-
    dard elements


    Each case study might use of a diﬀerent ontology as well as a distinct meta-
data standard as input. The goal in this work is to exploit ontological charac-
teristics as the basis to improve image descriptions and retrieval mechanisms.
    Table 1 shows an excerpt of an ontology that was used to construct semantic
descriptions of book covers that form a collection called greenBookC. This collec-
tion correspond to book covers of physical books of the library of the UPPuebla.
The areas of knowledge proposed by the “Consejo Nacional de Ciencia y Tec-
nologa” (CONACYT) are used as the main classes of this ontology.


                Table 1. An excerpt of an ontology for book covers

               1. Mathematical physicist and Earth sciences
               2. Biology and chemistry
               3. Medicine and health sciences
               4. Humanities and behavior sciences
               5. Social sciences
               6. Biotechnology and food sciences
               7. Engineering
               Computer science
               Economics
               Electronics
               Financial engineering
               Industrial engineering
                   - Materials
                   - Mechatronics
               Science and technology
               Telecommunications
               8. Others


    The ontology of Table 1 has 8 main classes and 29 subclasses organized in
3 levels. Each class has a label, a level and one or more instances (images). An
additional class is used to hold images of speciﬁc types of literature: biographies,
dictionaries and novels. The classes were constructed using knowledge of experts.
Semantic information is represented by metadata attached to each image; in


                                        136
particular, the dc:subject element is used to support the organization of topics,
this means that the names of the classes are used to ﬁll this element. Table 1
shows the DC elements used to form a semantic description. An institutional
policy help to maintain a controlled vocabulary for the semantic descriptions.


          Table 2. DC elements used to construct a semantic description

DC element              Description                        Fixed values
Creator    Indicates the name of the ﬁrst author
Date       The year of the book
Format     The digital manifestation of the book    JPEG
Identifier Identiﬁer of the physical book
Language   Language of the content                  Spanish, English or French
Type       Categorizes the nature of the content    image
Subject    A class name
Source     A reference to the owner of the resource Universidad Politécnica de Puebla
Title      The title extracted from the book cover


    The hierarchical relationships of the ontology are useful for free text search.
For example, a cover book that belongs to the materials class, it is also considered
as a member of the industrial engineering class.


4   greenBookC: an image collection with semantic
    descriptions at Greenstone
This section describes the use of the ontology described in Section 3 to enrich
an image collection of book covers. GreenBookC collection uses existing legal
metadata of Greenstone. Semantic description of images are described with the
unqualiﬁed Dublin Core (DC) elements of Table 2 as shown in Figure 1. After a
metadata standard is added, metadata elements need to be ﬁlled as illustrated in
Figure 2. These data can be assigned in diﬀerent languages in order to improve
collection accessibility.
    According to [1], semantic descriptions are stored in standard metadata for-
mats and knowledge representation languages such as XML, RDF, RDF-Schema
and OWL. At greenBookC, these descriptions are stored as XML ﬁles. These ﬁles
can be exported to DSpace or be processed by another XML tools.


5   Preliminar results
The greenBookC collection is formed by 1504 images of book covers in JPEG
format. JPEG is a standard to represent compressed continuous-tone images [6].
The images are normalized as follows: 32 bits for colours, each one 288 per 352
pixels. The size of the ﬁles varies between 33 and 98 Kbytes. A document camera
was used to get the images in order to enhance text and graphics.


                                        137
Fig. 1. Using DC elements at Greenstone


                  138
        Fig. 2. Filling metadata for an instance of the greenBookC collection


    The create panel at the librarian interface of Greenstone is used to construct
the greenBookC collection. The simple image collection (image-e) from the demo
Greenstone collections was used as a template. The ImagePlugin of Greenstone
processed all the images appropriately on a Windows XP operative system.
    greenBookC has a web-based interface to search the images. Descriptors such
as topic, author or the browse list can be used to query the images. Users can
carry traditional keyword-based searches in diﬀerent DC elements or use the
metadata descriptions whose values comes from the ontology. Distinct languages
can be used to query the collection. At the date, the work has been address to
construct the semantic descriptions and empirically the descriptions have im-
prove collection accessibility. However, a lot of eﬀort is still required to evaluate
the proposed approached.


6   Conclusions

The use of lightweight ontologies to enrich metadata standard is a basic but
simple policy to organize images by content and an alternative to construct
semantic descriptions. Ontologies and metadata enable information sharing be-
tween academic communities. Ontologies provide controlled vocabularies that
help to reduce ambiguity from natural language or keyword based descriptions.
    The visualization of book covers at Greenstone allow users to recognize visual
features of book covers. The use of existing legal metadata enables information
sharing and improves interoperability in digital libraries. Diﬀerent metadata
standards or additional elements can be used to store relevant information of


                                         139
book covers as the abstract or the synopsis of a book, the publisher and the
physical location, respectively.
    There are several challenges in the construction of collections for semantic
digital libraries, however there are software tools that can help users to develop
this task successfully. As future work, we plan to represent the classes, sub-
classes, individual, properties and restrictions of the ontology formally in order
to integrate reasoning capabilities.


References
1. Allemang, D., Hendler, J.: Semantic web for the working ontologist. Morgan Kauﬀ-
   man Publishers (2011)
2. Kruk, S.R., McDaniel, B.: Semantic Digital Libraries. Springer-Verlag (2009)
3. Lagoze, C., Payette, S., Shin, E., Wilper, C.: Fedora: an architecture for complex
   objects and their relationships. International Journal on Digital Libraries V6(2),
   124–138 (Apr 2006)
4. Ricardo Baeza-Yates, B.R.N.: Modern information retrieval. Addison Wesley (2011)
5. Wielinga, B.J., Schreiber, A.T., Wielemaker, J., Sandberg, J.A.C.: From the-
   saurus to ontology. In: Proceedings of the 1st international conference on Knowl-
   edge capture. pp. 194–201. K-CAP ’01, ACM, New York, NY, USA (2001),
   http://doi.acm.org/10.1145/500737.500767
6. Witten, I.H., Bainbridge, D., Nichols, D.M.: How to Build a Digital Library. Morgan
   Kaufmann, Burlington, MA, 2. edn. (2010)


Acknowledgments

We thank to the staﬀ library at the UPPuebla for their help and cooperation in
the construction process. This work is partially supported by PROMEP grant
Biblioteca Digital Semántica de Recursos Educativos /103-5/09/4023.


                                         140

</pre>