=Paper= {{Paper |id=Vol-201/paper-9 |storemode=property |title=Multimedia Information Extraction in Ontology-based Semantic Annotation of Product Catalogues |pdfUrl=https://ceur-ws.org/Vol-201/41.pdf |volume=Vol-201 |dblpUrl=https://dblp.org/rec/conf/swap/BartoliniGMMA06 }} ==Multimedia Information Extraction in Ontology-based Semantic Annotation of Product Catalogues== https://ceur-ws.org/Vol-201/41.pdf




                    Multimedia Information Extraction in
                   Ontology-based Semantic Annotation of
                            Product Catalogues
             Roberto Bartolini, Emiliano Giovannetti, Simone Marchi, and Simonetta Montemagni, ILC-CNR

                                             Claudio Andreatta and Roberto Brunelli, ITC-irst

                                                        Rodolfo Stecher, Fraunhofer IPSI

                                                   Paolo Bouquet, DIT-University of Trento


   Manuscript received October 27, 2006.
   Roberto Bartolini, Emiliano Giovannetti, Simone Marchi, and Simonetta Montemagni work at the Istituto di Linguistica Computazionale (ILC-CNR) in Pisa, via Moruzzi 1, 56124, Pisa, Italy (emails: {roberto.bartolini, emiliano.giovannetti, simone.marchi, simonetta.montemagni}@ilc.cnr.it).
   Claudio Andreatta and Roberto Brunelli work at the Istituto per la Ricerca Scientifica e Tecnologica ITC-irst in Trento, via Sommarive 18, 38050, Trento, Italy (emails: {andreatta, brunelli}@itc.it).
   Rodolfo Stecher works at the L3S Research Center, Appelstrasse 9a, Hannover, Germany (email: stecher@l3s.de).
   Paolo Bouquet works at the Department of Information and Communication Technologies of the University of Trento, via Sommarive 14, 38050 Trento, Italy (email: paolo.bouquet@unitn.it).

   Abstract—The demand for efficient methods for extracting knowledge from multimedia content has led to a growing research community investigating the convergence of multimedia and knowledge technologies. In this paper we describe a methodology for extracting multimedia information from product catalogues, empowered by the synergetic use and extension of a domain ontology. The methodology was implemented in the Trade Fair Advanced Semantic Annotation Pipeline of the VIKE-framework.

   Index Terms—Semantic Web Technologies, Ontology Creation, Ontology Extraction, Ontology Evolution, Semantic Annotation of Multimedia Content

                           I. INTRODUCTION

   Effective acquisition, organization, processing, use and sharing of the knowledge embedded in textual and multimedia content play a major role for competitiveness in the modern information society and for the emerging knowledge economy. However, this wealth of knowledge, implicitly conveyed in the vast amount of available digital content, is nowadays accessible only if considerable manual effort has been invested into its interpretation and semantic annotation, which is possible only for a small fraction of the available content.
   The field of semi-automatic information extraction from multimedia corpora is central for overcoming the so-called "knowledge acquisition bottleneck". Multimedia sources of information, such as product catalogues, contain text (captions) and images (pictures of the products), thus requiring information extraction approaches that combine several different techniques, ranging from Natural Language Processing to Image Analysis and Understanding. In our approach there are three main aspects to consider: 1) the information extraction per se, 2) the ontology, its use and creation, and 3) the usage of the ontology in the information extraction process and the synergy between the different kinds of extraction processes.
   The development of adequate ontologies is itself one of the knowledge acquisition bottlenecks: the use of (semi-)automatic tools for semantic information extraction from multimedia corpora is very promising but, to be efficiently exploited, must have access to a formal representation of a given domain, i.e., an ontology. We support the ontology creation process in two different and complementary ways: ontology learning and reuse of existing ontologies. The ontology learning approach takes advantage of the results of the extraction to enrich the ontology, while the reuse support provides methods and tools to re-use already existing ontologies which capture the target domain under a modelling perspective similar to the one of interest for the extraction task. This (apparent) vicious circle (between the need of having the domain represented in the ontology for an extraction process and the enrichment of the ontology based on the results obtained from the extraction) can be turned into a virtuous circle if the necessary conditions are set to let the evolving ontology and the information extraction tool interact in a synergetic way.
   After a brief introduction to the VIKE-Framework, the general methodology is described in section III, including specific details about the four different components of the system pipeline. Some conclusions will be presented in section IV.
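As a concrete illustration of this extraction/enrichment loop, the sketch below iterates annotation passes until the ontology (reduced here to a bare set of terms) stops growing. Everything in it — the function names, the naive term-spotting heuristic, the flat-set view of the ontology — is an illustrative assumption, not part of the VIKEF implementation:

```python
# Illustrative sketch of the extraction/ontology "virtuous circle" described
# above. The ontology is reduced to a flat set of terms and the extraction
# step to naive term spotting; both are simplifying assumptions.

def annotate(catalogue, ontology):
    """One annotation pass: returns (annotations, candidate new terms)."""
    annotations = []
    candidates = set()
    for description in catalogue:
        found = sorted(term for term in ontology if term in description)
        annotations.append((description, found))
        # Unknown longer words co-occurring with known terms become
        # candidate concepts for ontology enrichment.
        if found:
            candidates |= {w for w in description.split()
                           if w not in ontology and len(w) > 3}
    return annotations, candidates

def virtuous_circle(catalogue, ontology, max_runs=5):
    """Re-run annotation, feeding extracted terms back, until stable."""
    annotations = []
    for _ in range(max_runs):
        annotations, candidates = annotate(catalogue, ontology)
        if not candidates - ontology:
            break                      # ontology stopped growing: stable
        ontology |= candidates         # enrichment step feeds back
    return annotations, ontology
```

A second pass over the same catalogue then recognizes the terms that the first pass promoted into the ontology, which is the effect the restartable pipeline aims at.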
                      II. THE VIKE-FRAMEWORK

   The methodology we present has been developed inside the VIKEF project (Virtual Information and Knowledge Environment Framework, IST-2002-507173 - http://www.vikef.net/), which creates an advanced software framework for enabling the integrated development of semantic-based Information, Content, and Knowledge (ICK) management systems. Apart from the scientific and academic interest related to these fields of research, we have also registered a growing need from industrial parties for automated knowledge elicitation tools to be applied to their commercial resources, such as product catalogues.
   VIKEF bridges the gap between the partly implicit knowledge and information conveyed in scientific and business content resources (e.g. text, speech, images) and the explicit representation of knowledge required for a targeted and effective access, dissemination, sharing, use, and annotation of ICK resources by scientific and business communities and their information- and knowledge-based work processes.
   R&D within VIKEF builds on and significantly extends the current Semantic Web efforts by addressing crucial operationalisation and application challenges in building up real-world semantically enriched virtual information and knowledge environments.

                      III. THE METHODOLOGY

   The task of (semi-)automatically annotating content objects with semantic information requires a multi-phased process, where multimedia entities discovered within a content object are coupled with domain knowledge represented by an ontology. For effective semantic annotation support, linguistic, image-related and knowledge representation aspects, approaches, and formats have to be combined in a synergetic way. The proposed methodology can be presented as a pipeline (together with the representation formats employed within the pipeline), which supports semantic annotation in a flexible and pragmatic way.
   The pipeline has been implemented as a prototype developed as part of the VIKEF project and evaluated on content from the Trade Fair domain.
   The pipeline has four main components, which can be functionally summarized as follows:

   A) Annotation of text – the ontology-based semantic annotation of the textual part of the catalogue;
   B) Annotation of images – the ontology-based semantic annotation of the images appearing in the catalogue;
   C) Elicitation and refinement – to make the information extracted by the annotation components machine-understandable and to enrich the ontology for further annotations;
   D) Reuse of existing ontologies – to support the ontology creation and refinement by exploiting existing ontologies.

   The approach has been conceived and implemented to provide the possibility of triggering a "virtuous circle": once the information extracted in the annotation steps is integrated inside the ontology, the whole process can be restarted, thus allowing the textual and image annotators to exploit the novel information added to the ontology during the previous run.

  A. Annotation of Text
   Semantic annotation of content is a crucial task (probably the most important) for processing documents to be accessed inside the Semantic Web. To semantically annotate a text it is necessary to develop (semi-)automatic Information Extraction techniques capable of overcoming the so-called "knowledge acquisition bottleneck" typical of Semantic Web related applications.
   Semantic annotation of product catalogues poses different challenges at different levels. Concerning the textual part, relative to product descriptions, catalogues do not contain linguistically sound text: very often, sentences consist of strictly nominal descriptions, thus discouraging the recourse to traditional NLP techniques. On the other hand, product descriptions appear as semi-structured texts where product features occur in a fixed (or at least regular) order. Semantic annotation of product catalogues therefore appears as a complex task requiring the combination of different types of techniques. Previous works on the semantic annotation of product information are quite scarce: the two main ones are the European project CROSSMARC [1] and the Czech national project Rainbow [2].

   Fig. 1: The general architecture of the annotation system, producing a semantic annotation of product descriptions extracted from the input catalogue.

   The project CROSSMARC aims at electronic-retail product comparison, using a
combination of language engineering, machine learning and user modelling, where a domain ontology is used as "semantic glue" to link together the various analysis modules.
   Within the Rainbow project, a multi-layered ontology has been defined to integrate the more abstract aspects of the domain (domain-neutral), relative to web sites in general, with the more specific ones (domain-dependent), relative to concepts found in the sites of small organizations offering products or services. Concerning the information extraction task, Rainbow makes use of lexical indicators and, depending on the document to analyze, applies HTML-centred or free-text-centred extractors, in the latter case using shallow parsing techniques.
   The hybrid methodology we propose (which has been applied to Italian product catalogues belonging to the furniture domain) makes use of two different approaches: first, pattern matching techniques are used to isolate individual product descriptions within the textual flow and to identify their basic building blocks (e.g. the product name, its price as well as its natural language description). Then, for each identified product, the natural language description is processed by a battery of NLP tools ([3] [4]) in charge of identifying relevant entities (e.g. colour, material, parts of a given product) and the relations holding between them (e.g. part_of, colour_of, which can refer either to the product itself or to its individual parts).
   The architecture in Fig. 1 includes two main components, the Product catalogue Italian Semantic Annotator (PISA) and the Product catalogues Terminology Processor (PTP), both exploiting the battery of NLP modules: the former to linguistically analyze the free text part of the product descriptions, the latter to obtain the TermBank [5] upon which a simple application ontology can be constructed, to be exploited for disambiguation. This "proto"-ontology forms the terminological basis for the development of the final (project) ontology, to be populated using the information derived from the multimedia semantic annotation tasks.

   Fig. 2: The general architecture of the PISA component, including the RegExp Manager, committed to the pattern matching step, and the NLP Manager, in charge of the analysis of the "free text" part of the product description.

   Once this proto-ontology has been constructed using the PTP module, it is possible to run the PISA component (Fig. 2): each product description is first extracted by pattern matching, starting from a set of regular expressions, each one matching a particular product structure. Once a description has been isolated, some of its components can be detected, interpreted on the basis of particular "groups" of the matching regular expressions ("name", "type", "product id", etc.).
   The remaining "free text" part of the description is then processed by the NLP Manager module, which is able to access the NLP tools and the ontology, the former to linguistically analyze the text, the latter to resolve possible syntactical ambiguities found during the analysis.
   Consider the example in Fig. 3, which is relative to the annotation of the description in the box about a given product. Through pattern matching it is possible to extract its name ("Sanela"), the type (cushion), the price (€12,95), its dimensions (40 cm of width and 60 cm of length) and the product unique identifier (900.582.56), as well as the relations between this information and the product itself (i.e. name_of, price_of, etc.).

   Fig. 3. Example of product semantic annotation, including entities and relations among entities.

   The natural language description identified at this stage is then passed to the NLP Manager, which is in charge of acquiring, with the support of the application ontology, further information about the product: in this example, the system detects a part ("cover") and a material ("cotton"), as well as a relation holding between them ("made_of"). The box below contains a snippet of the (XML-style) final annotated product description, where some of the extracted features of the product are listed, including the fact, for instance, that the cover of the cushion is made of cotton.
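The pattern-matching step just described can be sketched with a single regular expression whose named groups play the role of the "groups" mentioned above. The pattern and the sample string are illustrative assumptions modeled on the Sanela example, not the actual PISA grammar:

```python
import re

# Hypothetical sketch of the pattern-matching step: named groups isolate the
# structured fields of a product description, and the remainder is the
# free-text part that would be handed over to the NLP Manager.
PRODUCT = re.compile(
    r"(?P<name>[A-Z][A-Za-z]+)\s+"      # product name, e.g. "Sanela"
    r"(?P<type>\w+)\s+"                 # product type, e.g. "cushion"
    r"€(?P<price>\d+,\d{2})\s+"         # price, e.g. "€12,95"
    r"(?P<id>\d{3}\.\d{3}\.\d{2})\s*"   # product identifier
    r"(?P<free_text>.*)",               # free-text description for the NLP step
    re.DOTALL,
)

def extract(description):
    """Return the matched fields as a dict, or None if no pattern applies."""
    m = PRODUCT.match(description)
    return m.groupdict() if m else None

fields = extract("Sanela cushion €12,95 900.582.56 Cushion cover in cotton.")
# fields["name"] == "Sanela", fields["id"] == "900.582.56"
```

In the real system a set of such expressions is tried in turn, one per product structure, and the `free_text` group is the part passed on for linguistic analysis.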
   A preliminary evaluation of the system has been carried out by analyzing the semantic annotation of the "IKEA 2006" Italian furniture catalogue. Two main levels of evaluation have been taken into consideration, relative to the two main components of the system: PTP and PISA. Due to the lack of a "golden standard" furniture ontology to compare with the one obtained with the help of the PTP component, a "task based" evaluation technique has been adopted, where the coverage of the ontology has been indirectly evaluated on the basis of the quality of the obtained annotation.
   To sum up the results of the preliminary evaluation, the ontology makes it possible to detect approximately 70% of the terms appearing in the free text descriptions of the extracted products and to put them in the correct relation, thus scoring a recall of 0.7. On the other hand, considering that only terms included in the ontology can be extracted, the system scores a precision of 1. Concerning the disambiguation functionality, our analysis has shown that every time two terms are correctly detected and recognized, the disambiguation works. Overall, we can assert that the quality of the linguistic analysis is strongly related to the ontology coverage.
   From the pattern matching point of view, the system has scored a precision of 0.9 and a recall of 0.8, extracting 800 products out of 1000 by applying 9 different regular expressions.
   Concerning the task of semantic text annotation, future directions of research include, on the one hand, the application of the presented technique to different product catalogues (not necessarily in Italian) and, on the other hand, the evolution of the methodology through the integration of a more sophisticated ontology-learning-from-text system currently under development within the DylanLab of the ILC-CNR in Pisa.

  B. Annotation of Images
   The information conveyed by a multimedia document is analyzed and extracted at two different levels: the document level, in which the (geometrical) document layout is investigated considering both text and images, and the image level, in which pictures are examined in order to describe their visual content and to recognize the depicted objects. Two interacting methodologies were adopted to analyze the document at the two levels, both of them exploiting a domain ontology.

   Fig. 4. The graph structure representing the page layout. Each graph vertex is associated to a text caption or to a product image. Images are extracted from the page background by means of a segmentation algorithm.

   Fig. 5. The final result of the geometrical layout analysis algorithm. The figure depicts the subgraphs extracted; the thickness of the edges is related to the score obtained by the caption-figure association.

   A catalogue document usually provides a rich source of structure. The first algorithm is based on a modified version of the data mining system SUBDUE ([6]). SUBDUE is a system that discovers interesting substructures in structural data, based on inexact graph matching techniques and a computationally constrained beam search algorithm guided by heuristics. In particular, a substructure is evaluated on the basis of how well it can compress the graph, according to the minimum description length principle. Highly compressing structures can be considered as building blocks of the entire data set.
   To apply the algorithm to a document, we construct a graph representing the page layout (Fig. 4). The graph representation consists of two types of vertices, image (classified in three categories: highlight, scene and miscellaneous) and text paragraph, and one type of arc representing the spatial relationships between the vertices (e.g. top-left, overlapping).
   The algorithm can provide abstract structured components, resulting in a hierarchical view of the document that can be analyzed at many levels of detail and focus. In catalogues, a common recurring structure is formed by an image and its caption (Fig. 5). It is a remarkable fact that, while text and images may be separately ambiguous, jointly they tend not to be. Establishing meaningful links between images and text paragraphs from the catalogue structure makes it possible to exploit the semantic annotation of the textual part to semantically annotate the images, or to guide image processing algorithms in order to recognize the depicted objects or to infer correspondences between words and particular image structures (as in [7][8]). Domain knowledge can also be added to guide the discovery process and to separate the important substructures from the irrelevant ones.
   The methodology adopted to describe the image content relies on MEMORI ([9][10]), a system for the detection and recognition of objects in digital images using pre-stored visual information obtained from shots or 3D models. Object recognition plays a crucial role in Computer Vision, especially in the semantic description of visual content. Although object recognition has been intensely studied, it still remains a hard and computationally expensive problem. The main difficulty in
the description of image content is the lack of information about the kind and the number of objects possibly present. Moreover, objects can appear at different locations in the image, and they can be deformed, rotated, rescaled, differently illuminated or even occluded with respect to a reference view. In order to simplify object detection and to reduce the computational cost, many systems (e.g. [11]) limit the recognition to specific classes of objects. In these cases, a priori knowledge makes it possible to select the most descriptive features for the objects at hand and to circumscribe the search space. However, even under this restriction, high classification performance is seldom reached. Moreover, many object recognition systems rely on user interaction to label the returned items as wrong or correct, or to improve the system response [12].
   The MEMORI system tackles the object recognition problem by segmenting the input image into regions and applying a region grouping algorithm, which interacts with an object classifier, producing a set of object candidates and filtering the candidate list (Fig. 6). The object segmentation and recognition modules need domain knowledge in the form of object snapshots from multiple view points.

   Fig. 6. The MEMORI system analyzes color digital images in order to detect and recognize objects. The figure shows on the left an image extracted from a catalogue, and on the right the recognized object along with the resulting annotation, compliant to the MPEG-7 standard.

   Supporting the extraction and recognition process with a domain ontology allows the development of context-aware strategies to guide and focus the multimedia semantic analysis. Knowledge of the image context permits the object recognition module to restrict the search to a limited subset of objects and to refine the heuristics weighting object hypotheses according to their own context. The other way round, content-based image analysis allows the acquisition and exploitation of similarity relations among multimedia entities, thus making it possible to refine and enrich the knowledge representation modeled in the domain ontology.
   The experiments and tests conducted so far show that the proposed approach is a promising method for the detection and recognition of objects and for image annotation. An extensive evaluation has been carried out on a synthetic version of objects taken from COIL-100 [13], a database widely adopted in the object recognition community. The achieved recognition rates were compared with those of other state-of-the-art recognition methods ([14][15]); MEMORI performs best in all experiments, regardless of the number of training views. The recognition rate is over 96% with as few as 4 training views, demonstrating the robustness of the method. New techniques to make the system robust with respect to illumination changes and partial occlusions are currently under development. Future work will extend the evaluation to more complex, non-synthetic images. The potential synergy between visual similarity and semantic similarity measures based on ontologies will also be investigated and exploited.

  C. Elicitation and Refinement
   Once information has been extracted from text and images, and stored in the XML-based format partially described in Figure 3, the next task is to make its meaning explicit (and thus machine-understandable) by transforming it into a suitable RDF/OWL representation.
   This transformation has two parts:

•  The first step, called syntactic elicitation, aims at producing a collection of RDF statements, namely an explicit representation of the knowledge content of the XML elements in the schema. For example, we want to construct the fact (expressed by an RDF statement) that the string "900.582.56" is the ID of "SANELA" (actually, of some entity which happens to be a product named "SANELA").
•  The second step, called semantic elicitation and refinement, aims at leveraging the RDF representation to a full semantic level, where entities are assigned a proper identifier (a URI), properties are associated to some data type or object property in some domain ontology (more precisely, they are replaced by the URI of their ontological counterpart), and finally entities are assigned to the most appropriate class (e.g. "SANELA" should be assigned to the class of cushions, which belongs to a hierarchy of classes very likely to include among its ancestors the classes of products and of physical entities).

   The first step, in our implementation, is rather simple, as it is performed via a simple XSL transformation from the XML schema depicted in Figure 3 to a collection of RDF statements expressed in the RDF/XML syntax. The only tricky part is the decision of which statements are to be produced from the XML file (in fact, we observe that the simple snippet in Figure 3 contains a large number of implicit statements, and therefore one needs to select the most useful ones). The outcome of this
assessment of the performance was conducted on a synthetic                      step is an RDF file containing a potentially large number of
test-bed. A set of synthetic images has been created by
drawing on a non uniform background rotated and rescaled                           1
                                                                                     More details are provided in [11], where the entire VIKEF knowledge
                                                                                pipeline is described thoroughly.
                                                                                                                                                6

facts about the product catalog.                                             modeled in different ways by different ontology engineers.
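As an illustration, the syntactic elicitation of a single product element can be sketched as follows. This is a minimal sketch only: the element names, the base namespace and the helper function are hypothetical, not taken from the actual schema of Figure 3 or from the VIKEF XSL transformation.

```python
# Minimal sketch of syntactic elicitation: turning the XML elements of a
# catalogue entry into (subject, predicate, object) statements.
# All names below (element tags, namespace) are invented for illustration.
import xml.etree.ElementTree as ET

CATALOGUE_XML = """
<product>
  <name>SANELA</name>
  <id>900.582.56</id>
</product>
"""

def elicit_statements(xml_text, base="http://example.org/catalogue#"):
    """Produce one RDF-style triple per child element of the product."""
    root = ET.fromstring(xml_text)
    subject = base + root.findtext("name")   # provisional local identifier
    triples = []
    for child in root:
        predicate = base + child.tag         # one property per XML element
        triples.append((subject, predicate, child.text))
    return triples

triples = elicit_statements(CATALOGUE_XML)
# e.g. one triple states that "900.582.56" is the id of the SANELA entity
```

In the real pipeline the subject would later be replaced by a proper URI during semantic elicitation; here it is just a provisional string built from the product name.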
The second step is by far more interesting. Indeed, the goal is to qualify the RDF statements generated in the previous step by linking their constituents to some pre-existing ontology (see the next section for the support we provide to build an ontology by reusing existing ontologies); this is what we call semantic elicitation. In our approach, this task can be decomposed into two different sub-tasks: (1) entity-level elicitation, and (2) class/property-level elicitation.

The first sub-task is implemented as a problem of matching entity descriptions onto the entities stored in a repository called OKKAM. A full description of OKKAM is beyond the scope of this paper; the main idea is that it creates and stores URIs (which can then be reused across multiple applications) together with additional information, including any known description of the entity itself. Once a new entity is recognized in any digital document (plain text documents, relational databases, HTML pages, and so on), OKKAM can be queried to check whether that entity already has a known URI, which is then returned for reuse; if no match is found, a new entity is created and stored with its URI and all available descriptions.

The second sub-task uses a tool for schema and ontology matching called CtxMatch2.0 [16], a VIKEF-motivated extension of a pre-existing tool. The details of semantic elicitation with CtxMatch2.0 are provided in other papers (see e.g. [17]). In the scenario described in this paper, the tool is used to match categories and relations extracted from catalogs to classes and properties found in any available domain ontology. The matching method uses two sources of information: the hierarchical structure of elements (which is particularly important in product catalogs) and the lexical information associated both with catalog labels and with ontology elements.

The outcome of this second step is a refined RDF file where linguistic descriptions of entities are replaced by unique URIs (which can later be used to merge RDF graphs produced from different catalogs or, in general, from different collections of documents), category names are replaced by the URIs of ontological classes (if available), and relation names are replaced by the URIs of data type or object properties (if available). We notice that entities may be classified using complex concepts, compositionally constructed from their linguistic descriptions (for example, "SANELA" will correspond to a cushion made of cotton).

A final remark is that the outcome of this elicitation and refinement phase includes a mixture of knowledge coming from multimedia sources, namely text and pictures, in a single representation.

D. Reuse of existing ontologies

Ontology engineering is a very time consuming and subjective task. It is time consuming because it is hard to model not only new, but also already known, domains; it is subjective because in most cases the same domains are modeled in different ways by different ontology engineers. Even if the domains are the same, the perspectives under which they are modeled are usually different. One way to detect the differences (or similarities) of the modeling perspectives is to analyze the relations existing between the concepts. Due to the already mentioned intrinsic difficulties of the ontology engineering task, advanced research is being performed in order to (partially) overcome these problems. Our method and tools aim to support the ontology engineer in finding existing Semantic Web ontologies written in OWL (or parts of them) that model the targeted domain in a similar way, that is, with the same modeling perspective. The approach relies on the existence of searchable pools of ontologies where candidate ontologies can be searched for and pre-selected based on a user-specified set of desired ontological classes (this set is called a fragment). The pre-selection process searches for a specified percentage of matches between the labels (and their synonyms) appearing in the user-specified fragment and those in the ontologies of the pool. Currently we are using the SWOOGLE (http://swoogle.umbc.edu/) ontology repository, but any other similar repository can easily be incorporated.

The pre-selected ontologies are then analyzed in more detail by examining the labels of classes and relations in combination with a lexical resource (currently WordNet). Labels are analyzed and word meanings are related by various means in order to compute the likelihood that one or the other possible sense of a word holds in the given context. This is then represented in two ways: as logical formulae (following the approach of [18]), used to detect the similarity of concepts with a reasoner, and as relevance weights for each possible sense, used to compare concepts and decide whether they are similar and to which degree the similarity holds. The relevance measures are combined with the logical results to give a measure of the closeness of the considered concepts. The architecture of this approach is presented in Fig. 7. The ontology engineer can then decide to use any of the proposed ontologies as a basis for performing the required extension of the initially specified fragment.

Fig. 7. The architecture of the described approach for the re-use of existing ontologies.

There exist several different techniques for matching ontologies, based on very different approaches. Ontology mapping, ontology alignment and ontology matching all fall under a broader research area in which correspondences between two schemata need to be discovered.
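The label-based pre-selection step described above can be sketched as follows. This is a simplified sketch under stated assumptions: the function names, the example ontology labels and the fixed threshold are invented for illustration, and real pre-selection would also consult WordNet synonyms rather than raw labels alone.

```python
# Sketch of ontology pre-selection: rank candidate ontologies in the pool by
# the fraction of fragment labels they contain. All names/data are invented.
def label_overlap(fragment_labels, ontology_labels):
    """Fraction of fragment labels (case-insensitive) found in the ontology."""
    frag = {label.lower() for label in fragment_labels}
    onto = {label.lower() for label in ontology_labels}
    return len(frag & onto) / len(frag)

def preselect(fragment_labels, pool, threshold=0.5):
    """Keep ontologies whose label match reaches the required percentage,
    best match first."""
    scores = {name: label_overlap(fragment_labels, labels)
              for name, labels in pool.items()}
    return sorted((name for name, score in scores.items() if score >= threshold),
                  key=lambda name: -scores[name])

pool = {
    "furniture.owl": ["Cushion", "Sofa", "Product", "Fabric"],
    "vehicles.owl":  ["Car", "Engine", "Wheel"],
}
selected = preselect(["cushion", "product", "textile"], pool)
# "furniture.owl" matches 2/3 of the fragment labels; "vehicles.owl" matches none
```

The surviving candidates would then go through the finer-grained analysis of class and relation labels with the lexical resource.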
In some cases, rules for mapping schemata need to be created; in others it is enough to know the relations existing between two or more elements from different schemata. The area of schema alignment or matching has long been studied, and many approaches exist for integrating database schemata and XML schemata in general, for query mediation, etc. The approaches closest to ontology alignment are those for schemata alignment, especially for taxonomic or, more generally, classification hierarchies. A comprehensive survey is presented in [19]. In the following we present some approaches that are more closely related to the one presented in this section.

• The iPrompt [20] tool suite contains, among others, tools for ontology merging and alignment. Its input is a set of pairs of related/similar concepts from different ontologies; based on them, it proceeds to find other pairs of similar concepts, relying heavily on the structure of the ontologies.

• The GLUE approach [21] uses machine learning to analyze instances. The similarity of concept meanings is defined on the basis of the joint probability distribution of the involved concepts.

• The QOM approach [22] is based on a combination of several techniques that analyze features of the different ontologies. The similarity of concepts is then computed by aggregating the individual similarities. The approach is iterative: every iteration uses the results obtained in the previous ones in order to enhance the quality of the results.

• Cupid [23] presents an approach that considers schema rather than instance information, mainly for matching XML schemata. It uses linguistic information and the structure of the schema, together with keys, referential constraints and other schema information, to derive more precise matching results. Cupid also handles context-dependent matches of shared type definitions that are used in several larger structures of the schema.

• MoA [24] is an approach for merging and aligning OWL ontologies. It uses linguistic methods to disambiguate the meaning of concepts and proposes an algorithm to detect semantic equivalences between concepts and represent them in a Semantic Bridge. The Semantic Bridge can represent equivalence of concepts and properties, subconcepts and subproperties, and identity of instances. A merging algorithm is also presented.

• The OMEN approach [25] uses Bayesian Networks (BN) to decide concept matches starting from an initial hand-made match. Based on the known matches, it analyzes the structure (e.g. the domain and range of properties) to derive further matches. The algorithm operates iteratively, and every iteration produces new matches that are used in the following iterations.

                       IV. CONCLUSIONS

In this paper we presented a methodology for supporting the use of background ontologies in the task of information extraction from multimedia sources, in particular from product catalogues. Our methodology tries to enable a virtuous circle by which domain ontologies are used in the extraction process, and at the same time the extraction process becomes a way for creating or extending the available ontologies. The result of the extraction process is a semantically rich representation of the content of catalogs, where knowledge extracted from texts (e.g. product descriptions) is integrated with knowledge extracted from pictures, and made available for any service one may want to build on top of it.

                                       REFERENCES
[1]  M. T. Pazienza, A. Stellato, M. Vindigni, Cross-lingual multi-agent retail comparison, AWSS 2003 Workshop on Applications, Products and Services of Web-based Support Systems, 13 Oct. 2003, Halifax, Canada, in conjunction with the Web Intelligence International Conference WI 2003.
[2]  V. Svatek, J. Kosek, M. Labsky, J. Braza, M. Kavalec, M. Vacura, V. Vavra, V. Snasel, Rainbow – multiway semantic analysis of websites, in Proceedings of the 14th International Workshop on Database and Expert Systems Applications (DEXA'03), 2003, p. 635.
[3]  A. Lenci, S. Montemagni, V. Pirrelli, Chunk-it. An Italian shallow parser for robust syntactic annotation, in A. Zampolli, N. Calzolari, L. Cignoni (eds.), Computational Linguistics in Pisa – Linguistica Computazionale a Pisa, Linguistica Computazionale, Special Issue, XVI-XVII, Pisa-Roma, IEPI, Tomo I, 2003, pp. 353–386.
[4]  R. Bartolini, A. Lenci, S. Montemagni, V. Pirrelli, Hybrid Constraints for Robust Parsing: First Experiments and Evaluation, in Proceedings of LREC 2004: Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, 26–28 May 2004, Volume III, Paris, pp. 859–862.
[5]  R. Bartolini, D. Giorgetti, A. Lenci, S. Montemagni, V. Pirrelli, Automatic Incremental Term Acquisition from Domain Corpora, in Proceedings of the 7th International Conference on Terminology and Knowledge Engineering (TKE2005), Copenhagen Business School, Copenhagen, Denmark, 2005, pp. 17–18.
[6]  J. Coble, R. Rathi, D. J. Cook and L. B. Holder, Iterative Structure Discovery in Graph-Based Data, International Journal on Artificial Intelligence Tools, 14(1-2):101–124, 2005.
[7]  P. Duygulu, K. Barnard, N. de Freitas, and D. Forsyth. Object Recognition as Machine Translation: Learning a lexicon for a fixed image vocabulary. In European Conference on Computer Vision (ECCV), Copenhagen, 2002.
[8]  C. Carson, S. Belongie, H. Greenspan, and J. Malik. Blobworld: Image segmentation using expectation-maximization and its application to image querying. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(8):1026–1038, 2002.
[9]  M. Lecca, Object Recognition in Color Images by the Self Configuring System MEMORI, International Journal of Signal Processing, Vol. 3, No. 3, pp. 176–185, World Enformatika Society, 2006.
[10] C. Andreatta, M. Lecca and S. Messelodi, Memory-based Object Recognition in Digital Images, 10th International Fall Workshop on Vision, Modeling, and Visualization (VMV 2005), Erlangen, Germany, November 16–18, 2005.
[11] A. Hoogs, R. Collins, R. Kaucic, and J. Mundy. A common set of perceptual observables for grouping, figure-ground discrimination, and texture classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, (4):458–474, 2003.
[12] B. Ko and H. Byun. Extracting Salient Regions and Learning Importance Scores in Region-Based Image Retrieval. International Journal of Pattern Recognition and Artificial Intelligence, 17(8):1349–1367, 2003.
[13] S. A. Nene, S. K. Nayar, and H. Murase. Columbia Object Image Library (COIL-100). Technical Report CUCS-006-96, Columbia University, 1996.
[14] S. Obdrzalek and J. Matas. Object recognition using local affine frames on distinguished regions. In Paul L. Rosin and David Marshall, editors,
     Proceedings of the British Machine Vision Conference, volume 1, pages
     113-122, London, UK, September 2002. BMVA.
[15] Ming-Hsuan Yang, D. Roth and N. Ahuja. Learning to Recognize 3D
     Objects with SNoW. In ECCV-2000, The Proceedings of the Sixth
     European Conference on Computer Vision, volume 1, pages 439-454,
     2000.
[16] Bouquet, P., Serafini, L., Zanobini, S.: Peer-to-peer semantic
     coordination. Journal of Web Semantics 2 (2005)
[17] Bouquet, P., Serafini, L., Zanobini, S., Sceffer, S.: Bootstrapping
     semantics on the web: meaning elicitation from schemas. WWW2006,
     Edinburgh (Scotland, UK), May 2006
[18] R. Stecher, C. Niederee, P. Bouquet, T. Jaquin, S. Ait-Mokhtar, S.
     Montemagni, R. Brunelli, G. Demetriou, "Enabling a knowledge supply
     chain: from content resources to ontologies", in Proceedings of the
     ESWC2006 workshop on "Mastering the Gap: from information
     extraction to semantic representation", CEUR workshop proceedings,
     vol. 187, Budva (Montenegro), 11 June 2006.
[19] Pavel Shvaiko, Jérôme Euzenat: “A Survey of Schema-Based Matching
     Approaches”, in Journal of Data Semantics IV, 2005.
[20] N. Noy and M. Musen: “The PROMPT suite: Interactive tools for
     ontology merging and mapping”, in Technical report, SMI, Stanford
     University, CA, USA (2002)
[21] A. Doan, P. Domingos and A. Halevy: “Learning to match the schemas
     of data sources: A multistrategy approach”, 2003.
[22] Marc Ehrig and Steffen Staab: "QOM – Quick Ontology Mapping", in
     Proceedings of the International Semantic Web Conference, 2004.
[23] Jayant Madhavan, Philip A. Bernstein and Erhard Rahm: “Generic
     Schema Matching with Cupid”, in The VLDB Journal, 2001.
[24] Jaehong Kim, Minsu Jang, Young-Guk Ha, Joo-Chan Sohn and Sang Jo
     Lee: “MoA: OWL Ontology Merging and Alignment Tool for the
     Semantic Web” in Innovations in Applied Artificial Intelligence: 18th
     International Conference on Industrial and Engineering Applications of
     Artificial Intelligence and Expert Systems, 2005.
[25] Prasenjit Mitra, Natasha F. Noy and Anuj Jaiswal: “OMEN: A
     Probabilistic Ontology Mapping Tool”, in Proceedings of the
     International Semantic Web Conference, 2005.