=Paper=
{{Paper
|id=Vol-201/paper-9
|storemode=property
|title=Multimedia Information Extraction in Ontology-based Semantic Annotation of Product Catalogues
|pdfUrl=https://ceur-ws.org/Vol-201/41.pdf
|volume=Vol-201
|dblpUrl=https://dblp.org/rec/conf/swap/BartoliniGMMA06
}}
==Multimedia Information Extraction in Ontology-based Semantic Annotation of Product Catalogues==
Roberto Bartolini, Emiliano Giovannetti, Simone Marchi, and Simonetta Montemagni, ILC-CNR
Claudio Andreatta and Roberto Brunelli, ITC-irst
Rodolfo Stecher, Fraunhofer IPSI
Paolo Bouquet, DIT-University of Trento

Abstract—The demand for efficient methods for extracting knowledge from multimedia content has led to a growing research community investigating the convergence of multimedia and knowledge technologies. In this paper we describe a methodology for extracting multimedia information from product catalogues, empowered by the synergetic use and extension of a domain ontology. The methodology was implemented in the Trade Fair Advanced Semantic Annotation Pipeline of the VIKE-framework.

Index Terms—Semantic Web Technologies, Ontology Creation, Ontology Extraction, Ontology Evolution, Semantic Annotation of Multimedia Content

I. INTRODUCTION

Effective acquisition, organization, processing, use and sharing of the knowledge embedded in textual and multimedia content play a major role for competitiveness in the modern information society and for the emerging knowledge economy. However, the wealth of knowledge implicitly conveyed in the vast amount of available digital content is nowadays accessible only if considerable manual effort has been invested into its interpretation and semantic annotation, which is possible only for a small fraction of the available content.

The field of semi-automatic information extraction from multimedia corpora is central for overcoming the so-called "knowledge acquisition bottleneck". Multimedia sources of information, such as product catalogues, contain text (captions) and images (pictures of the products), thus requiring information extraction approaches that combine several different techniques, ranging from Natural Language Processing to Image Analysis and Understanding. In our approach there are three main aspects to consider: 1) the information extraction per se; 2) the ontology, its use and creation; and 3) the usage of the ontology in the information extraction process and the synergy between the different kinds of extraction processes.

The development of adequate ontologies is itself one of the knowledge acquisition bottlenecks: the use of (semi-)automatic tools for semantic information extraction from multimedia corpora is very promising but, to be efficiently exploited, such tools must have access to a formal representation of a given domain, i.e., an ontology. We support the ontology creation process in two different and complementary ways: ontology learning and reuse of existing ontologies. The ontology learning approach takes advantage of the results of the extraction to enrich the ontology, and the reuse support provides methods and tools to re-use already existing ontologies which capture the target domain under a modelling perspective similar to the one of interest for the extraction task. This (apparent) vicious circle (between the need of having the domain represented in the ontology for the extraction process, and the enrichment of the ontology based on the results obtained from the extraction) can be turned into a virtuous circle if the necessary conditions are set to let the evolving ontology and the information extraction tool interact in a synergetic way.

After a brief introduction to the VIKE-framework, the general methodology is described in section III, including specific details about the four different components of the system pipeline. Some conclusions will be presented in section IV.

Manuscript received October 27, 2006. Roberto Bartolini, Emiliano Giovannetti, Simone Marchi, and Simonetta Montemagni work at the Istituto di Linguistica Computazionale (ILC-CNR) in Pisa, via Moruzzi 1, 56124, Pisa, Italy (emails: {roberto.bartolini, emiliano.giovannetti, simone.marchi, simonetta.montemagni}@ilc.cnr.it). Claudio Andreatta and Roberto Brunelli work at the Istituto per la Ricerca Scientifica e Tecnologica ITC-irst in Trento, via Sommarive 18, 38050, Trento, Italy (emails: {andreatta, brunelli}@itc.it). Rodolfo Stecher works at the L3S Research Center, Appelstrasse 9a, Hannover, Germany (email: stecher@l3s.de). Paolo Bouquet works at the Department of Information and Communication Technologies of the University of Trento, via Sommarive 14, 38050 Trento, Italy (email: paolo.bouquet@unitn.it).
II. THE VIKE-FRAMEWORK

The methodology we present is developed inside the VIKEF project (Virtual Information and Knowledge Environment Framework, IST-2002-507173, http://www.vikef.net/), which creates an advanced software framework for enabling the integrated development of semantic-based Information, Content, and Knowledge (ICK) management systems. Apart from the scientific and academic interest related to these fields of research, we have also registered a growing need from industrial parties for automated knowledge elicitation tools to be applied to their commercial resources, such as product catalogues.

VIKEF bridges the gap between the partly implicit knowledge and information conveyed in scientific and business content resources (e.g. text, speech, images) and the explicit representation of knowledge required for a targeted and effective access, dissemination, sharing, use, and annotation of ICK resources by scientific and business communities and their information- and knowledge-based work processes.

R&D within VIKEF builds on and significantly extends current Semantic Web efforts by addressing crucial operationalisation and application challenges in building up real-world semantically enriched virtual information and knowledge environments.

III. THE METHODOLOGY

The task of (semi-)automatically annotating content objects with semantic information requires a multi-phased process, where multimedia entities discovered within a content object are coupled with domain knowledge represented by an ontology. For effective semantic annotation support, linguistic, image-related and knowledge representation aspects, approaches, and formats have to be combined in a synergetic way. The proposed methodology can be presented as a pipeline (together with the representation formats employed within the pipeline), which supports semantic annotation in a flexible and pragmatic way.

The pipeline has been implemented as a prototype developed as part of the VIKEF project and evaluated on content from the Trade Fair domain.

The pipeline has four main components, which can be functionally summarized as follows:
A) Annotation of text – the ontology-based semantic annotation of the textual part of the catalogue;
B) Annotation of images – the ontology-based semantic annotation of the images appearing in the catalogue;
C) Elicitation and refinement – to make the information extracted by the annotation components machine-understandable and to enrich the ontology for further annotations;
D) Reuse of existing ontologies – to support ontology creation and refinement by exploiting existing ontologies.

Fig. 1: The general architecture of the annotation system, producing a semantic annotation of product descriptions extracted from the input catalogue.

The approach has been conceived and implemented to provide the possibility of triggering a "virtuous circle": once the information extracted in the annotation steps is integrated into the ontology, the whole process can be restarted, thus allowing the textual and image annotators to exploit the novel information added to the ontology during the previous run.

A. Annotation of Text

Semantic annotation of content is a crucial task (probably the most important) for processing documents to be accessed within the Semantic Web. To semantically annotate a text it is necessary to develop (semi-)automatic Information Extraction techniques capable of overcoming the so-called "knowledge acquisition bottleneck" typical of Semantic Web related applications.

Semantic annotation of product catalogues poses different challenges at different levels. Concerning the textual part, relative to product descriptions, catalogues do not contain linguistically sound text: very often, sentences consist of strictly nominal descriptions, thus discouraging recourse to traditional NLP techniques. On the other hand, product descriptions appear as semi-structured texts where product features appear in a fixed (or at least regular) order. Semantic annotation of product catalogues therefore emerges as a complex task requiring the combination of different types of techniques. Previous works on the semantic annotation of product information are quite scarce, the two main ones being the European project CROSSMARC [1] and the Czech national project Rainbow [2].
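The restartable loop behind components A–D can be illustrated with a toy sketch. All function names and the term-elicitation heuristic below are invented placeholders, not the VIKEF APIs; the point is only the shape of the "virtuous circle", in which a later annotation round exploits terms added to the ontology by an earlier one:

```python
def annotate_text(text, ontology):
    """Component A stand-in: recognize only terms the ontology already knows."""
    return [t for t in text.lower().split() if t in ontology]

def elicit_new_terms(text, recognized):
    """Components B/C stand-in (toy heuristic): treat each word following a
    recognized term as a candidate concept for the ontology."""
    words = text.lower().split()
    return {words[i + 1] for i in range(len(words) - 1) if words[i] in recognized}

def run_pipeline(text, ontology, rounds=2):
    """Annotate, elicit, then enrich the ontology (component D), repeatedly."""
    for _ in range(rounds):
        found = set(annotate_text(text, ontology))
        ontology = ontology | elicit_new_terms(text, found)
    return ontology

onto = run_pipeline("cushion cover in cotton", {"cushion"})
# the second round recognizes "cover", which the first round added
```

In the real pipeline, of course, enrichment passes through the elicitation and ontology-reuse components described below rather than a word-adjacency heuristic.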
The CROSSMARC project aims at electronic-retail product comparison, using a combination of language engineering, machine learning and user modelling, where a domain ontology is used as "semantic glue" to link together the various analysis modules.

Within the Rainbow project, a multi-layered ontology has been defined to integrate the more abstract aspects of the domain (domain-neutral), relative to web sites in general, with the more specific ones (domain-dependent), relative to concepts found in the sites of small organizations offering products or services. Concerning the information extraction task, Rainbow makes use of lexical indicators and, depending on the document to analyze, applies HTML-centred or free-text-centred extractors, in the latter case using shallow parsing techniques.

The hybrid methodology we propose (which has been applied to Italian product catalogues belonging to the furniture domain) makes use of two different approaches: first, pattern matching techniques are used to isolate individual product descriptions within the textual flow and to identify their basic building blocks (e.g. the product name and its price, as well as its natural language description). Then, for each identified product, the natural language description is processed by a battery of NLP tools ([3] [4]) in charge of identifying relevant entities (e.g. colour, material, parts of a given product) and the relations holding between them (e.g. part_of, colour_of, which can refer either to the product itself or to its individual parts).

The architecture in Fig. 1 includes two main components, the Product catalogue Italian Semantic Annotator (PISA) and the Product catalogues Terminology Processor (PTP), both exploiting the battery of NLP modules: the former to linguistically analyze the free text part of the product descriptions, the latter to obtain the TermBank [5] upon which a simple application ontology can be constructed, to be exploited for disambiguation. This "proto"-ontology forms the terminological basis for the development of the final (project) ontology, to be populated using the information derived from the multimedia semantic annotation tasks.

Once this proto-ontology has been constructed using the PTP module, it is possible to run the PISA component (Fig. 2): each product description is first extracted by pattern matching, starting from a set of regular expressions, each one matching a particular product structure. Once a description has been isolated, some of its components, interpreted on the basis of particular "groups" of the matching regular expressions ("name", "type", "product id", etc.), can be detected.

Fig. 2: The general architecture of the PISA component, including the RegExp Manager in charge of the pattern matching step and the NLP Manager in charge of the analysis of the "free text" part of the product description.

The remaining "free text" part of the description is then processed by the NLP Manager module, which is able to access the NLP tools and the ontology, the former to linguistically analyze the text, the latter to resolve possible syntactical ambiguities found during the analysis.

Consider the example in Fig. 3, which is relative to the annotation of the description of a given product. Through pattern matching it is possible to extract its name ("Sanela"), the type (cushion), the price (€12,95), its dimensions (40 cm of width and 60 cm of length) and the product's unique identifier (900.582.56), as well as the relations between this information and the product itself (i.e. name_of, price_of, etc.). The natural language description identified at this stage is then passed to the NLP Manager, which is in charge of acquiring, with the support of the application ontology, further information about the product: in this example, the system detects a part ("cover") and a material ("cotton"), as well as a relation holding between them ("made_of").

Fig. 3. Example of product semantic annotation, including entities and relations among entities.

The box below contains a snippet of the (XML-style) final annotated product description, where some of the extracted features of the product are listed, including the fact, for instance, that the cover of the cushion is made of cotton.
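The pattern-matching step can be illustrated with a single regular expression whose named groups play the role of the "groups" mentioned above. The field layout and the sample line are hypothetical, not the actual catalogue format or the system's expressions; PISA uses a set of such patterns, one per product structure:

```python
import re

# One illustrative pattern; the field order below is an assumption.
PRODUCT_RE = re.compile(
    r"(?P<name>[A-Z][a-z]+)\s+"                 # product name, e.g. "Sanela"
    r"(?P<type>[a-z]+)\s+"                      # product type, e.g. "cushion"
    r"\u20ac(?P<price>\d+,\d{2})\s+"            # price, e.g. "€12,95"
    r"(?P<width>\d+)x(?P<length>\d+)\s*cm\s+"   # dimensions in cm
    r"(?P<pid>\d{3}\.\d{3}\.\d{2})\s*"          # product unique identifier
    r"(?P<free_text>.*)"                        # remainder, for the NLP Manager
)

def extract_product(line):
    """Return the named-group fields of one product description, or None."""
    m = PRODUCT_RE.match(line)
    return m.groupdict() if m else None

fields = extract_product("Sanela cushion €12,95 40x60 cm 900.582.56 cover in cotton")
# fields["name"] == "Sanela"; fields["free_text"] goes on to the NLP step
```

The `free_text` group captures exactly the residue that the NLP Manager then analyzes against the ontology.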
A preliminary evaluation of the system has been carried out by analyzing the semantic annotation of the "IKEA 2006" Italian furniture catalogue. Two main levels of evaluation have been taken into consideration, relative to the two main components of the system: PTP and PISA. Due to the lack of a "gold standard" furniture ontology to compare with the one obtained with the help of the PTP component, a "task-based" evaluation technique has been adopted, where the coverage of the ontology has been indirectly evaluated on the basis of the quality of the obtained annotation.

To sum up the results of the preliminary evaluation: the ontology is able to detect approximately 70% of the terms appearing in the free text descriptions of the extracted products and to put them in the correct relation, thus scoring a recall of 0.7. On the other hand, considering that only terms included in the ontology can be extracted, the system scores a precision of 1. Concerning the disambiguation functionality, our analysis has shown that whenever two terms are correctly detected and recognized, the disambiguation works. Overall, we can assert that the quality of the linguistic analysis is strongly related to the ontology coverage.

From the pattern matching point of view, the system has scored a precision of 0.9 and a recall of 0.8, extracting 800 products out of 1000 by applying 9 different regular expressions.

Concerning the task of semantic text annotation, future directions of research include, on the one hand, the application of the presented technique to different product catalogues (not necessarily in Italian) and, on the other hand, the evolution of the methodology through the integration of a more sophisticated ontology-learning-from-text system currently under development within the DylanLab of the ILC-CNR in Pisa.

B. Annotation of Images

The information conveyed by a multimedia document is analyzed and extracted at two different levels: the document level, in which the document's (geometrical) layout is investigated considering both text and images, and the image level, in which pictures are examined in order to describe their visual content and to recognize the depicted objects. Two interacting methodologies were adopted to analyze the document at the two different levels, both of them exploiting a domain ontology.

A catalogue document usually provides a rich source of structure. The first algorithm is based on a modified version of the data mining system SUBDUE ([6]). SUBDUE is a system that discovers interesting substructures in structural data, based on inexact graph matching techniques and a computationally constrained beam search algorithm guided by heuristics. In particular, a substructure is evaluated based on how well it can compress the graph, according to the minimum description length principle. Highly compressing structures can be considered as building blocks of the entire data set.

To apply the algorithm to a document, we construct a graph representing the page layout (Fig. 4). The graph representation consists of two types of vertices, image (classified in three categories: highlight, scene and miscellaneous) and text paragraph, and one type of arc, representing the spatial relationships between the vertices (e.g. top-left, overlapping).

Fig. 4. The graph structure representing the page layout. Each graph vertex is associated to a text caption or to a product image. Images are extracted from the page background by means of a segmentation algorithm.

The algorithm can provide abstract structured components resulting in a hierarchical view of the document that can be analyzed at many levels of detail and focus. Usually in catalogues a common recurring structure is formed by an image and its caption (Fig. 5). It is a remarkable fact that, while text and images may be separately ambiguous, jointly they tend not to be. Establishing meaningful links between images and text paragraphs from the catalogue structure makes it possible to exploit the semantic annotation of the textual part to semantically annotate the images, or to guide image processing algorithms in order to recognize the depicted object or to infer correspondences between words and particular image structures (as in [7][8]). Domain knowledge can also be added to guide the discovery process and to separate the important substructures from the irrelevant ones.

Fig. 5. The final result of the geometrical layout analysis algorithm. The figure depicts the extracted subgraphs; the thickness of the edges is related to the score obtained by the caption-figure association.

The methodology adopted to describe the image content relies on MEMORI ([9][10]), a system for the detection and recognition of objects in digital images using pre-stored visual information obtained from shots or 3D models. Object recognition plays a crucial role in Computer Vision, especially in the semantic description of visual content. Although object recognition has been intensely studied, it still remains a hard and computationally expensive problem.
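The page-layout graph described earlier (typed vertices for images and text paragraphs, arcs labelled with spatial relations) can be sketched as follows. The coordinate model and the distance-based association score are simplifying assumptions, standing in for the actual segmentation output and for SUBDUE's substructure discovery:

```python
from dataclasses import dataclass

@dataclass
class Region:
    kind: str    # "image" or "text"
    x: float     # assumed centre coordinates of the region on the page
    y: float

def spatial_relation(a, b):
    """Coarse label for the arc between two regions (e.g. "top-left")."""
    if abs(a.x - b.x) < 1 and abs(a.y - b.y) < 1:
        return "overlapping"
    return ("top" if a.y < b.y else "bottom") + "-" + ("left" if a.x < b.x else "right")

def associate_captions(regions):
    """Link each text paragraph to its nearest image; the inverse distance
    plays the role of the edge-thickness score shown in Fig. 5."""
    images = [r for r in regions if r.kind == "image"]
    links = []
    for t in [r for r in regions if r.kind == "text"]:
        best = min(images, key=lambda i: (i.x - t.x) ** 2 + (i.y - t.y) ** 2)
        dist = ((best.x - t.x) ** 2 + (best.y - t.y) ** 2) ** 0.5
        links.append((t, best, spatial_relation(t, best), 1.0 / (1.0 + dist)))
    return links
```

A graph of such vertices and labelled arcs is what the SUBDUE-based algorithm then mines for recurring, highly compressing substructures such as the image-caption pair.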
The main difficulty in the description of image content is the lack of information about the kind and number of objects possibly present. Moreover, objects can appear at different locations in the image and can be deformed, rotated, rescaled, differently illuminated or even occluded with respect to a reference view.

In order to simplify object detection and to reduce computational cost, many systems (e.g. [11]) limit recognition to specific classes of objects. In these cases, a priori knowledge makes it possible to select the most descriptive features for the objects at hand and to circumscribe the search space. However, even under this restriction, high classification performance is seldom reached. Moreover, many object recognition systems rely on user interaction to label the returned items as wrong or correct, or to improve the system response [12].

The MEMORI system tackles the object recognition problem by segmenting the input image into regions and applying a region grouping algorithm, which interacts with an object classifier, producing a set of object candidates and filtering the candidate list (Fig. 6). The object segmentation and recognition modules need domain knowledge in the form of object snapshots from multiple viewpoints.

Fig. 6. The MEMORI system analyzes color digital images in order to detect and recognize objects. The figure shows on the left an image extracted from a catalog, and on the right the recognized object along with the resulting annotation compliant to the MPEG-7 standard.

Supporting the extraction and recognition process with a domain ontology allows the development of context-aware strategies to guide and focus the multimedia semantic analysis. Knowledge of the image context permits the object recognition module to restrict the search to a limited subset of objects and to refine the heuristics weighting object hypotheses according to their context. The other way round, content-based image analysis allows the acquisition and exploitation of similarity relations among multimedia entities, thus making it possible to refine and enrich the knowledge representation modeled in the domain ontology.

The experiments and tests conducted so far show that the proposed approach is a promising method for the detection and recognition of objects and for image annotation. An extensive assessment of the performance was conducted on a synthetic test-bed. A set of synthetic images has been created by drawing, on a non-uniform background, rotated and rescaled versions of objects taken from COIL-100 [13], a database widely adopted in the object recognition community.

The achieved recognition rates were compared with other state-of-the-art recognition methods ([14][15]); MEMORI performs best in all experiments, regardless of the number of training views. The recognition rate is over 96% with as few as 4 training views, demonstrating the robustness of the method. New techniques to make the system robust with respect to illumination changes and partial occlusions are currently under development. Future work will extend the evaluation to more complex and non-synthetic images. The potential synergy between visual similarity and semantic similarity measures based on ontologies will also be investigated and exploited.

C. Elicitation and Refinement

Once information has been extracted from text and images, and stored in the XML-based format partially described in Figure 3, the next task is to make its meaning explicit (and thus machine-understandable) by transforming it into a suitable RDF/OWL representation.

This transformation has two parts (more details are provided in [11], where the entire VIKEF knowledge pipeline is described thoroughly):

• The first step, called syntactic elicitation, aims at producing a collection of RDF statements, namely an explicit representation of the knowledge content of the XML elements in the schema. For example, we want to construct the fact (expressed by an RDF statement) that the string "900.582.56" is the ID of "SANELA" (actually, of some entity which happens to be a product named "SANELA").
• The second step, called semantic elicitation and refinement, aims at leveraging the RDF representation to a full semantic level, where entities are assigned a proper identifier (a URI), properties are associated to some data type or object property in some domain ontology (more precisely, are replaced by the URI of their ontological counterpart), and finally entities are assigned to the most appropriate class (e.g. "SANELA" should be assigned to the class of cushions, which belongs to a hierarchy of classes very likely to include among its ancestors the classes of products and of physical entities).

The first step, in our implementation, is rather simple, as it is performed via a simple XSL transformation from the XML schema depicted in Figure 3 to a collection of RDF statements expressed in the RDF/XML syntax. The only tricky part is the decision of which statements are to be produced from the XML file (in fact, we observe that the simple snippet in Figure 3 contains a large number of implicit statements, and therefore one needs to select the most useful ones).
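Syntactic elicitation, performed in the pipeline via an XSL transformation, can be approximated in a few lines. The element names, the blank-node subject and the `cat:` property names below are hypothetical stand-ins for the actual annotation schema and vocabulary:

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation snippet in the spirit of Figure 3.
SNIPPET = """<product>
  <name>SANELA</name>
  <type>cushion</type>
  <id>900.582.56</id>
  <part material="cotton">cover</part>
</product>"""

def elicit_triples(xml_text, subject="_:prod1"):
    """Emit one (subject, predicate, object) statement per annotated field,
    plus one per attribute (e.g. the material of a part)."""
    triples = []
    for child in ET.fromstring(xml_text):
        triples.append((subject, "cat:%s_of" % child.tag, child.text))
        for attr, value in child.attrib.items():
            triples.append((child.text, "cat:%s_of" % attr, value))
    return triples

triples = elicit_triples(SNIPPET)
# includes ("_:prod1", "cat:id_of", "900.582.56")
# and ("cover", "cat:material_of", "cotton")
```

As in the real pipeline, the interesting design decision is the selection rule: which of the many implicit statements in the XML are worth materializing as triples.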
The outcome of this step is an RDF file containing a potentially large number of facts about the product catalog.

The second step is by far the more interesting. Indeed, the goal is to qualify the RDF statements generated in the previous step by linking their constituents to some pre-existing ontology (this is what we call semantic elicitation; see section III.D for the support we provide to build an ontology by reusing existing ontologies). In our approach, this task can be decomposed into two different sub-tasks: (1) entity-level elicitation, and (2) class/property-level elicitation.

The first sub-task is implemented as a problem of matching entity descriptions onto the entities stored in a repository called OKKAM. A full description of OKKAM is beyond the scope of this paper; the main idea is that it creates and stores URIs (which can then be reused across multiple applications) together with additional information, including any known description of the entity itself. Once a new entity is recognized in any digital document (plain text documents, relational databases, HTML pages, and so on), OKKAM can be queried to check whether that entity already has a known URI, which is returned for reuse; if no match is found, a new entity is created and stored with its URI and all available descriptions.

The second sub-task uses a tool for schema and ontology matching called CtxMatch2.0 [16], a VIKEF-motivated extension of a pre-existing tool. The details of semantic elicitation with CtxMatch2.0 are provided in other papers (see e.g. [17]). In the scenario described in this paper, the tool is used to match categories and relations extracted from catalogs to classes and properties found in any available domain ontology. The matching method uses two sources of information: the hierarchical structure of elements (which is particularly important in product catalogs) and the lexical information associated both with catalog labels and with ontology elements.

The outcome of this second step is a refined RDF file where linguistic descriptions of entities are replaced by unique URIs (which can be used later to merge RDF graphs produced from different catalogs or, in general, from different collections of documents), category names are replaced by the URIs of ontological classes (if available), and relation names are replaced by the URIs of data type or object properties (if available). We note that entities may be classified using complex concepts, compositionally constructed from their linguistic descriptions (for example, "SANELA" will correspond to a cushion made of cotton).

A final remark is that the outcome of this elicitation and refinement phase includes a mixture of knowledge coming from multimedia sources, namely text and pictures, in a single representation.

D. Reuse of existing ontologies

Ontology engineering is a very time-consuming and subjective task. It is time-consuming because it is hard to model not only new but also already known domains, and subjective since in most cases the same domains are modeled in different ways by different ontology engineers. Even if the domains are the same, the perspectives under which they are modeled are usually different. One way to detect the differences (or similarities) of the modeling perspectives is to analyze the relations existing between the concepts. Due to the already mentioned intrinsic difficulties of the ontology engineering task, advanced research is being performed in order to find a solution that (partially) overcomes these problems. Our method and tools aim to support the ontology engineer in finding existing Semantic Web ontologies written in OWL (or parts of them) that model the targeted domain in a similar way, that is, with the same modeling perspective. The approach relies on the existence of searchable pool(s) of ontologies where candidate ontologies can be searched for and pre-selected based on a user-specified set of desired ontological classes (this set is called a fragment). The pre-selection process searches for a specified percentage of matches between the labels (and their synonyms) appearing in the user-specified fragment and the ontologies in the pool. Currently we are using the SWOOGLE (http://swoogle.umbc.edu/) ontology repository, but any other similar repository can easily be incorporated. The pre-selected ontologies are then analyzed in more detail by examining the labels of classes and relations in combination with a lexical resource (currently WordNet). Labels are analyzed and word meanings are related by various means in order to compute the likelihood of one or the other possible sense of a word holding in the given context. This is then represented in two ways: in logical formulae (following the approach of [18]), used for detecting the similarity of concepts with a reasoner, and as relevance weights for each possible sense, used to compare concepts and decide whether they are similar and to which degree this similarity holds. The relevance measures are combined with the logical results in order to give a measure of the closeness of the considered concepts. The architecture of this approach is presented in Fig. 7. The ontology engineer can then decide to use any of the proposed ontologies as a basis for performing the required extension of the initially specified fragment.

Fig. 7. The architecture of the described approach for the re-use of existing ontologies.

There exist several different techniques for matching ontologies, using very different approaches. Ontology mapping, ontology alignment and ontology matching all fall under a broader research area in which correspondences between two schemata need to be discovered.
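The pre-selection step can be sketched as a label-coverage ranking over a pool of candidate ontologies. The tiny synonym table below stands in for a lexical resource such as WordNet, and the threshold and ontology data are invented for illustration:

```python
# Toy synonym table standing in for a lexical resource such as WordNet.
SYNONYMS = {"sofa": {"couch", "settee"}, "cushion": {"pillow"}}

def variants(label):
    """A label together with its known synonyms, lower-cased."""
    return {label.lower()} | SYNONYMS.get(label.lower(), set())

def preselect(fragment, ontology_pool, threshold=0.5):
    """Rank ontologies by the fraction of fragment classes whose label
    (or a synonym) occurs among the ontology's class labels."""
    ranked = []
    for name, labels in ontology_pool.items():
        labels = {l.lower() for l in labels}
        hits = sum(1 for f in fragment if variants(f) & labels)
        coverage = hits / len(fragment)
        if coverage >= threshold:
            ranked.append((name, coverage))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

pool = {"furniture.owl": ["Couch", "Pillow", "Table"],
        "vehicles.owl": ["Car", "Truck"]}
best = preselect(["sofa", "cushion"], pool)
# → [("furniture.owl", 1.0)]
```

The surviving candidates would then go through the finer-grained, WordNet-and-reasoner-based sense analysis described above.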
In some cases, rules for mapping schemata need to be created; in others, it is enough to know the relations existing between two or more elements from different schemata. The area of schema alignment or matching has long been studied, and many approaches exist for integrating database schemata and XML schemata in general, for query mediation, etc. The approaches closest to ontology alignment are those for schema alignment, especially for taxonomic hierarchies or classification hierarchies in general. A comprehensive study is presented in [19]. In the following we present some approaches that are more closely related to the one presented in this section.

• The iPrompt [20] tool suite contains, among others, tools for ontology merging and alignment. Its input is a set of pairs of related/similar concepts from different ontologies; based on them, it proceeds to find other pairs of similar concepts, relying heavily on the structure.
• The GLUE approach [21] uses a machine learning approach to analyze instances. The similarity of concept meaning is defined based on the joint probability distribution of the involved concepts.
• The QOM approach [22] is based on a combination of several techniques that analyze features of different ontologies. The similarity of concepts is then computed by aggregating the different similarities. The approach is iterative; every iteration uses the results obtained in the previous ones in order to enhance the quality of the results.
• Cupid [23] presents an approach that considers schema (rather than instance) information, mainly for matching XML schemata. It uses linguistic information and the structure of the schema. The approach also uses keys, referential constraints and other schema information to derive more precise matching results. Cupid also handles context-dependent matches of shared type definitions that are used in several larger structures in the schema.
• MoA [24] is an approach to merging and aligning OWL ontologies. It uses linguistic methods to disambiguate the meaning of concepts and proposes an

IV. CONCLUSIONS

[…] extraction from multimedia sources, in particular from product catalogues. Our methodology tries to enable a virtuous circle by which domain ontologies are used in the extraction process, and at the same time the extraction process becomes a way for creating or extending the available ontologies. The result of the extraction process is a semantically rich representation of the content of catalogs, where knowledge extracted from texts (e.g. product descriptions) is integrated with knowledge extracted from pictures, and made available for any service one may want to build on top of it.

REFERENCES

[1] M. T. Pazienza, A. Stellato, M. Vindigni, "Cross-lingual multi-agent retail comparison", AWSS 2003 Workshop on Applications, Products and Services of Web-based Support Systems, 13 Oct. 2003, Halifax, Canada, in conjunction with the Web Intelligence international conference WI 2003.
[2] V. Svatek, J. Kosek, M. Labsky, J. Braza, M. Kavalec, M. Vacura, V. Vavra, V. Snasel, "Rainbow – multiway semantic analysis of websites", in Proceedings of the 14th International Workshop on Database and Expert Systems Applications (DEXA'03), 2003, p. 635.
[3] A. Lenci, S. Montemagni, V. Pirrelli, "Chunk-it. An Italian shallow parser for robust syntactic annotation", in A. Zampolli, N. Calzolari, L. Cignoni (eds.), Computational Linguistics in Pisa – Linguistica Computazionale a Pisa, Linguistica Computazionale, Special Issue, XVI-XVII (2003), Pisa-Roma, IEPI, Tomo I, 2003, pp. 353–386.
[4] R. Bartolini, A. Lenci, S. Montemagni, V. Pirrelli, "Hybrid Constraints for Robust Parsing: First Experiments and Evaluation", in Proceedings of LREC 2004: Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal, 26–28 May 2004, Volume III, Paris, pp. 859–862.
[5] R. Bartolini, D. Giorgetti, A. Lenci, S. Montemagni, V. Pirrelli, "Automatic Incremental Term Acquisition from Domain Corpora", in Proceedings of the 7th International Conference on Terminology and Knowledge Engineering (TKE2005), Copenhagen Business School, Copenhagen, Denmark, 2005, pp. 17–18.
[6] J. Coble, R. Rathi, D. J. Cook, L. B. Holder, "Iterative Structure Discovery in Graph-Based Data", International Journal on Artificial Intelligence Tools, 14(1-2):101–124, 2005.
[7] P. Duygulu, K. Barnard, N. de Freitas, D. Forsyth, "Object Recognition as Machine Translation: Learning a lexicon for a fixed image vocabulary", in European Conference on Computer Vision (ECCV), Copenhagen, 2002.
[8] C. Carson, S. Belongie, H. Greenspan, J. Malik, "Blobworld: Image segmentation using expectation-maximization and its application to image querying", IEEE Transactions on Pattern Analysis and Machine
algorithm to detect and represent in a Semantic Bridge Intelligence, 24(8):1026-1038, 2002.
semantic equivalence between concepts. The Semantic [9] M. Lecca, Object Recognition in Color Images by the Self Configuring
System MEMORI, International Journal of Signal Processing, Vol. 3,
Bridge can represent equivalence of concepts and
No. 3, pp. 176-185, World Enformatika Society, 2006
properties, subconcept and subproperties and identity of [10] C. Andreatta, M. Lecca and S. Messelodi, Memory-based Object
instances. A merging algorithm is also presented. Recognition in digital Images, 10th International Fall Workshop -
• The OMEN [25] approach uses Bayesian Networks (BN) Vision, Modeling, and Visualization - VMV 2005, Erlangen, Germany,
November 16-18, 2005
for deciding the match of concepts based on an initial [11] A. Hoogs, R. Collins, R. Kaucic, and J. Mundy. A common set
hand made match. Based on known matches it analyzes ofperceptual observables for grouping, figure - ground discrimination,
the structure (e.g. domain and range of properties) to and texture classification. IEEE Transaction on Pattern Analysis and
Machine Intelligence, (4):458–474, 2003.
derive further matches. The algorithm operates iteratively [12] B. Ko and H. Byun. Extracting Salient Regions And Learning
and produces in every iteration a new match which is used Importance Scores In Region-Based Image Retrieval. International
in the following interactions. Journal of Patter Recognition and Artificial Intelligence, (17(8)):1349–
1367, 2003.
[13] S. A. Nene, S. K. Nayar, and H. Murase. Columbia object image library
(COIL-100). In Technical Report CUCS-006-96, Columbia University,
IV. CONCLUSIONS 1996.
[14] S. Obdrzalek and J. Matas. Object recognition using local affine frames
In this paper we presented a methodology for supporting the on distinguished regions. In Paul L. Rosin and David Marshall, editors,
use of background ontologies in the task of information
Proceedings of the British Machine Vision Conference, volume 1, pages
113-122, London, UK, September 2002. BMVA.
[15] Ming-Hsuan Yang, D. Roth and N. Ahuja. Learning to Recognize 3D
Objects with SNoW. In ECCV-2000, The Proceedings of the Sixth
European Conference on Computer Vision, volume 1, pages 439-454,
2000.
[16] Bouquet, P., Serafini, L., Zanobini, S.: Peer-to-peer semantic coordination. Journal of Web Semantics, 2 (2005).
[17] Bouquet, P., Serafini, L., Zanobini, S., Sceffer, S.: Bootstrapping semantics on the Web: meaning elicitation from schemas. WWW2006, Edinburgh (Scotland, UK), May 2006.
[18] R. Stecher, C. Niederee, P. Bouquet, T. Jaquin, S. Ait-Mokhtar, S.
Montemagni, R. Brunelli, G. Demetriou, "Enabling a knowledge supply
chain: from content resources to ontologies", in Proceedings of the
ESWC2006 workshop on "Mastering the Gap: from information
extraction to semantic representation", CEUR workshop proceedings,
vol. 187, Budva (Montenegro), 11 June 2006.
[19] Pavel Shvaiko, Jérôme Euzenat: “A Survey of Schema-Based Matching
Approaches”, in Journal of Data Semantics IV, 2005.
[20] N. Noy and M. Musen: “The PROMPT suite: Interactive tools for
ontology merging and mapping”, in Technical report, SMI, Stanford
University, CA, USA (2002)
[21] A. Doan, P. Domingos and A. Halevy: “Learning to match the schemas
of data sources: A multistrategy approach”, 2003.
[22] Marc Ehrig and Steffen Staab: “QOM - Quick Ontology Mapping”, in Proceedings of the International Semantic Web Conference, 2004.
[23] Jayant Madhavan, Philip A. Bernstein and Erhard Rahm: “Generic
Schema Matching with Cupid”, in The VLDB Journal, 2001.
[24] Jaehong Kim, Minsu Jang, Young-Guk Ha, Joo-Chan Sohn and Sang Jo
Lee: “MoA: OWL Ontology Merging and Alignment Tool for the
Semantic Web” in Innovations in Applied Artificial Intelligence: 18th
International Conference on Industrial and Engineering Applications of
Artificial Intelligence and Expert Systems, 2005.
[25] Prasenjit Mitra, Natasha F. Noy and Anuj Jaiswal: “OMEN: A
Probabilistic Ontology Mapping Tool”, in Proceedings of the
International Semantic Web Conference, 2005.