=Paper=
{{Paper
|id=None
|storemode=property
|title=Cross-Lingual Ontology Mapping and Its Use on the Multilingual Semantic Web
|pdfUrl=https://ceur-ws.org/Vol-571/paper3.pdf
|volume=Vol-571
|dblpUrl=https://dblp.org/rec/conf/www/FuBO10
}}
==Cross-Lingual Ontology Mapping and Its Use on the Multilingual Semantic Web==
Bo Fu, Rob Brennan, Declan O’Sullivan
Knowledge and Data Engineering Group, School of Computer Science and Statistics,
Trinity College Dublin, College Green, Dublin 2, Ireland
{bofu, rob.brennan, declan.osullivan}@scss.tcd.ie
ABSTRACT
Ontology-based knowledge management systems enable the automatic discovery, sharing and reuse of structured data sources on the semantic web. With the emergence of multilingual ontologies, accessing knowledge across natural language barriers has become a pressing issue for the multilingual semantic web. In this paper, a semantic-oriented cross-lingual ontology mapping (SOCOM) framework is proposed to enhance interoperability of ontology-based systems that involve multilingual knowledge repositories. The contribution of cross-lingual ontology mapping is demonstrated in two use case scenarios. In addition, the notion of appropriate ontology label translation, as employed by the SOCOM framework, is examined in a cross-lingual ontology mapping experiment involving ontologies with a similar domain of interest but labelled in English and Chinese respectively. Preliminary evaluation results indicate the promise of the cross-lingual mapping approach used in the SOCOM framework, and suggest that the integrated appropriate ontology label translation mechanism is effective in the facilitation of monolingual matching techniques in cross-lingual ontology mapping scenarios.

Keywords
Cross-Lingual Ontology Mapping; Appropriate Ontology Label Translation; Matching Assessment Feedback; Querying of Multilingual Knowledge Repositories.

Copyright is held by the author/owner(s).
WWW 2010, April 26-30, 2010, Raleigh, North Carolina, USA.

1. INTRODUCTION
The promise of the semantic web is that of a new way to organise, present and search information that is based on meaning and not just text. Ontologies are explicit and formal specifications of conceptualisations of domains of interest [11], thus are at the heart of semantic web technologies such as semantic search [8] and ontology-based information extraction [2]. As knowledge and knowledge representations are not restricted to the usage of a particular natural language, multilinguality is increasingly evident in ontologies as a result. Ontology-based applications therefore must be able to work with ontologies that are labelled in diverse natural languages. One way to realise this is by means of cross-lingual ontology mapping (CLOM).

In this paper, a summary of current CLOM approaches is presented in section 2. A semantic-oriented cross-lingual ontology mapping (SOCOM) framework that aims to facilitate mapping tasks carried out in multilingual environments is proposed and discussed in section 3. To illustrate possible applications of the SOCOM framework on the multilingual semantic web, two use case scenarios including cross-language document retrieval and personalised querying of multilingual knowledge repositories are presented in section 4. An overview of the initial implementation of the proposed framework is given in section 5. Section 6 presents an experiment that engages the integrated framework in a mapping scenario that involves ontologies labelled in English and Chinese, and discusses the evaluation results and findings from this experiment. Finally, work in progress is outlined in section 7.

2. STATE OF THE ART
Current CLOM strategies can be grouped into five categories, namely manual processing, corpus-based approach, instance-based approach, linguistic enrichment of ontologies and the two-step generic approach. A costly manual CLOM process is documented in [13], where the English version of the AGROVOC 1 thesaurus is mapped to the Chinese Agriculture Thesaurus. Given large and complex ontologies, such an approach would be infeasible. Ngai et al. [16] propose a corpus-based approach to align the English thesaurus WordNet 2 and the Chinese thesaurus HowNet3. As bilingual corpora are not always available to domain-specific ontologies, it is difficult to apply their approach in practice. The instance-based approach proposed by Wang et al. [24] generates matching correspondences based on the analysis of instance similarities. It requires rich sets of instances embedded in ontologies, which is a condition that may not always be satisfied in the ontology development process.

Pazienza & Stellato propose a linguistically motivated mapping method [17], advocating a linguistic-driven approach in the ontology development process that generates enriched ontologies with human-readable linguistic resources. To facilitate this linguistic enrichment process, a plug-in for the Protégé4 editor – OntoLing 5 was also developed [18]. Linguistically enriched ontologies may offer strong evidence when generating matching correspondences. However, as such enrichment is not currently standardised, it is difficult to apply the proposed solution.

1 http://aims.fao.org/website/AGROVOC-Thesaurus/sub
2 http://wordnet.princeton.edu
3 http://www.keenage.com/html/e_index.html
4 http://protege.stanford.edu
5 http://art.uniroma2.it/software/OntoLing

Trojahn et al. [23] present a multilingual ontology mapping framework, where ontology labels are first represented with collections of phrases in the target natural language. Matches are then generated using specialized monolingual matching agents that use various techniques (e.g. structure-based matching algorithms, lexicon-based matching algorithms and so on). However, as Shvaiko & Euzenat state in [20], “despite the many component matching solutions that have been developed so far, there is no integrated solution that is a clear success”. Often various techniques are combined in order to generate high quality matching results [12], searching for globally accepted matches
can lead to a limited matching scope. In 2008, an OAEI6 test case that involves the mapping of web directories written in English and Japanese was designed. Only one participant – the RiMOM tool – was able to submit results [26], by using a Japanese-English dictionary to translate labels from the Japanese web directory into English first, before applying monolingual matching procedures. This highlights the difficulty of exercising current monolingual matching techniques in CLOM scenarios.

6 http://oaei.ontologymatching.org

Trojahn et al.’s framework and RiMOM’s approach both employ a generic two-step method, where ontology labels are translated into the target natural language first and monolingual matching techniques are applied next. The translation process occurs in isolation from the mapping activity, and takes place independently of the semantics in the concerned ontologies. As a result, inadequate and/or synonymic translations can introduce “noise” into the subsequent matching step, where matches may be neglected by matching techniques that (solely) rely on the discovery of lexical similarities. This conception is further examined in [9], where strong evidence indicates that to enhance the performance of existing monolingual matching techniques in CLOM scenarios, appropriate ontology label translation is key to the generation of high quality matching results. This notion of selecting appropriate ontology label translations in the given mapping context is the focus of the SOCOM framework and the evaluation shown in this paper.

Notable work in the field of (semi-)automatic ontology label translation conducted by Espinoza et al. [7] introduces the LabelTranslator tool, which is designed to assist humans during the ontology localisation process. Upon selecting the labels of an ontology one at a time, ranked lists of suggested translations for each label are presented to the user. The user finally decides which suggested translation is the best one to localise the given ontology. In contrast to the LabelTranslator tool, the ontology rendition process of the SOCOM framework presented in this paper differs in its input, output and design purpose. Firstly, our rendering process takes formally defined ontologies (i.e. in RDF/OWL format) as input, but not the labels within an ontology. Secondly, it outputs formally defined ontologies labelled in the target natural language, but not lists of ranked translation suggestions. Lastly, our rendering process is designed to facilitate further machine processing (more precisely, existing monolingual ontology matching techniques), whereas the LabelTranslator tool aims to assist humans.

3. THE SOCOM FRAMEWORK
Given ontologies O1 and O2 (see Figure 1) that are labelled in different natural languages, O1 is first transformed by the SOCOM framework into an equivalent of itself through the ontology rendering process as O1'. O1' contains all the original semantics of O1 but is labelled in the natural language that is used by O2. O1' is then matched to O2 using monolingual matchers to generate candidate matches, which are then reviewed by the matching assessment mechanism in order to establish the final mappings. Ontology renditions are achieved by structuring the translated ontology labels in the same way as the original ontology O1, and assigning these translation labels to new namespaces to create well-formed resource URIs in O1' (for more details, please see [9]). Note that the structure of O1 is not changed during this process, as Giunchiglia et al. [10] point out, the conceptualisation of a particular ontology node is captured by its label and its position in the ontology structure. Thus, the ontology rendering process should not modify the position of a node, because doing so would effectively alter the semantics of the original ontology.

Figure 1. SOCOM Framework Workflow Overview

In contrast to the generic approach, where the translation of ontology labels takes place in isolation from the ontologies concerned, the SOCOM framework is semantic-oriented and aims to identify the most appropriate translation for a given label. To achieve this, firstly, suitable translation tools are selected at the translator selection point to generate candidate translations. This selection process is influenced by the knowledge domain of the concerned ontologies. For general knowledge representations, off-the-shelf machine translation (MT) tools or thesauri can be applied. For specific domains such as the medical field, specialised translation media are more appropriate. Secondly, to identify the most appropriate translation for a label among its candidate translations, the appropriate translation selection process is performed. This selection process is under the influence of several information sources including the source ontology semantics, the target ontology semantics, the mapping intent, the operating domain, the time constraints, the resource constraints, the user and finally the matching assessment result feedback. These influences are explained next.

The semantics defined in O1 can indicate the context that a to-be-translated label is used in. Given a certain position of the node with this label, the labels of its surrounding nodes (referred to as surrounding semantics in this paper) can be retrieved and studied. For example, for a class node, its surrounding semantics can be
represented by the labels of its super/sub/sibling-classes. For a property node, its surrounding semantics can be represented by the labels of the resources which this property restricts. For an individual, the surrounding semantics can be characterised by the label of the class it belongs to. Depending on the granularity of the given ontologies in a mapping scenario, an ontological resource’s surrounding semantics should be modelled with flexibility. For example, if the ontologies are rich in structure, immediate surrounding resource labels (e.g. direct super/sub relations) alone can form the content of the surrounding semantics. If the ontologies are rich in instances, where the immediate surrounding label (e.g. the class an instance belongs to) alone is too weak to provide the instance’s context of use, indirect (e.g. all super/sub classes declared in the ontology) resource labels should be included in the surrounding semantics. The goal of obtaining surrounding semantics of a given resource is to provide the translation selection process with additional indications of the context a resource is used in7.

7 The generation of surrounding semantics presented in this paper does not attempt to estimate the semantic relatedness between concepts, it is a procedure performed within readily defined ontologies in a cross-lingual ontology mapping scenario that aims to gather the context of use for a particular resource in the given ontologies. Though one might assume that the SOCOM framework would work best when ontologies with similar granularity are presented, this, however, is not a requirement of the framework. As already mentioned, the surrounding semantics are modelled with flexibility, where indirectly related concepts in the ontology would be collected as long as the surrounding well illustrates the context of use for a particular ontological resource.

As O1 is transformed so that it can be best mapped to O2, the semantics defined in O2 therefore can act as broad translation selection rules. When several translation candidates are all linguistically correct for a label in O1, the most appropriate translation is the one that is most semantically similar to what is used in O2. An example of appropriate ontology label translation is shown in Figure 2, where the source ontology is labelled in Chinese and is mapped to an English target ontology. The class 摘要 from the source ontology has translation candidates abstract and summary. To determine the most appropriate translation, the defined semantics of the target ontology can influence the translation selection process. To understand how this is possible, consider three scenarios. Figure 2a demonstrates a situation where a class named Summary exists in the target ontology. In this case, Summary would be considered as more appropriate than abstract since it is the exact label used by the target ontology. Figure 2b illustrates another scenario where the target ontology contains a class named Sum. From a thesaurus or a dictionary, one can learn that Sum is a synonym of summary, therefore, instead of using either abstract or summary, Sum will be chosen as the appropriate translation in this case. Figure 2c shows a third scenario where both Abstract and Summary exist in the target ontology, the appropriate translation is then concluded by studying the surrounding semantics. The source class 摘要 has a super-class 出版物 (with translation candidates publication and printing), two sibling-classes 章节 (with translation candidates chapter and section) and 书籍 (with translation candidates book and literature). Its surrounding semantics therefore include: {publication, printing, chapter, section, book, literature}. Similarly, in the target ontology, the surrounding semantics of the class Summary contains: {BookChapter, Reference}, and the surrounding semantics of the class Abstract would include: {Mathematics, Applied}. Using string comparison techniques, one can determine that the strings in the surroundings of the target class Summary are more similar to those of the source class. Summary therefore would be the appropriate translation in such a case. Note that the SOCOM framework is concerned with searching for appropriate translations (from a mapping point of view) but not necessarily the most linguistically correct translations (from a natural language processing point of view), because our motivation for translating ontology labels is so that the ontologies can be best mapped8. This should not be confused with translating labels for the purpose of ontology localisation, where labels of an ontology are translated so that it is “adapted to a particular language and culture” [21].

8 Note that the appropriate ontology label translation mechanism presented in this paper does not attempt to disambiguate word senses, as the appropriateness of a translation is highly restricted to the specific mapping scenarios, thus it is not a form of natural language processing technique.

Figure 2. Examples of Appropriate Label Translation

In addition to using the embedded semantics of the given ontologies, task intention can also influence the outcome of the translation selection process as it captures some of the mapping motives. Consider a CLOM scenario where the user is not comfortable with all the natural languages involved, and would like to test just how meaningful/useful it is to map the given ontologies. In such a case, the selection of translation candidates need not be very sophisticated, thus results returned from off-the-shelf MT tools can be acceptable. The domain of the ontologies is another influence on the translation selection process. For example, if O1 and O2 are domain representations where each one is associated with collections of documents in different natural languages, lists of frequently used words in these documents can be collected. The translation candidate that is ranked highest on these lists would be deemed as the most appropriate translation. Moreover, time constraints can influence the translation selection process. If the mappings must be conducted dynamically such as the work presented in [5], the translation selection consequently must be fast, where it might not make use of all the resources that are available to it. On the other hand, not all of the aforementioned resources will be available in every CLOM scenario. Resource constraints therefore can have an impact on the outcome of the translation selection process. Furthermore, users, at times, can have the expertise that is not obtained by the system, and should influence the translation selection process when necessary. Lastly, matching result feedback can influence the future selection of appropriate translations (discussed next).
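
The three scenarios of Figure 2 can be sketched in a few lines of Python. This is an illustrative sketch only, not the SOCOM implementation: the function names, the dictionary-based representation of the target ontology and the substring-overlap score (a crude stand-in for the string comparison techniques mentioned in section 3) are assumptions made for this example, as is the fallback to the first candidate when nothing in the target ontology matches.

```python
def overlap_score(source_surroundings, target_surroundings):
    """Count source surrounding terms that occur inside some target surrounding label."""
    targets = [t.lower() for t in target_surroundings]
    return sum(any(s.lower() in t for t in targets) for s in source_surroundings)


def select_translation(candidates, source_surroundings, target_ontology, synonyms=None):
    """Pick the most appropriate translation for one source label.

    target_ontology maps each target label to the labels surrounding it;
    synonyms maps a target label to its known synonyms (both hypothetical structures).
    """
    synonyms = synonyms or {}
    wanted = {c.lower() for c in candidates}
    matched = [
        label for label in target_ontology
        if label.lower() in wanted
        or any(s.lower() in wanted for s in synonyms.get(label, ()))
    ]
    if not matched:
        return candidates[0]  # assumption: nothing in O2 helps, keep the first candidate
    if len(matched) == 1:
        return matched[0]     # Figure 2a / 2b: one target label (or its synonym) matches
    # Figure 2c: several target labels match, so compare surrounding semantics
    return max(matched, key=lambda label: overlap_score(source_surroundings, target_ontology[label]))


candidates = ["abstract", "summary"]  # translation candidates of the source class
source_surroundings = ["publication", "printing", "chapter", "section", "book", "literature"]

fig_2a = select_translation(candidates, source_surroundings, {"Summary": []})
fig_2b = select_translation(candidates, source_surroundings, {"Sum": []},
                            synonyms={"Sum": ["summary"]})
fig_2c = select_translation(
    candidates,
    source_surroundings,
    {"Abstract": ["Mathematics", "Applied"], "Summary": ["BookChapter", "Reference"]},
)
print(fig_2a, fig_2b, fig_2c)  # Summary Sum Summary
```

In the Figure 2c call, chapter and book both occur inside BookChapter, while none of the source surrounding terms occurs inside Mathematics or Applied, so Summary wins. A similarity measure such as edit distance could replace the overlap count without changing the overall flow.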
Once O1' is generated, various monolingual matching techniques can be applied to create matches between O1' and O2. The selection of these monolingual matchers depends on the feedback generated from the mapping result assessment. Assessment feedback can be implicit (i.e. pseudo feedback) or explicit. Pseudo feedback is obtained automatically, where the system assumes matches that meet certain criteria are correct. For example, “correct” results may be assumed to be the ones that have confidence levels of at least 0.5. The precision of the matches generated can then be calculated for each matching algorithm used, which will allow the ranking of these algorithms. The ranking of the MT sources can also be determined upon establishment of the usage of each MT source (i.e. as percentages) among the “correct” matches. Based on these rankings, the top performing MT tools and matching algorithms can then be selected for the future executions of the SOCOM framework. Explicit feedback is generated from users and is more reliable than pseudo feedback, which can aid the mapping process in the same way as discussed above.

Matching assessment feedback allows insights into how the correct mappings are generated, in particular, which translation tool(s) and matching algorithm(s) are most suitable in the specified CLOM scenario. Such feedback in turn could influence the future selection of appropriate label translations and the monolingual matching techniques to use. Finally, the feedback should be influenced by the selection rationale employed during the translation selection process and the monolingual matching process. Such rationale can be captured as metadata as part of the mapping process and include information such as the influence sources used, translation tools used, monolingual matching techniques used, similarity measures of semantic surroundings and so on. The use of matching assessment feedback addresses one of the scalability issues that arise. Consider a mapping scenario where the concerned ontologies contain thousands of entities, one way to rapidly generate mapping results and improve mapping quality dynamically is to use the pseudo feedback. For the first, e.g. 100 mapping tasks, assume the ones that satisfy certain criteria are correct, detect how they are generated, and keep using the same techniques for the remaining mapping tasks. This assessment process can also be recursive where the system is adjusted for every few mapping tasks. Finally, explicit feedback involves users in the mapping process, which contributes towards addressing one of the challenges, namely user involvement in ontology matching as identified by Shvaiko & Euzenat in [20].

4. USE CASES
The notion of using conceptual frameworks such as thesauri and ontologies in search systems [6] [4] for improved information access [19] and enhanced user experiences [22] is well researched in the information retrieval (IR) and the cross-lingual IR (CLIR) community. However, the use of ontology mapping as a technique to aid the search functions in IR has been relatively limited. The most advanced work of using ontology alignment in CLIR, to the best of our knowledge, is Zhang et al.’s statistical approach presented in [25], which does not involve translations of ontology labels. To avail of statistical analysis such as latent semantic indexing, singular value decomposition, directed acyclic graphs and maximal common subgraph on document collections, parallel corpora must be generated beforehand. However, this often is an expensive requirement and may not always be satisfied. Also, by applying statistical techniques only, such an approach ignores the existing semantic knowledge within the given ontologies in a mapping scenario. Hence alternative solutions are needed. The SOCOM framework presented in this paper can contribute towards this need. Its contribution can be demonstrated through two use cases as shown in Figures 3 & 4.

User generated content such as forums often contain discussions on how to solve particular technical problems, and a large amount of content of this type is written in English. Consider a scenario illustrated in Figure 3, where the user whose preferred natural language is Portuguese is searching for help on a forum site, but the query in Portuguese is returning no satisfactory results. Let us assume that the user also speaks English as a second language and would like to receive relevant documents that are written in English instead. To achieve this, domain ontologies in Portuguese and English can be extracted based on text presented in the documents using an approach such as Alani et al.’s [1]. Mappings can then be generated pre-runtime using the SOCOM framework between the Portuguese ontology and the English ontology, and stored as RDF triples. At run time, once a query is issued in Portuguese, it is first transformed using a method such as Lopez et al.’s [14] to associate itself with a concept in the Portuguese domain ontology. This Portuguese concept’s corresponding English concept(s) can then be obtained by looking it up in the mapping triplestore. Once the system establishes which English concepts to explore further, their associated documents in English can be retrieved.

Figure 3. SOCOM Enabled Cross-Language Document Retrieval

Personalisation can also be enhanced with the integration of the SOCOM framework in scenarios such as the one shown in Figure 4, where a user is bi/multi-lingual and would like to receive documents in a restricted knowledge domain in various natural languages as long as they are relevant. To achieve this, ontology-based user models9 containing knowledge such as user interests and language preferences can be generated pre-runtime using approaches such as [3]. Similar to the previous scenario, domain ontologies labelled in different natural languages can be obtained from sets of documents. In Figure 4, knowledge representations in English, French, German and Spanish are obtained in ontological form. Mappings of the user model and the various domain ontologies can then be generated using the SOCOM framework. At run time, a user query is transformed to be associated with a concept or concepts in the user model. By looking up in the mapping triplestore, the matched concepts in various knowledge repositories (the German and the Spanish knowledge repositories in the case of Figure 4) can be obtained, which will then lead to the retrieval of relevant documents in different natural languages.

9 User modelling is a well researched area particularly in adaptive hypermedia and personalised search systems, however, this is outside the scope of this paper.

Figure 4. Personalised Querying of Multilingual Knowledge Repositories with SOCOM

5. IMPLEMENTATION
To examine the soundness of the appropriate ontology label translation selection process proposed in the SOCOM framework, an initial implementation of the proposal has been completed that uses just the semantics within the given ontologies in a CLOM scenario. This light-weight translation selection process (i.e. one that includes semantics in O1 and semantics in O2, but excludes the six other influence sources as shown in Figure 1) is the focus of the implementation and the evaluation presented in this paper.

This initial SOCOM implementation integrates the Jena 2.5.5 Framework10 to parse the formally defined input ontologies. To collect candidate translations for ontology labels in O1, the GoogleTranslate11 0.5 API and the WindowsLive12 translator are used13. Synonyms of ontology labels in O2 are generated by querying WordNet14 2.0 via the RiTa15 API. Ontology labels are often concatenated to create well-formed URIs (as white spaces are not allowed), e.g. a concept associate professor can be labelled as AssociateProfessor in the ontology. As the integrated MT tools cannot process such concatenated labels, they are split into sequences of their constituent words before being passed to the MT tools. This is achieved by recognising concatenation patterns. In the previous example, white spaces are inserted before each capital letter found other than the first one. The candidate translations are stored in a translation repository, whereas the synonyms are stored in a lexicon repository. Both repositories are stored in the eXist16 1.0rc database.

The appropriate translation selection process invokes the repositories in the database via the XML:DB 17 1.0 API, to compare each candidate translation of a given source label to what is stored in the lexicon repository. An overview of this appropriate translation selection process can be seen in Figure 5. If a one-to-one match (note that the match found in the lexicon repository can be either a target label used in O2, or a synonym of a target label that is used in O2) is found, the (matched target label or the matched synonym’s corresponding) target label is selected as the appropriate translation. If one-to-many matches (i.e. when several target labels and/or synonyms in the lexicon repository are matched) are found, the surrounding semantics (see section 3) of the matched target labels are collected and compared to the surrounding semantics of the source label in question. Using a space/case-insensitive edit distance string comparison algorithm based on Nerbonne et al.’s method [15], the target label with surrounding semantics that are most similar to those of the source resource is chosen as the most appropriate translation. If no match is found in the lexicon repository, for each candidate translation, a set of interpretative keywords is generated to illustrate the meaning of this candidate. This is achieved by querying Wikipedia 18 via the Yahoo Term Extraction Tool 19. Using the same customised string comparison algorithm, the candidate with keywords that are most similar to the source label’s surrounding semantics is deemed as the most appropriate translation.

Figure 5. Overview of the Appropriate Ontology Label Translation Selection Process

Once appropriate translations are identified for each label in O1, given the original source ontology structure, O1' is generated using the Jena Framework. Finally, O1' is matched to O2 to generate candidate matches via the Alignment API20 version 3.6.
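
The label-splitting rule described in this section (a white space before every capital letter other than the first one) and a space/case-insensitive edit distance can be sketched as follows. The function names are hypothetical, and the plain Levenshtein distance used here is a generic stand-in: it is not claimed to reproduce the customised comparison algorithm based on Nerbonne et al.’s method [15].

```python
import re


def split_label(label):
    """Insert a white space before each capital letter other than the first one."""
    return re.sub(r"(?<!^)(?=[A-Z])", " ", label)


def normalise(label):
    """Drop white spaces and case so that comparison is space/case-insensitive."""
    return label.replace(" ", "").lower()


def edit_distance(a, b):
    """Levenshtein distance between the normalised forms of two labels."""
    a, b = normalise(a), normalise(b)
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


print(split_label("AssociateProfessor"))                           # Associate Professor
print(edit_distance("Associate Professor", "associateprofessor"))  # 0
print(edit_distance("chapter", "BookChapter"))                     # 4
```

Applied literally, the splitting rule also breaks acronyms apart (ISWC becomes "I S W C"), so a fuller implementation would presumably need additional concatenation patterns.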
10 http://jena.sourceforge.net
11 http://code.google.com/p/google-api-translate-java
12 http://www.windowslivetranslator.com/Default.aspx
13 One could use a dictionary/thesaurus here, however, as the appropriate ontology label translation selection process in the SOCOM framework is not a word sense disambiguation mechanism (see section 3), off-the-shelf MT tools are efficient to collect candidate translations.
14 http://wordnet.princeton.edu
15 http://www.rednoise.org/rita
16 http://exist.sourceforge.net
17 http://xmldb-org.sourceforge.net/index.html
18 http://www.wikipedia.org
19 http://developer.yahoo.com/search/content/V1/termExtraction.html
20 http://alignapi.gforge.inria.fr

6. EVALUATION
To evaluate the effectiveness of the integrated appropriate translation selection process, this initial implementation of the SOCOM framework is engaged in a CLOM experiment that
involves ontologies labelled in Chinese and English describing the research community domain, against a baseline system – the generic approach, where labels are translated in isolation using just the GoogleTranslate 0.5 API and matches are generated using the Alignment API21 version 3.6 (see [9] for more technical details of the implementation of the generic approach).

6.1 Experimental Setup
Figure 6 gives an overview of the experiment. A Chinese ontology CSWRC22 is created manually by a group of domain experts (excluding the authors of this paper) based on the English SWRC23 ontology. It contains 54 classes, 44 object properties and 30 data type properties. This Chinese ontology is matched to the English ISWC24 ontology (containing 33 classes, 18 object properties, 17 data type properties and 50 instances) using the generic approach and the SOCOM approach, generating results M-G and M-S respectively.

Figure 6. Cross-Lingual Ontology Mapping Experiments

Figure 7. Overview of Precision and Recall when Disregarding Confidence Levels. (a) Precision: Generic Avg. = 0.5914, SOCOM Avg. = 0.6100. (b) Recall: Generic Avg. = 0.4561, SOCOM Avg. = 0.5067. Each panel compares the Generic Approach and the SOCOM Approach across algorithms 1 to 8.
Legend (Figure 7 & Table 1): 1 NameAndPropertyAlignment; 2 StructSubsDistAlignment; 3 ClassStructAlignment; 4 NameEqAlignment; 5 SMOANameAlignment; 6 SubsDistNameAlignment; 7 EditDistNameAlignment; 8 StringDistAlignment.

As the CSWRC ontology is formally and semantically equivalent (with the same structured concepts but labelled in Chinese) to the SWRC ontology, a reliable set of gold standard (referred to as Std. in Figure 6) can be generated as matches found between the SWRC ontology and the ISWC ontology using the Alignment API25. By comparing results M-G and M-S to Std., this experimental design aims to find out which approach can generate higher quality matching results, when the concerned ontologies
hold distinct natural languages and varied structures.

6.2 Experimental Results
Precision and recall scores26 of M-G and M-S are calculated, see Figure 7, where a match is considered correct as long as the identified pair of corresponding resources is included in the gold standard Std., regardless of its confidence level.

21 The Alignment API 3.6 contains eight matching algorithms, namely NameAndPropertyAlignment, StructSubsDistAlignment, ClassStructAlignment, NameEqAlignment, SMOANameAlignment, SubsDistNameAlignment, EditDistNameAlignment and StringDistAlignment. For each correspondence found, a matching relationship is given and is accompanied by a confidence measure that ranges between 0 (not confident) and 1 (confident).
22 http://www.scss.tcd.ie/~bofu/SOCOMExperimentJuly2009/Ontologies/CSWRC.owl
23 http://ontoware.org/frs/download.php/298/swrc_v0.3.owl
24 http://annotation.semanticweb.org/iswc/iswc.owl

Figure 7a shows that except for the NameEqAlignment and the StringDistAlignment algorithms, all other matching methods indicate equal or higher precision when using the SOCOM approach. The aforementioned two algorithms employ strict string comparison techniques, where no dissimilarity between two labels is overlooked. Though this is a desirable characteristic at times, in this particular experiment setting, some matches are neglected in Std. For example, when using the StringDistAlignment algorithm, the gold standard was unable to establish a match between the class AssociateProfessor (in SWRC) and the class Associate_Professor (in ISWC) because these labels are not identical, although this would have been a sound match if a human was involved or if preprocessing was undertaken. When the SOCOM approach is used to match CSWRC to ISWC, the most appropriate translation for the class 副教授 (associate professor) in the source ontology was determined as Associate_Professor since this exact English label was used in the target ontology. Consequently, a match with 1.00 confidence level between the two was generated in M-S. However, as this correspondence was not included in Std., such a result is deemed as incorrect. Similar circumstances led to the lower precision scores of the SOCOM approaches in cases that involve the NameEqAlignment and the StringDistAlignment
25
Based on the assumption that the CSWRC ontology is algorithms. Nevertheless, on average, with a precision score at
equivalent to the SWRC ontology, this experimental design 0.61, the SOCOM approach generated more correct matching
aims to validate whether matches generated using the exact results than the generic approach overall. Furthermore, at an
same matching algorithms would result the same or highly average recall score of 0.5067 (see Figure 7b), the SOCOM
similar corresponding concepts. approach demonstrates that its correct results are always more
26
Given a gold standard with R number of matching results, and complete than those generated by the generic approach.
an evaluation set containing X number of results, if N number As precision and recall each measures one aspect of the match
of them are correct based on the gold standard, then for this quality, f-measure scores are calculated to indicate the overall
evaluation set precision = N/X, recall = N/R and f-meaure =
2/(1/precision + 1/recall).
18
quality27. Table 1 shows that the SOCOM approach generated results of at least equal quality compared to the generic approach. In fact, the majority of algorithms were able to generate higher quality matches when using the SOCOM approach, leading to an average f-measure score of 0.5460. The differences in the two approaches' f-measure scores (when they exist) range from 1.9 percentage points (when using the NameAndPropertyAlignment algorithm) to 11.4 percentage points (when using the EditDistNameAlignment algorithm). Additionally, when using the SOCOM approach, bigger differences in f-measure can be seen in lexicon-based algorithms. Such a finding indicates that appropriate ontology label translation in the SOCOM framework contributes positively to the enhanced performance of matching algorithms, particularly those that are lexicon-based.
Table 1. F-measure Scores when Disregarding Confidence Levels
Algorithm  Generic  SOCOM
1          .5233    .5421
2          .4574    .4574
3          .4651    .4884
4          .6000    .6667
5          .5020    .5714
6          .5039    .5039
7          .3571    .4714
8          .6000    .6667
Avg.       .5011    .5460
So far, the confidence levels of matching results have not been taken into account. To include this aspect in the evaluation, confidence means of the correct matches and their standard deviations are calculated. The mean is the average confidence of the correct matches found in a set of matching results, where the higher it is, the better the results. The standard deviation is a measure of dispersion, where the greater it is, the greater the spread in the confidence levels. Higher quality matching results therefore are those with higher means and lower standard deviations. On average, when using the SOCOM framework, the confidence mean is 0.7105, whereas a lower mean of 0.6970 is found with the generic approach. The standard deviation when using the SOCOM framework is 0.2134, which is lower than the 0.2161 found with the generic approach. These findings denote that matches generated using the SOCOM approach are of higher quality, because they are not only more confident but also less dispersed.
Moreover, average precision, recall and f-measure scores are collected at various thresholds. These scores are calculated as the conditions a correct result must satisfy are tightened, i.e. a matching result is only considered correct when it is included in the gold standard and has a confidence level of at least 0.25, 0.50, 0.75 or 1.00. An overview of the trends is shown in Figure 8. As the requirement for a correct matching result becomes stricter, the precision (Figure 8a) and recall (Figure 8b) scores both decline, leading to a similar decreasing trend in the f-measure (Figure 8c) scores. The differences in the recall scores of the two approaches are greater than the differences in their precision scores. This finding suggests that the matches generated using the two approaches may appear similar in their correctness, but the ones generated by the SOCOM approach are more complete. Overall, the SOCOM approach always has higher precision, recall and f-measure scores than the generic approach no matter what the threshold is28. This finding further confirms that the matches generated using the SOCOM approach are of higher quality.
[Figure 8: line plots of the average (a) precision, (b) recall and (c) f-measure of the generic and the SOCOM approach, on a scale from 0.0 to 1.0, against the confidence threshold (0, 0.25, 0.5, 0.75, 1).]
Figure 8. Trend Overview in Average Precision, Recall and F-Measure
Lastly, one can argue that the differences in the f-measure scores found between the generic and the SOCOM approach are rather small and therefore can be ignored. To validate the difference (if it exists) between the two approaches, paired t-tests are carried out on the f-measure scores collected across various thresholds, and a p-value of 0.001 is found. At a significance level of α=0.05, it can be concluded that the difference between the f-measure scores of the two approaches is statistically significant, meaning that the SOCOM approach generated higher quality matches than the generic approach.
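As an illustration of the paired t-test named above, the sketch below applies it to the per-algorithm f-measure scores of Table 1. This is an assumption made purely to obtain a runnable example: the test reported in the paper was run on scores collected across thresholds, which are not listed, so the statistic computed here is not the one behind the reported p-value of 0.001. The function name paired_t_statistic and the hard-coded two-tailed critical value of 2.365 (Student's t with 7 degrees of freedom at α=0.05) are likewise choices of this sketch.

```python
import math
import statistics

# F-measure scores from Table 1 (algorithms 1 to 8).
GENERIC = [0.5233, 0.4574, 0.4651, 0.6000, 0.5020, 0.5039, 0.3571, 0.6000]
SOCOM = [0.5421, 0.4574, 0.4884, 0.6667, 0.5714, 0.5039, 0.4714, 0.6667]

# Two-tailed critical value of Student's t for 7 degrees of freedom
# at alpha = 0.05 (assumption of this sketch: 8 pairs, hence df = 7).
T_CRITICAL_DF7 = 2.365


def paired_t_statistic(first, second):
    """Return the paired t statistic for two equally long score lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    diffs = [b - a for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    # Sample standard deviation of the differences (n - 1 in the denominator).
    sd_diff = statistics.stdev(diffs)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


t_value = paired_t_statistic(GENERIC, SOCOM)
significant = abs(t_value) > T_CRITICAL_DF7
print(f"t = {t_value:.3f}, significant at alpha = 0.05: {significant}")
```

With the Table 1 scores the statistic exceeds the critical value, which points in the same direction as the reported finding, although it is computed on a different sample from the one the authors tested.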
27 Note that neither precision nor recall alone is a measurement of the overall quality of a set of matching results, as the former is a measure of correctness and the latter is a measure of completeness. One can be sacrificed for the optimisation of the other: for example, when operating in the medical domain, recall may be sacrificed in order to achieve high precision; when merging ontologies, the opposite may be desired.
28 Dotted lines of the generic and the SOCOM approach shown in Figure 8 are almost parallel to one another; this may be in part a result of the engineering approach deployed in the experiment (i.e. using the same tools in the implementation for both approaches). Further research, however, is needed to confirm the validity of this speculation.
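The measures defined in footnote 26, together with the thresholded variants behind Figure 8 and the confidence statistics of section 6.2, can be sketched as follows. The data is a made-up toy example, not the ontologies or results of the experiment, and the function name evaluate and the dictionary layout are assumptions of this sketch. It also assumes that X remains the size of the whole evaluation set while only the notion of a correct match tightens with the threshold, which is one reading of the description in section 6.2, and that the standard deviation is the population form, which the paper does not specify.

```python
import statistics


def evaluate(found, gold, threshold=0.0):
    """Score a set of matches against a gold standard.

    found: dict mapping (source, target) pairs to a confidence in [0, 1].
    gold: set of (source, target) pairs regarded as correct.
    A match counts as correct if it is in gold and its confidence is at
    least threshold.
    """
    correct = [conf for pair, conf in found.items()
               if pair in gold and conf >= threshold]
    n, x, r = len(correct), len(found), len(gold)
    precision = n / x if x else 0.0
    recall = n / r if r else 0.0
    # f-measure = 2 / (1/precision + 1/recall), taken as 0 when either is 0.
    if precision and recall:
        f_measure = 2 / (1 / precision + 1 / recall)
    else:
        f_measure = 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        # Mean and spread of the confidence of the correct matches.
        "conf_mean": statistics.mean(correct) if correct else 0.0,
        "conf_std": statistics.pstdev(correct) if correct else 0.0,
    }


# Hypothetical toy data, for illustration only.
gold = {("A", "a"), ("B", "b"), ("C", "c"), ("D", "d")}
found = {
    ("A", "a"): 1.0,
    ("B", "b"): 0.6,
    ("C", "x"): 0.9,  # not in the gold standard
    ("D", "d"): 0.3,
    ("E", "e"): 0.5,  # not in the gold standard
}

for threshold in (0.0, 0.25, 0.5, 0.75, 1.0):
    scores = evaluate(found, gold, threshold)
    print(threshold, {k: round(v, 4) for k, v in scores.items()})
```

On this toy data none of the three scores rises as the threshold increases, which mirrors the declining trends described for Figure 8.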
7. CONCLUSIONS & FUTURE WORK
A semantic-oriented framework for cross-lingual ontology mapping is presented and evaluated in this paper. Preliminary evaluation results of an early prototype implementation illustrate the effectiveness of the integrated appropriate ontology label translation mechanism, and denote a promising outlook for applying CLOM techniques in multilingual ontology-based applications. The findings also suggest that a fully implemented SOCOM framework, i.e. one that integrates all the influence factors (discussed in section 2), would be even more effective in the generation of high quality matches in CLOM scenarios.
The implementation of such a comprehensive SOCOM framework is currently on-going. It is planned to be evaluated using the benchmark datasets from the OAEI 2009 campaign, engaging the proposed framework in the mapping of ontologies that are written in very similar natural languages, namely English and French. In addition, the SOCOM framework is to be embedded in a demonstrator cross-language document retrieval system as part of the Centre for Next Generation Localisation, which involves several Irish academic institutions and a consortium of multi-national industrial partners aiming to develop novel localisation techniques for commercial applications.
8. ACKNOWLEDGMENT
This research is partially supported by Science Foundation Ireland (Grant 07/CE/11142) as part of the Centre for Next Generation Localisation (http://www.cngl.ie) at Trinity College Dublin.
9. REFERENCES
[1] Alani H., Kim S., Millard D. E., Weal M. J., Hall W., Lewis P. H., Shadbolt N. R.. Automatic ontology-based knowledge extraction from Web documents. IEEE Intelligent Systems 18, 1, 14-21, Jan. 2003
[2] Buitelaar P., Cimiano P., Frank A., Hartung M., Racioppa S.. Ontology-based information extraction and integration from heterogeneous data sources. International Journal of Human Computer Studies, 66, 11, 759-788, Nov. 2008
[3] Cantador I., Fernández M., Vallet D., Castells P., Picault J., Ribière M.. A multi-purpose ontology-based approach for personalised content filtering and retrieval. Advances in Semantic Media Adaptation and Personalization. Studies in Computational Intelligence, vol. 93, 25-51, 2008
[4] Castells P., Fernández M., Vallet D.. An adaptation of the vector-space model for ontology-based information retrieval. IEEE Transactions on Knowledge and Data Engineering 19(2), Special Issue on Knowledge and Data Engineering in the Semantic Web Era, 261-272, Feb. 2007
[5] Conroy C., Brennan R., O’Sullivan D., Lewis D.. User evaluation study of a tagging approach to semantic mapping. In Proceedings of ESWC, 623-637, 2009
[6] De Luca E. W., Eul M., Nürnberger A.. Multilingual query-reformulation using an RDF-OWL EuroWordNet representation. In Proceedings of the Workshop on Improving Web Retrieval for Non-English Queries (iNEWS07), at SIGIR 2007, ISBN 978-84-690-6978-3, 55-61, 2007
[7] Espinoza M., Gómez-Pérez A., Mena E.. LabelTranslator – a tool to automatically localize an ontology. In Proceedings of ESWC, 792-796, 2008
[8] Fernandez M., Lopez V., Sabou M., Uren V., Vallet D., Motta E., Castells P.. Semantic search meets the Web. In Proceedings of IEEE ICSC, 253-260, 2008
[9] Fu B., Brennan R., O’Sullivan D.. Cross-lingual ontology mapping – an investigation of the impact of machine translation. In Proceedings of ASWC, LNCS 5926, 1-15, 2009
[10] Giunchiglia F., Yatskevich M., Shvaiko P.. Semantic matching: algorithms and implementation. Journal on Data Semantics, vol. IX, 1-38, 2007
[11] Gruber T.. A translation approach to portable ontologies. Knowledge Acquisition 5(2):199-220, 1993
[12] Li J., Tang J., Li Y., Luo Q.. RiMOM: A dynamic multistrategy ontology alignment framework. IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 8, 1218-1232, 2009
[13] Liang A. C., Sini M.. Mapping AGROVOC and the Chinese agricultural thesaurus: definitions, tools, procedures. New Review of Hypermedia and Multimedia, 12:1, 51-62, 2006
[14] Lopez V., Uren V., Motta E., Pasin M.. AquaLog: an ontology-driven question answering system for organizational semantic intranets. Web Semantics. 5, 2, 72-105, Jun. 2007
[15] Nerbonne J., Heeringa W., Kleiweg P.. Edit distance and dialect proximity. Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, 2nd ed. CSLI, Stanford, v-xv, 1999
[16] Ngai G., Carpuat M., Fung P.. Identifying concepts across languages: a first step towards a corpus-based approach to automatic ontology alignment. In Proceedings of the 19th International Conference on Computational Linguistics, vol.1, 1-7, 2002
[17] Pazienza M., Stellato A.. Linguistically motivated ontology mapping for the Semantic Web. In Proceedings of the 2nd Italian Semantic Web Workshop, 14-16, 2005
[18] Pazienza M. T., Stellato A.. Exploiting linguistic resources for building linguistically motivated ontologies in the Semantic Web. In Proceedings of OntoLex Workshop, 2006
[19] Shuang L., Fang L., Clement Y., Weiyi M.. An effective approach to document retrieval via utilizing WordNet and recognizing phrases. 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 266-272, ACM Press, 2004
[20] Shvaiko P., Euzenat J.. Ten challenges for ontology matching. In Proceedings of ODBASE, 1164-1182, 2008
[21] Suárez-Figueroa M. C., Gómez-Pérez A.. First attempt towards a standard glossary of ontology engineering terminology. In Proceedings of the 8th International Conference on Terminology and Knowledge Engineering (TKE'08), 2008
[22] Stamou S., Ntoulas A.. Search personalization through query and page topical analysis. User Modeling and User-Adapted Interaction 19, 1-2, 5-33, Feb. 2009
[23] Trojahn C., Quaresma P., Vieira R.. A framework for multilingual ontology mapping. In Proceedings of LREC, 1034-1037, 2008
[24] Wang S., Englebienne G., Schlobach S.. Learning concept mappings from instance similarity. In Proceedings of ISWC, 339-355, 2008
[25] Zhang L., Wu G., Xu Y., Li W., Zhong Y.. Multilingual collection retrieving via ontology alignment. In Proceedings of ICADL 2004, LNCS 3334, 510-514, Springer-Verlag, 2004
[26] Zhang X., Zhong Q., Li J., Tang J., Xie G., Li H.. RiMOM results for OAEI 2008. In Proceedings of the OM Workshop, 182-189, 2008