=Paper=
{{Paper
|id=Vol-316/paper-4
|storemode=property
|title=Similarity Measurement about Ontology-based Semantic Web Services
|pdfUrl=https://ceur-ws.org/Vol-316/Paper-4.pdf
|volume=Vol-316
|dblpUrl=https://dblp.org/rec/conf/ecows/WangDZ06
}}
==Similarity Measurement about Ontology-based Semantic Web Services==
Xia Wang, Digital Enterprise Research Institute, IDA Business Park, Lower Dangan, Galway, Ireland (xia.wang@deri.org)
Yihong Ding, Dept. of Computer Science, Brigham Young University, Provo, Utah, USA (ding@cs.byu.edu)
Yi Zhao, Chair of Computer Engineering, Fernuniversität Hagen, Germany (yi.zhao@fernuni-hagen.de)
Abstract

Measurement of semantic similarity between Web services is an important factor for Web service discovery, composition, and even execution. Semantic Web services (SWS) are usually specified based on ontologies. The measurement of semantic similarity between Web services can thus be reduced to computing semantic distances between ontologies. In this paper, we briefly survey three major existing ontology-distance-computation algorithms and enhance them to measure single- and multiple-ontology similarity in the SWS context. Based on this survey, we summarize a new hybrid ontology-similarity-measurement methodology that measures similarity between Semantic Web Services.

1. Introduction

Service similarity is crucial to service discovery, selection, composition, and even execution. Semantic service discovery in particular aims to locate the best-matched service, and it mostly depends on measuring the similarity between a user's service requirements and the profiles of published services. Currently, the semantic ontology languages for services, such as OWL-S (http://www.w3.org/Submission/OWL-S/) and WSMO (http://www.wsmo.org), are required to represent service capabilities semantically, including non-functional information (such as the QoS of a service) and functional information (the IOPE of service operations, denoting input, output, precondition and effect). Service discovery, therefore, focuses on the matchmaking of service capability [15] and QoS, while less work has been done on ontology-based service selection.

Moreover, ontologies receive great attention in the progressively emerging Semantic Web [2] and Semantic Web Services [6]: they formally define concepts and relationships in a machine-understandable way and enable knowledge sharing and reuse. As the elements of the representation of semantic services, the similarity of the ontologies used is crucial to service similarity, especially when considering discovery and execution.

Ontology similarity, which is related to ontology mapping, is a well-known topic in information retrieval, database integration systems, and artificial intelligence. There is also a wealth of work on similarity measures for ontology concepts and concept-related notions [19]. However, measuring the similarity of ontologies and concepts is not easy in itself, and the Semantic Web Service description environment additionally exhibits many specific features (see Section 2.2).

After surveying previous ontology similarity measures and their application situations, we combine two methods, adapted to our SWS context, to calculate the semantic distance of single formal ontology concepts, e.g. code and zip in the examples of Figure 1: a fuzzy-weighted associative network (an edge-based measure) and an information-theoretical approach (a content-based measure).

Compound ontology concepts in the semantic service context, for example the concepts findZipCodeDistance and CalcDistTwoZipsKm in Figure 1, are not formal single terms like the ones in WordNet; in this case the traditional methods (e.g. the edit distance of strings) are not useful for measuring their semantic similarity. Therefore, we refine a hierarchical clustering algorithm to calculate the distance of two compound concept terms, similar to [7, 4].
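Before any such distance can be computed, compound concept names like findZipCodeDistance and CalcDistTwoZipsKm must first be decomposed into their word parts. As a minimal illustration of this capitalization- and delimiter-based splitting (a sketch of ours, not the preprocessing code used by the authors):

```python
import re

def split_compound(term: str) -> list[str]:
    """Split a compound ontology concept name (e.g. 'findZipCodeDistance')
    into lower-cased word parts, using underscores, dashes, spaces and
    camelCase capitalization boundaries as delimiters."""
    parts = []
    for chunk in re.split(r"[_\-\s]+", term):
        # Break between a lowercase letter/digit and a following capital...
        chunk = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", chunk)
        # ...and between an acronym and a following capitalized word (GPSCoord -> GPS Coord).
        chunk = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", chunk)
        parts.extend(p.lower() for p in chunk.split())
    return [p for p in parts if p]

print(split_compound("findZipCodeDistance"))  # ['find', 'zip', 'code', 'distance']
print(split_compound("CalcDistTwoZipsKm"))    # ['calc', 'dist', 'two', 'zips', 'km']
```

Such a splitter only covers the delimiter and capitalization conventions; the abbreviation restitution, stemming and spelling correction described in Sections 2.2 and 3.5 would still be required on top of it.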
In this paper we aim to solve the ontology-similarity problem in a semantic service environment. First, we differentiate two cases of service ontology concepts, single and compound, and measure their similarity in the service context. Then, a hybrid ontology-similarity measurement is proposed by combining and refining three existing methods. Finally, we define our ontology-similarity-based model as simS = Σ simO ∈ [0, 1] to improve service selection; this model fuzzily and quantitatively measures service similarity based on service ontology similarity.

This paper is structured as follows. In Section 2 we state the problems that ontologies raise in the Semantic Web Service description context and investigate the specific naming features of ontology concepts. An ontology concept distance definition and three refined ontology similarity algorithms are discussed in Section 3, with examples of single formal terms and compound terms. The service similarity measurement is defined in Section 4. In Sections 5 and 6 we respectively discuss related work and give conclusions and indications for future work.

2 Ontology in SWS Description Context

2.1 Problem Statements

In order to illustrate the challenge of measuring the similarity of semantic Web services, we extract a set of zip-code-related services from the dataset of OWL-S annotated Web services of University College Dublin (the Semantic Web Services Repository at the Smart Media Institute, http://moguntia.ucd.ie/repository/). Figure 1 shows snippet descriptions of four services, which are used for looking up a zip code or calculating the distance between two places according to the given zip codes. The information shown is retrieved from the WSDL documents of the respective services.

Figure 1. Snippet of Semantic Web Services Description

Current service matchmaking algorithms normally focus on measuring the syntactic properties (such as the service name and service text description) and the semantic properties (such as service capabilities) of a service. Taking sws4 and sws5 of Figure 1 as examples: if we assume that zip and code have similar meanings, then intuitively, by comparing service names and operation names, the services sws4 and sws5 are regarded as similar at the signature level; and by matching their operations (both operation2), they also have similar inputs and outputs, so sws4 and sws5 are concluded to be similar services. This means that both can provide detailed information about a city according to the given zip code.

Further, if we assume that a machine can understand some similarity between {zip, ZipCode, Zip Code 1, code, code1} and {CalcDistTwoZipsKm, findZipCodeDistance}, then, intuitively and naively, from the above example services of Figure 1 we know that sws1:operation1 is similar to sws5:operation1; sws2:operation1 is similar to sws3:operation2 and sws4:operation2; and sws3:operation1 is similar to sws4:operation1.

Obviously, whether the similarity is syntactic or semantic, the matching of the ontology concepts used in service descriptions is a critical challenge. If a machine cannot understand the meaning of service concepts, it also cannot infer the implied relationships, and then the automatic matching and discovery of services is impossible.

2.2 Naming Conventions for Ontologies

Intuitively, and in ontology-related work, when ontology terminology is mentioned it mostly means the terms in thesauri, e.g., WordNet (an online lexical reference system, http://wordnet.princeton.edu/). On the other hand, the ontology terms defined in applications are very different from such formal words. Generally, the ontology concepts used in semantic service descriptions are mostly compound terms, which are named by service developers according to their ontology knowledge, experience and habits. The situation is made worse by the following practices (with some examples from Fig. 1):

Abbreviations: Names are not given in their full forms but shortened, e.g. CalcDistTwoZipsKm.

Associated words with capitalization or delimiters: Names have the form of associations of several word parts (full words or abbreviations) with delimiters, normally with a part's first letter capitalized, and sometimes also using underscores, dashes or spaces, e.g., LogIn, AcctName, ArrivalAirport_In.
Words with suffixes and prefixes: Examples are hasFlavour, locatedIn.

Variations or misspellings: Names may be variations of a word, often due to grammatical flexion, e.g., Booking, madeFromGrape; and some defined words are misspelled, which a machine cannot resolve.

Free inventions: In all other cases, the traditional similarity measures (based, e.g., on WordNet) are prevented from working.

For such compound concept terms, the existing ontology measure algorithms cannot work. However, data clustering algorithms from the data mining field can be borrowed and applied to this case. This paper enhances the clustering algorithm in [4] to measure the semantic closeness of compound terms.

3 Ontology Similarity

3.1 Ontology Concept Distance

To semantically measure ontology concept distance, we should consider both concept structure and concept content. Fortunately, both kinds of information are prolifically provided by service descriptions. Here, we define the semantic distance dis of two concepts C and D (which can be single formal terms or compound terms) as:

dis = w1 · Diss + w2 · Disi + w3 · Disc,  with Σ_{i=1}^{3} wi = 1   (1)

where Diss is the distance based on the structure of the concept in the service ontology, Disi is based on the common contents shared by the concepts, and Disc is used only to measure compound concept terms by clustering concepts, based on the co-occurrence of concept elements. Formula (1) not only accommodates the different concept naming features, but also makes up for the weaknesses of any single approach, because the service description context is just a structure and a short piece of text, not a corpus or thesaurus.

In the following sections, we present a detailed explanation of each distance measurement.

3.2 Fuzzy-weighted Associative Network

Concepts in a hierarchical taxonomy are all related by certain relationships, based on which the concepts can be represented in an associative network consisting of nodes and edges, where a node denotes a concept and an edge denotes the binary relationship of the two linked concepts. For a service description ontology, too, such an associative network with a fuzzy-weighted value on each link can be constructed, in which the similarity of concepts can be measured by the shortest distance, as in [5] and [17]; this is defined as Diss in our context.

Following the detailed explanation in [5], and in correspondence to OWL-Lite, we define four concept relations: generalization (g, e.g., superclass), specification (s, e.g., subclass), negative association (n, e.g., disjoint) and positive association (p, e.g., equivalent).

The distance of any two nodes in the network can then be calculated based on Tables 1–3 [17]. In Table 1, g, s, p and n represent the explicit relationships; the relationship between any two nodes can be evaluated based on triangular norms. The τ in Table 2 are the triangular norms (t-norms) defined in Table 3, where α and β are fuzzy-weighted strength values of relations (0 ≤ α, β ≤ 1) and n is the degree of dependence (−∞ ≤ n ≤ ∞) between the relationships; for details please refer to [9]. Fields for which there is no definition are marked with X. The relationship of two arbitrary concepts can thus easily be inferred by travelling through the associative network.

Table 1. Kind of paths
      g  s  p  n
  g   g  p  p  n
  s   p  s  p  n
  p   p  p  p  n
  n   n  n  n  X

Table 2. Strength of paths
      g   s   p   n
  g   τ3  τ1  τ2  τ2
  s   τ1  τ3  τ2  τ2
  p   τ2  τ2  τ3  τ3
  n   τ2  τ2  τ3  X

Table 3. T-norms function
  τ1(α, β) = max(0, α + β − 1)   n = −1
  τ2(α, β) = α · β               n = 0
  τ3(α, β) = min(α, β)           n = ∞

3.3 Information-theoretical Approach

The definition of similarity between the concepts C and D relates to the concepts' commonality and difference [11]:

sim(C, D) = I(common(C, D)) / I(description(C, D)) = log P(common(C, D)) / log P(description(C, D))   (2)

where common(C, D) is a proposition that states the commonalities between C and D, I(common(C, D)) is the amount of information contained in this proposition, and, similarly, I(description(C, D)) is the amount of information in a proposition describing what C and D are. In our service context, we refine the similarity expression as follows to calculate the distance Disi:

Disi = |C ∩ D| / (|C ∩ D| + γ|D/C| + δ|C/D|),  γ, δ ∈ [0, 1]   (3)
where C and D are two ontology concept classes of OWL-Lite, |C ∩ D| is the number of common elements of C and D, e.g., the number of shared attributes, instances and relational classes, and γ and δ are weight values defining the relative importance of their non-common characteristics.

3.4 Ontology Distance for Single Formal Terms

Given two single-form ontology concept terms t1 and t2 from two different Web service descriptions, each described by a set of other class terms as its properties, instances and relational members (e.g., via the relations g, s, n, p), there are two cases:

Figure 2. Example of distance in an Associative Network

• The terms t1 and t2 are organized in one hierarchical structure, which is transformed into a fuzzy-weighted associative network as in Section 3.2.

Reconsidering the example in Fig. 1, assume that in the zip service application domain the terms have the relationships shown in Fig. 2 (all experimental data, not real values). For example, to examine the distance of the terms State and Zip: the shortest path is path = {State, Place, Code, Zip}, with State ⇒g,0.9 Place, Place ⇒p,0.9 Code and Code ⇒s,0.9 Zip. Following Tables 1–3, it holds that τ2(τ2(0.9, 0.9), 0.9) = 0.729, which means State ⇒p,0.729 Zip; finally we get Diss(t1, t2) = 0.729.

• The terms t1 and t2 are concept classes, each consisting of a set of properties and instances as ontology vocabulary, according to Section 3.3.

Assuming that their cardinalities are |t1| = 9 and |t2| = 6, and that they share |t1 ∩ t2| = 5 elements, with γ = δ = 0.5 we obtain Disi(t1, t2) = 5 / (5 + 0.5 · 4 + 0.5 · 1) ≈ 0.67.

3.5 Concept Clustering for Compound Terms

Clustering is a well-known approach for grouping data on the basis of some similarity criterion. We adapt this clustering mechanism here to group the compound ontology concept terms coming from different service description ontologies, e.g. findZipCodeDistance and CalcDistTwoZipsKm, in order to calculate their similarity by the distance Disc.

In the clustering algorithm, the association rule between two terms t1 and t2 is defined as follows [4]:

t1 → t2 (s, c)

where the support s is the probability s = P(t1) = ||T_t1|| / ||T|| that t1 occurs in T, with ||T|| the cardinality of the ontology terms' domain and ||T_t1|| the cardinality of the set of terms containing t1; the confidence c is the occurrence probability of t2 given that t1 occurred, i.e., c = P(t2 | t1) = ||T_t1,t2|| / ||T_t1||, with ||T_t1,t2|| the cardinality of the set containing both t1 and t2. The distance of two terms is weighted by their conditional probability c. The center of a cluster is the term with the highest occurrence probability in the cluster.

In detail, including the natural-language term extraction, we use the clustering algorithm as follows:

1. Read the service description document (.owl), remove all OWL-Lite tags, extract names and parameters, and delete redundancies in the vocabularies. The result is a bag of unique words, including compound concept terms, denoted T = {t1, t2, ...}.

2. Preprocess all compound terms in T as follows. Suppose ti ∈ T is a compound term; we split it up on the basis of its delimiters, such as capital letters, into several parts. Then we process each part to extract the word stem, by removing stop words, suffixes and prefixes, restoring abbreviations or correcting misspellings, deleting redundant vocabulary terms and so on, resulting in the set ti = {ti1, ti2, ...}. Substituting ti by all tij ∈ ti, for all i, ultimately yields T′.

3. Compute the values s and c for any two terms in T′, store them in a table in descending order, and cluster them on the basis of their confidence c ≥ τc and support s ≥ τs (τs and τc are thresholds, either assigned or obtained experimentally), resulting in the set T′′ = (X1, X2, ..., Xk) of k clusters.

Roughly speaking, Xi, 1 ≤ i ≤ k, is a cluster including those terms whose co-occurrence probabilities exceed the threshold τc. In traditional agglomeration
clustering algorithms T′′ is an intermediate result, while in our context we should improve it in order to find an optimal clustering for our computation. This is the rationale for the algorithm's further steps:

4. In each Xi ⊆ T′′, remove the overly frequent and the rare parameters, to avoid the query-expansion and over-fitting problems discussed in the field of information retrieval [8].

5. Split and merge the clusters in T′′, in order to wipe off the noise terms and optimize the clusters by agglomerating terms according to concentric circles with different radii.

The inner circle consists of those terms which are close to at least half of the other terms. Similarly, the terms in the outer circle are close to at least a quarter of the other ones; these are called the 1/2 radius [4]. Terms which are not in any circle are wiped off.

For example, two clusters X1 and X2 are merged when, for all i ∈ X1 ∪ X2,

||{ j | j ∈ X1 ∪ X2, i ≠ j, i → j (c > τc) }|| ≥ (1/2) (||X1|| + ||X2|| − 1)   (4)

Now, when calculating the distance between two arbitrary composite terms, here denoted c1 and c2, we first preprocess them using step 2 to obtain c1 = {c11, c12, ...} and c2 = {c21, c22, ...}, and then measure their similarity Disc by the probability that pairs of their sub-terms occur in the same cluster. As the measure, the maximum, minimum or mean may be employed. Here we take the maximum, as the optimistic choice; the formula is as follows:

Disc = max( sim(c1i, c2j) | ∀ c1i ∈ c1, c2j ∈ c2 ) if c1i, c2j ∈ Xk, and Disc = 0 otherwise.   (5)

Obviously, this formula implies an extreme case: all of the sub-terms of c1 and c2 may have been wiped off as noise words, in which case there is no way to scale the distance of the ontology concepts. Evaluating the frequency of this occurrence is exactly what our experiments will analyse.

4 Service Similarity

In our previous work [16], a semantic service model for selection was proposed as s = (NF, F, Q, C). By this model, service selection can happen by filtering on a single property, the non-functional one (in this model, only the service name, the service category and a short service text description are defined as non-functional), or on combined properties: the non-functional and functional ones (the latter based on logical-subsumption computing), together with the qualities of service. Obviously, in both non-functional and functional-based selection, ontology concept similarity is a critical factor for service selection.

Under this selection model, we define an ontology-based service similarity algorithm, especially for the case in which the non-functional properties are considered during service selection, because non-functional service selection is ontology-based.

The idea is to measure service similarity by the similarity of the service name and the service operation names, which are defined as ontology concepts. We do not compare whole service ontologies, for example simSO : (SOi) × (SOj) → [0..1], where SOi is the service ontology for service si; we only consider how similar two single ontology concepts are in the service ontology context, as sim(ci, cj) = {f(ci, cj) | ci ∈ SOi ∧ cj ∈ SOj} with the function f(ci, cj) = min_{k=1,...,j} dis(ci, ck). Therefore, our work is different from ontology mapping.

The proposed ontology-based service selection basically measures the similarity of the service name concepts and the operations, called the lexical semantic level. It is defined as simService = simConcepts + simOperation, where simConcepts is the summed similarity of all the concepts of the services, and simOperation is the summed similarity of the operation parameters with their data types.

5 Related Work

Similarity of ontologies has been widely researched, e.g., in the fields of information retrieval, artificial intelligence, and databases, and especially in data mining and web mining. Many similarity measures have been applied. Bernstein et al. [3], for example, use two ways to measure the semantic similarity of objects organized in a hierarchical ontology structure, viz. the edge-based approach [10] (a shorter path from one node to the other) and the node-based approach [14] (the notion of shared information content). In fact, they present five different distance measures for ontologies, where the ontology distance stands for the shortest path through a common ancestor in a directed acyclic graph; however, node degree and edge weight are not considered. The vector-space approaches, which compute cosine or Euclidean distances of k-dimensional vectors [1, 13], do not easily apply to nominal concepts, as it is difficult to represent them as vectors. The full-text retrieval method (TF/IDF) is mostly used in information retrieval [1] to compare documents, which are considered as bags of words; however, it is inadequate for structured concepts, as the semantic relations between them are ignored.

The work most closely related to ours comprises the studies on ontologies in the semantic web or in semantic web services, such as [7, 4] and [12]. While they consider to cluster the
similar terms, and mostly resort to TF/IDF to measure concept similarity, we follow Dong's notion of name-clustering agglomeration algorithms. Maedche et al. also propose an approach to cluster ontology-based data, using a hierarchical clustering algorithm that considers instances for concept similarity. Hau et al. elaborate a metric to measure the similarity of semantic services annotated with OWL ontologies; they mainly depend on the information-theoretic approach to match similar ontology instances. Doan et al. compute the common information content of ontologies to scale their similarity. We combine multiple approaches to adapt to SWS environments: based on a study of the definitions and features of ontologies expressed in OWL, and from a computational point of view, we calculate the distance of two ontologies.

6 Conclusions

Ontology similarity is unquestionably important for Semantic Web Service similarity when we consider semantic service discovery, selection, composition, and even execution. This paper proposes an ontology-similarity-based approach to measuring service similarity and presents our primary work on it. The contributions of this paper are summarized as follows: 1) we analyse the ontology similarity problem in the semantic service context and classify the ontology concept naming features used in service descriptions; 2) we present a hybrid ontology concept distance method and use it further to measure service similarity.

Given the complexity of ontology-based service similarity, there is still a lot left for future work under our model, including the set matching of the ontology-based concepts and their types, as well as the detailed implementation and evaluation. Fortunately, however, preliminary experiments show that the new methodology works well.

References

[1] Baeza-Yates, R., and Ribeiro-Neto, B., Modern Information Retrieval, ACM Press, 1999.

[2] Berners-Lee, T., Hendler, J., and Lassila, O., The Semantic Web, Scientific American, 284(5):34-43, 2001.

[3] Bernstein, A., Kaufmann, E., Buerki, C., and Klein, M., How Similar Is It? Towards Personalized Similarity Measures in Ontologies, Internationale Tagung Wirtschaftsinformatik, February 2005.

[4] Dong, X., Halevy, A.Y., Madhavan, J., Nemes, E., and Zhang, J., Similarity Search for Web Services, VLDB 2004.

[5] Fankhauser, P., Kracker, M., and Neuhold, E.J., Semantic vs. structural resemblance of classes, ACM SIGMOD Record, 20(4):59-63, 1991.

[6] Gomez-Perez, A., and Corcho, O., Ontology languages for the Semantic Web, IEEE Intelligent Systems, Jan/Feb, pp. 54-60, 2002.

[7] Hau, J., Lee, W., and Darlington, J., A Semantic Similarity Measure for Semantic Web Services, WWW 2005, Japan, 2005.

[8] Jones, K.S., Automatic keyword classification for information retrieval, Archon Books, 1971.

[9] Klir, G.J., and Folger, T.A., Fuzzy Sets, Uncertainty and Information, Prentice Hall, 1988.

[10] Lee, J.H., Kim, H., and Lee, Y.J., Information Retrieval Based on Conceptual Distance in IS-A Hierarchies, Journal of Documentation, 49:188-207, 1993.

[11] Lin, D., An Information-Theoretic Definition of Similarity, Fifteenth International Conference on Machine Learning, 1998.

[12] Maedche, A., and Zacharias, V., Clustering Ontology-based Metadata in the Semantic Web, Proc. of the Joint Conferences ECML'02 and PKDD'02, 2002.

[13] Mitchell, T.M., Machine Learning, McGraw-Hill, New York, 1997.

[14] Resnik, P., Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language, Journal of Artificial Intelligence Research, 11:95-130, 1999.

[15] Paolucci, M., Kawamura, T., Payne, T., and Sycara, K., Semantic Matching of Web Services Capabilities, ISWC, Italy, 2002.

[16] Wang, X., Zhao, Y., and Wolfgan, H., Selection Model of Semantic Web Service, 7th International FLINS Conference on Applied Artificial Intelligence, 2006.

[17] Sycara, K., Widoff, S., Klusch, M., and Lu, J.G., LARKS: Dynamic Matchmaking Among Heterogeneous Software Agents in Cyberspace, Autonomous Agents and Multi-Agent Systems, 5:173-203, 2002.

[18] Witten, I.H., and Frank, E., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999.

[19] Weinstein, P., and Birmingham, W., Comparing concepts in differentiated ontologies, Proc. of KAW-99, 1999.