<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Measuring Semantic Similarity using a Multi-Tree Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Behnam Hajian</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tony White</string-name>
          <email>arpwhiteg@scs.carleton.ca</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science Carleton University</institution>
          ,
          <addr-line>Ottawa</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recommender systems and search engines are examples of systems that have used techniques such as Pearson's product-momentum correlation coefficient or Cosine similarity for measuring semantic similarity between two entities. These methods relinquish semantic relations between pairs of features in the vector representation of an entity. This paper describes a new technique for calculating semantic similarity between two entities. The proposed method is based upon structured knowledge extracted from an ontology or a taxonomy. A multitree concept is defined and a technique described that uses a multi-tree similarity algorithm to measure similarity of two multi-trees constructed from taxonomic relations among entities in an ontology. Unlike conventional linear methods for calculating similarity based on commonality of attributes of two entities, this method is a non-linear technique for measuring similarity based on hierarchical relations which exist between attributes of entities in an ontology. The utility of the proposed model is evaluated by using Wikipedia as a collaborative source of knowledge.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Similarity refers to psychological nearness between two
concepts. Similarity has roots in psychology, social sciences,
mathematics, physics and computer science [Larkey and
Markman, 2005]. In social psychology, similarity points to
how closely attitudes, values, interests and personality match
between people which can lead to interpersonal attraction.
This can be explained by the fact that similar people tend to
place themselves in similar settings and this consequently
decreases potential conflicts between them. Furthermore,
finding a person with similar tastes helps to validate values or
views held in common. With a mental representation, the
similarity between two concepts is defined as a function of
the distance between two concepts represented as different
points in the mental space [Tversky and Shafir, 2004].</p>
      <p>Semantic similarity is used to refer to the nearness of two
documents or two terms based on likeness of their
meaning or their semantic contents [Tversky and Shafir, 2004].
Conventionally, statistical means (e.g., a vector space model)
can estimate the distance between two entities by
comparing features representing entities [Salton et al., 1975]. For
example, in order to compare two documents, the frequency
of co-occurrence of words in the text corpus represents the
similarity between the two documents. Semantic relatedness
is a broader term than semantic similarity, with the former
including other concepts such as antonymy and meronymy.
However, in certain literature these two terms are used
interchangeably .</p>
      <p>We can compare the similarity of two concepts by
measuring the commonality of their features. Since each
concept is represented by the features describing its properties,
a similarity comparison involves comparing the feature lists
representing that concept. Simply put, concepts which are
near to each other are more similar than points which are
conceptually distant. There are several mathematical techniques
for estimating this distance, such as latent semantic analysis
(LSA) [Landauer et al., 1998]. Measuring similarity among
entities has applications in many areas such as:
recommendation systems, e-commerce, search engines, biomedical
informatics and in natural language processing tasks such as word
sense disambiguation.</p>
      <p>For instance, in user-based collaborative filtering, the
system tries to find people with similar tastes and recommend
items highly ranked by the people which might be
interesting to their peers. Finding people with similar tastes involves
processing of their historical transactions (i.e., items viewed
and ranked by them in their previous transactions) and
calculating similarity between them using one of the methods
described above. On the other hand, in content-based
recommender systems and search engines, the system finds items
which are more similar to show to a user based on his/her
query and the similarity of the items (i.e., products). This
category of system calculates similarity between products based
on the commonality of the features of different products.</p>
      <p>In information retrieval (IR) and search engines, words are
considered as features in a document or a query. In IR
systems, it is conventional to represent a document by a
bagof-words (BOW). A Vector Space Model (VSM) is
generally used to estimate the similarity between two documents
in classification/clustering tasks or to estimate similarity
between a query and documents in keyword-based search
engines.
In the Vector Space Model (VSM), a document or a query
is represented as a vector of identifiers such as index terms.
However, in many cases, conventional methods such as Dices
coefficient, Pearson’s correlation coefficient, Jaccards index
or cosine similarity, which use VSM to represent a
document, do not perform well. This is due to a document
being represented in a linear form (i.e., a vector of features) in
which semantic relations among features are ignored.
Examples of such problems are: ignoring polysemy (terms having
different sense with a same spelling such as apple as a fruit
and apple as a company) and synonymy (terms with
different spelling having a same sense such as big and large). An
example of the latter problem can be found in recommender
systems which find people with similar tastes according to
their previous transactions. An example of this problem is
demonstrated in the following example in which similarity of
tastes for two people are estimated based on their previous
transactions:</p>
      <p>T1 (Clothes, Boxspring, Mp3Player, Mattress, LCD TV)
T2 (Dress, Bed, Mattress, iPod Touch, LED TV)</p>
      <p>Using the VSM-based method for computing similarity
between the above transactions, these transactions are no longer
similar at all. However, intuitively we have a feeling that LED
TV and LCD TV are related to each other since both are
subclasses of TV. This observation is also true when comparing
iPod Touch and Mp3 Player and for the relationship between
Clothes and Dress.</p>
      <p>In Recommender systems, a basic problem to be solved is
that of the similarity of one item or set of items to another
item or set of items. Either an individual is to be compared
to another or one set of items is to be compared to another
set. As the above example demonstrates, the two individuals
have some similarity in their purchasing profiles and that the
purchasing profile of person P1 (responsible for T1) has some
similarity to that of person P2 (responsible for T2). A survey
on next generation recommender systems [Adomavicius and
Tuzhilin, 2005] proposes ”improved modelling of users and
items” that arguably includes the type of semantic extensions
proposed in this paper.</p>
      <p>This paper proposes replacing the VSM with a non-linear
representation for an entity. The proposed representation
models an entity by its features in a hierarchical format
using an ontology called a semantic multi-tree. Multi-tree
similarity considers semantic relations among the features of
entities in a hierarchical structure using the ontology
classification. This method enhances conventional information
retrieval techniques by computing similarity regarding
commonality of not only the features but also commonality of
semantic relations among features and their parents.</p>
      <p>The rest of the paper is organized as follows: The next
section provides background information on the VSM including
analysis of its limitations as well as related work. In
Section 3, we define our model for representing entities called
the semantic multi-tree. Section 4 concentrates on the
definition of the proposed method for measuring similarity by
use of a semantic multi-tree model. In the following
section, the technique is validated against human judgment in
WordSimilarity-353 and Rubenstein and Goodenough using
Wikipedia categories as a taxonomy. A discussion follows,
with conclusions and future work provided in the final
section.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Background and Related Work</title>
      <p>The Vector Space Model is defined as an algebraic model
in which a document or an entity is represented as a
vector of its features; for example, a document which is
represented as a vector of index terms [Salton and McGill, 1983]:
dj = (w1j ; :::; wnj ). In these vectors, wij represents the
number of occurrences of the ith term in the jth document.
There are two model representation schemes. The superior
scheme for representation of vectors in this model is term
frequency-inverse document frequency (tf-idf). In the other
scheme, wij is a binary representation of the occurrence of a
corresponding term. In order to retrieve a document among
a collection of documents, we have to calculate the similarity
between our query and all of the documents in the collection
and choose the most relevant documents among them. A
frequently used method for estimating the similarity is
calculating the cosine of the angle between the vectors representing
query and a document. The higher the cosine of the angle
between two vectors the more similar the vectors and, therefore,
the more similar the entities represented by the vectors.
di:q
(1)
(2)
cos = sim(di; q) =
sim(di; q) =
jdijjqj</p>
      <p>Another method for calculating similarity is
Pearson product-moment correlation coefficient (PMCC). This
method calculates the correlation (linear dependence)
between two vectors. The PMMC of two vectors is defined as:
cov(di:q)
di q</p>
      <p>Calculation of similarity is straightforward with these
methods but a disadvantage is that neither of them considers
semantic relations among features.</p>
      <p>A conventional method for computing semantic
relatedness is the corpus-based technique that relies on the tendency
for related words to appear in similar texts called Latent
Semantic Analysis (LSA). Unfortunately, LSA is only able to
provide accurate results when the corpus is very large.</p>
      <p>In recent years, several techniques have been developed –
such as the work proposed by Ted Pedersen et. al – for
estimating similarity by measuring semantic distance between
two words in WordNet [Pedersen et al., 2005]. A limitation of
using lexical databases such as WordNet or similar resources
is that they have been created by one or a group of linguists
rather than experts in different subjects. Furthermore,
WordNet is rarely revised when compared to collaborative
knowledge sources such as Wikipedia. As a result, WordNet does
not include some special proper nouns in different areas of
expertise (e.g., Obama, Skyneedle). Recently, Wikipedia has
compensated for this lack of knowledge by providing a
mechanism for collaboratively creating knowledge. Wikipedia
includes a wide range of articles about almost every entity in
the world by using human expertise in different areas. In
addition, as of May 2004, Wikipedia articles have been
categorized by providing a taxonomy; namely, categories. This
facility provides hierarchical categorization with multiple
parents for a node by means of a multi-tree structure. This
observation motivates us to use Wikipedia as a resource of
knowledge in this paper.</p>
      <p>There are several approaches for measuring semantic
relatedness using resources such as WordNet or Wikipedia
categories as a graph or network by considering the number or
length of paths between concepts. In the WikiRelate project,
Ponzetto and Strube used three measures for computing
semantic relatedness: First, a path-based measure using the
length of the path between two concepts; second, an
information content-based measure and third, the overlap-based
measure which applies the Lesk algorithm that defines the
relatedness between two words as a function of the overlap
between two contexts defining the corresponding words. In
WikiRelate [Strube and Ponzetto, 2006], a pair of Wikipedia
pages is first retrieved, then categories they refer to are
extracted and finally, the relatedness between two concepts is
computed regarding the paths found between two concepts in
the Wikipedia categories. In the last step, Ponzetto and Strube
calculate relatedness by selecting the shortest path and the
paths which maximize the information content-based
measure.</p>
      <p>In contrast with statistical methods for computing
relatedness such as LSA, Gabrilovich and Markovitch proposed
Explicit Semantic Analysis (ESA) using meaning in natural
concepts derived from Wikipedia [Gabrilovich and Markovitch,
2007]. In this work, they used Wikipedia articles for
augmenting the text representation and constructing a weighted
list of concepts. They finally used tf-idf and conventional
machine learning methods to calculate relatedness between the
weighted vectors constructed in the previous steps.</p>
      <p>Milne and Witten used cross referencing in the Wikipedia
link database in order to obtain semantic relatedness between
two concepts called Wikipedia Link-based Measure (WLM)
[Witten and Milne, 2008]. In WLM, they used tf-idf using
link counts weighted by the probability of occurrence of a
term in an article. Almost all of the previous research has
used tf-idf and VSM to calculate the relatedness between two
sets of features.</p>
      <p>Ruiz-Casado et al. [Ruiz-Casado et al., 2005] proposed
a procedure for automating the extension of an existing
ontology or lexical semantic network using Wikipedia. In
their work, WordNet was used in conjunction with the
english Wikipedia to generate very high accuracy for
polysemous words. Furthermore, there are several works based on
computation of similarity between two ontologies, such as
[Rodr´ıguez and Egenhofer, 2003] in which the authors used
the set theoretic Tversky similarity measure in the matching
process. The model proposed focuses on the matching
between classes and entities in two ontologies. [Maguitman et
al., 2005] proposes a graph-based algorithm for calculating
similarity between entities in a graph. This algorithm is based
on the Open Directory Project (ODP) which is a large
humanconstructed directory of the Web, using portals and search
engines. One of the models which proposes a new
similarity model based on the structure of an ontology is
[SchickelZuber and Faltings, 2007]. This paper proposes Ontology
Structure based Similarity (OSS) in which an a-priori score is
calculated to measure similarity between two concepts in an
ontology. [Lee et al., 2008] also concentrated on measuring
similarity between two concepts in a single ontology. None
of these works tries to model an entity or a text document by
its features by a structured dictionary such as Wikipedia
categories in order to measure semantic similarity between two
documents.
3</p>
    </sec>
    <sec id="sec-3">
      <title>The Semantic Multi-Tree Model</title>
      <p>Semantic tree is a term with several different meanings in
computer science. The term is regularly used as an
alternative term for a semantic tableaux which is a very well known
method in logic (i.e., a resolution method for mechanized
reasoning) [Annates, 2005]. Semantic tree in this paper is
interpreted as the taxonomy of entities in an ontology.</p>
      <p>Taxonomy in the literature is defined as the practice and
science of classification. Almost all objects, places,
concepts, events, properties and relations can be classified into
a taxonomic hierarchy. In the ontological literature,
taxonomy refers to a set of concepts with is-a (i.e.,
SubClassOf/InstanceOf) relations between them. Therefore, we
consider taxonomy as a narrower concept than ontology since
ontology includes broader relations such as part-of, has-a, rules,
axioms and events as well as classes, hierarchical relations
and attributes.</p>
      <p>Definition 1: A taxonomy, O, is defined as a set of classes,
and is-a relations between them, O = (CO; RO). In formal
logic, the is-a relation is defined as a subclass/instance-of
relation in an ontology:</p>
      <p>Subclass/Instance-of: 8x : ci(x) ! cj (x). Such that
8ci; cj 2 CO; is-a(ci; cj ) 2 RO:</p>
      <p>Definition 2: A Multi-Tree is defined as a tree data
structure in which each node may have more than one parent. A
multi-tree is often used to describe a partially ordered set. In
this paper, a taxonomy of concepts is modelled by a
multitree structure in which each concept may refer to multiple
super-concepts. It should be noted that in taxonomies such as
Wikipedia Categories and WordNet cycles do exist; however,
we have avoided capturing them by breaking edges creating
directed cycles and capturing the rest of the graph by using a
multi-tree structure. It should also be noted that the definition
used here is more general in that the algorithms used allow
for the existence of multiple paths between a leaf and root
node; i.e., the diamond-free poset requirement is relaxed.</p>
      <p>Formally, in this paper, a multi-tree is a directed acyclic
graph, T = (V; E; C; L; M; W; P ), with hierarchical
categorization of its nodes in different levels such that:</p>
      <p>V is a set of vertices (nodes), V = fv1; :::; vng. Each
vertex corresponds to a concept in the taxonomy.
E is a set of edges, E = fe1; :::; eng, (in which e =
hvi; vj i is an ordered set representing an edge from node
vi to node vj . Each edge represents an is-a relation
between two concepts ci; cj which means (ci is-a cj ). The
direction in this digraph is always from a concept
(subclass) to its parent (super-class).</p>
      <p>C is a set of terms representing concepts which are used
as nodes labels.</p>
      <p>L is a function mapping V to R L : V ! R assigning
a real number to each node. This function is recursively
defined as being 1 plus the average value of L for the
children of the node. Initially, this function assigns 0 to
the leaf nodes.</p>
      <p>M is a bijective mapping function mapping V to C (M :
V ! C) assigning a label (representing a concept) to a
node.</p>
      <p>W is a function mapping V to R (W : V ! R) which
assigns a real number as a weight to each node. This
weight is utilized to calculate the similarity between two
entities, which will be discussed in the next section.
P is a function mapping E to R (P : E ! R) which
assigns a real number to each edge as a propagation ratio
of each edge. In this paper P was set to 1.</p>
      <p>The following functions, properties and operators are defined
for a Multi-Tree:
leaf(v) is a function mapping V to ftrue; f alseg that
returns a Boolean value indicating whether a node is a
leaf node or not. A leaf node in Multi-Tree does not have
any children. A multi-tree may have several leaves.
root(v) is a function mapping V to ftrue; f alseg that
returns a Boolean value indicating whether a node is a
root node or not. A Multi-Tree node is a root if it does
not have any parents. A multi-tree has only one root
node.
children(v) is a function mapping V to P(V) (the power
set of V) that returns the set of all the direct children of
the node.
parents(v) is a function mapping V to P(V) (the power
set of V) that returns the set of all the direct parents of
the node.</p>
      <p>v = jchildren(v)j is defined as the cardinality of the
child set of node v. (count of children of the node v)
v = jparents(v)j is defined as the cardinality of the
parent set of node v. (count of parents of the node v)
The combination operator with the symbol ] is defined
between two multi-trees T1, T2 and returns a multi-tree
Tu containing all the vertices and edges that exist in both
T1 and T2. In other words, this operator returns the
combination of two multi-trees. Tu = T1 ] T2 )</p>
      <p>Tu =
8 Eu = E1 [ E2 8 LTu
&gt;&lt; Vu = V1 [ V2 &gt;&lt; M Tu
&gt;: Cu = C1 [ C2 &gt;: PWTTuu
(3)
The weights of the vertices in the tree Tu are calculated
by a recursive function W Tu : V R ! R as
defined in equation 4. In the proposed algorithm, weight
is propagated from the leaves to the root of the
multitree combined from two multi-trees. In this equation,
is a damping factor (degradation ratio). The damping
(4)
(5)
(6)
(7)
factor causes the nodes at lower levels of a multi-tree
(i.e., nodes near to the leaves) to contribute more to the
weight than nodes in higher levels.</p>
      <p>W Tu (vi; ) =
This function considers nodes in a multi-tree in three
categories: leaves, root and nodes situated between leaves
and the root whose weights are calculated by functions
; and respectively.</p>
      <p>is a function mapping V ! f0; 1g. This function
determines whether a specific node, vi, in a combined tree
Tu exists in both of the trees from which it is constituted
(T1; T2).</p>
      <p>Tu (vi) =
is a function mapping V R ! R . This function is
used to calculate the weights of the nodes in a multi-tree.</p>
      <p>Tu (vi; ) = (
1
vi
)(</p>
      <p>X
8vx2children(vi)</p>
      <p>P (vi; vx)W Tu (vx; ))
is a function mapping V R ! R. This function
returns the weight of a node if the node is neither a leaf
node nor the root of the multi-tree.</p>
      <p>Tu (vi; ) = (1</p>
      <p>1</p>
      <p>L(vi)+1 ) Tu (vi; )
+(</p>
      <p>1</p>
      <p>L(vi)+1 ) Tu (vi)
The function calculates the weight of a node by 1
1</p>
      <p>L(vi) share of the average of the weight of its children
which calls the function to calculate the weight of each
child recursively and L1(vi) share for the commonality
of a node between the multi-trees of two concepts.</p>
      <p>The above description is best illustrated by the following
example. The example, shown in Figure 1,2 and the
calculation using equations 5-7 illustrated in Figure 3, demonstrates
how the above functions work for two entities represented by
their features (i.e., products appeared in the profiles of two
users).
d1=(Web Cam, Digital Camera, LCD TV, Blender, Mattress)
d2=(Keyboard, DSLR Camera, LED TV, Mattress, Drawer)</p>
      <p>Using a VSM, the similarity between d1; d2 is equal to
0.2. However, using the proposed method, although LED
and LCD are not equal they have a parent in common which
makes the weight non-zero. Considering = e = 2:71 (also
used in experiments reported later), the similarity between
d1; d2 is: 0.444.</p>
      <p>W(Keyboard)=0, W(Web Cam)=0, W(Digital Camera)=0,
W(DSLR)=0, W(LED)=0, W(LCD)=0, W(Blender)=0,
W(Drawer)=0, W(Mattress)=1, W(Kitchen)=0</p>
      <p>Everything
Electronic</p>
      <p>Furniture
It is clear from the above example that the proposed
method generates a higher similarity value when ontological
information is included. From a recommender system
perspective this is desirable in that it allows purchasing profiles
to be compared in a semantic manner.</p>
    </sec>
    <sec id="sec-4">
      <title>Similarity using a Semantic Multi-Tree</title>
      <p>In order to calculate the similarity between two entities, we
construct two multi-trees each of which represents features of
the corresponding entity in a hierarchical format according to
a specific ontology or taxonomy. In this method, each entity
is represented by its features as leaves of a multi-tree. The
rest of each multi-tree is constructed according to the domain
taxonomy (e.g., WordNet or Wikipedia Categories). Hence,
the multi-tree corresponding to each entity is a sub multi-tree
of the taxonomy with which the sub-multi-tree is constructed.
A multi-tree Tx is said to be a sub multi-tree of TO if:
Tx</p>
      <p>TO ,
( Vx</p>
      <p>Ex
Cx</p>
      <p>VO
EO
CO
(8)</p>
      <p>Assume that, TO = (VO; EO; CO; LO; MO; WO; PO),
is a multi-tree representing the domain taxonomy (e.g.,
Wikipedia Categories), O = (CO; RO), in which CO
represents set of concepts and RO represents set of relations
among concepts in the taxonomy. The transformation
function T is defined as a bijective function T : RO ! E which
maps each relation in the taxonomy O to an edge in the
multitree TO. So, Ex = fei = T (Ri) j Ri 2 ROg. (CO CO
and EO RO).</p>
      <p>The multi-tree, Tx = (Vx; Ex; Cx; Lx; Mx; Wx; Px)
TO, corresponds to entity dx = (c1; :::; cn) in which ci is
a term representing a feature of this entity as well as a
concept in the taxonomy. We define Cx CO in multi-tree Tx
as a set of terms representing features of the entity dx.</p>
      <p>Hence, Cx = fc1; :::; cng [ fcj j 8ci 2 Cx; 8cj 2
CO; is-a(ci; cj ) 2 ROg, Vx = fvi = M (ti) j ti 2 Cxg and
Ex = fei = T (Ri) j 8ck; cl 2 Cx; Ri(ck; cl) 2 ROg such
that Cx CO and O = (CO; RO) is the taxonomy which is
used to construct the tree.</p>
      <p>In the next step, the combination operator is applied to the
two trees whose similarity is being computed. Applying the
combination operator to the two trees, the weight of the root
of the combined tree represents the similarity of the two trees.
The weight of the root is recursively calculated by application
of equations 5-7. The following steps demonstrate the
process of calculating the similarity between two entities d1; d2
represented by sets of features C1; C2 respectively:
1. Construct multi-trees T1 and T2 from sets of features C1
and C2 respectively.
2. Construct Tsim = T1]T2 as a combination of two
multitrees T1; T2 TO
3. Update the weights for the nodes in the combined
multitree Tsim using the recursive equations 5-7.
4. The weight of the root of Tsim is the value which
represents the similarity of two entities represented by
C1; C2; i.e., Sim(d1; d2) = W (root(Tsim)).</p>
      <p>Algorithm 1 and 2 describes the process by which a
multitree is constructed from a feature set representing an entity.</p>
      <sec id="sec-4-1">
        <title>Algorithm 1 Constructing a multi-tree.</title>
        <sec id="sec-4-1-1">
          <title>Proc ConstructMulti-Tree(ConceptSet Cx)</title>
          <p>Tx null
for all c in Cx do</p>
          <p>FindPaths(TO:M 1(c); TO; Tx)
end for
return Tx
Algorithm 2 Finding a path in a multi-tree from a leaf node
to root.</p>
          <p>Proc FindPaths(Node v,Multi-Tree TO,Multi-Tree Tx)
if root(v) then</p>
          <p>Tx:root v
return
end if
for all parent in TO.Parents(v) do
if parent not in Tx:V then</p>
          <p>Tx:V Tx:V [ parent
Tx:E Tx:E[ &lt; parent; v &gt;</p>
          <p>FindPaths( parent,TO; Tx)
end iffavoid cyclesg
end for</p>
          <p>Algorithm 3 describes the calculation of similarity using
the proposed non-linear method.</p>
          <p>Algorithm 3 Calculation of similarity using multi-trees.
T1
T2
TSim T1 ] T2
similarity W TSim (root; )</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>ConstructMulti-Tree(ConceptSet C1)</title>
        </sec>
        <sec id="sec-4-1-3">
          <title>ConstructMulti-Tree(ConceptSet C2)</title>
          <p>5</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimental Results</title>
      <p>The proposed model is not only useful for measuring
similarity between pairs of words but is also useful for
information retrieval and recommender systems. In this paper, we
evaluated the semantic multi-tree model for the application of
measuring similarity between pairs of words, but the domain
of the proposed model is not limited just to the application
of measuring similarity between pairs of words. Our
rationale for doing this is that the concept domain is potentially
larger as we are not limited to (say) movies, music or books,
domains often used in recommender datasets.</p>
      <p>One of the methods for evaluating psycholinguistic
systems is comparing the results with human judgement.
Among three standard datasets that exist in the domain
of measuring semantic relatedness between pairs of words,
WordSimilarity-353 is the most comprehensive dataset since
this dataset includes all of the 30 nouns of the Miller and
Charles dataset and most of the 65 pairs of Rubenstein and
Goodenough testset [Rubenstein and Goodenough, 1965].
The WordSimilarity-353 dataset contains 353 pairs of words
compared by human agents in terms of similarity
[Finkelstein et al., 2002]. This system has been evaluated by both
WordSimilarity-353 and Rubenstein and Goodenough
testsets. While the Charles and Miller testset is not available on
the web and is already included in WordSimilarity-353, we
are not able to give statistics about the performance of
proposed method on this dataset.</p>
      <p>Wikipedia is a resource of concepts linked to each other
forming a network, which is collaboratively constructed by
human agents around the world. Each page in Wikipedia
is linked to a set of categories classifying concepts and
Wikipedia pages. Each page in Wikipedia describes one of
the concepts associated to a word. A Wikipedia page
describes a concept using other concepts described in other
pages. In order to evaluate the proposed model to estimate
the similarity between two entities, we used Wikipedia as a
resource of knowledge and Wikipedia Categories as a
taxonomy of concepts with which Wikipedia pages are annotated.
The existing links in a Wikipedia page are considered as
features describing the associated concept. Figure 4 illustrates
the Wikipedia link structure data model. In this figure, each
circle represents a wikipedia category and each square
represents a Wikipedia page corresponding to a concept. Solid
lines represent links between pages and a dotted line
represents a link between a page and categories it belongs to.</p>
      <p>Links between pages
Links between pages
and categories
Category C
Page P</p>
      <p>C
P</p>
      <p>F
M</p>
      <p>B
L</p>
      <p>E</p>
      <p>N</p>
      <p>A
C
G</p>
      <p>H</p>
      <p>O</p>
      <p>D
I
P
In this experiment, The VSM is compared to the
multitree model described previously against human judgement in
terms of accuracy and correlation. For this purpose, each
word is mapped to a Wikipedia page, and then a vector
containing the links of the pages to which the corresponding page
is linked is constructed and is referred to as a link vector. In
Figure 4, the page L is linked to fN; O; P g which are
considered as features of the link vector CL = (N; O; P ). Each
link in the link vector represents another page in Wikipedia
which is linked to Wikipedia categories. In the next step,
another vector is constructed according to the categories of a
link vector’s elements (i.e., leaf nodes in Categories) called
a first order category vector. In Figure 4, fN; O; P g are
linked to categories fE; G; H; Ig. Then, the categories of
the main page, L, (i.e., fE; F g) are added to the first
order category vector and then the multi-tree representing the
concept L is constructed from the first order category vector
(CL1st = (E; F; G; H; I)). The rest of the process for
construction of the multi-tree model is the same as described in
Sections 3 and 4. This process is pictorially represented in
Figure 5.</p>
      <p>In VSM, the link vectors are compared using cosine
similarity as described in equation 1. Both VSM schemes
have been evaluated against human judgement with the same
dataset and the results are compared in Table 1. Since, in this
paper, we are not using a corpus-based approach, and in each
experiment only two vectors representing two concepts are
compared, the inverse document frequency of each link can
not be calculated. Therefore, tf was used instead of tf-idf to
implement the second VSM scheme.</p>
      <p>Word 1
Word 2</p>
      <p>Mapping each
word to the
corresponding
Wikipedia page</p>
      <p>Constructing
Vectors of links
to other pages</p>
      <p>Constructing
Vectors of Leaf</p>
      <p>categories
Constructing</p>
      <p>Multi-Trees
Combining
Multi-Trees</p>
      <p>A
B C D D
F E G H F
K I J K K</p>
      <p>Applying
Equations 5-7</p>
      <p>Similarity ?
The average accuracy in Table 1 is measured regarding the
difference between the value of similarity measured by the
techniques tested above and that of human judgement.</p>
      <p>Accuracy = 1 Average(errori)
errori = SimHuman(wi1; wi2) SimComputer(wi1; wi2)</p>
      <p>The correlation was estimated by Spearman’s rank
correlation coefficient. The correlation between human judgement
and the multi-tree model is demonstrated in Figure 6. Table 1
demonstrates that the proposed model achieved better results
than both VSM schemes in terms of accuracy and correlation
with human judgment in estimating similarity between
entities.</p>
    </sec>
    <sec id="sec-6">
      <title>6 Discussion, Conclusions and Future Work</title>
      <p>In this paper we observed that techniques such as linear VSM
ignore the semantic relationships among features. VSM
calculates the similarity between two documents regarding the
commonality of their features. However, in some cases, two
documents may not be equal but may refer to the same
entity. This limits the capability of a VSM to retrieve related</p>
      <sec id="sec-6-1">
        <title>Average Accuracy compared to human judgement</title>
      </sec>
      <sec id="sec-6-2">
        <title>Correlation with Human judgement</title>
        <p>WordSim-353
VSM Boolean
VSM Frequency</p>
        <p>of terms</p>
        <p>Multi-Tree
Rubenstein and</p>
        <p>Goodenough
VSM Boolean
VSM Frequency</p>
        <p>of terms
Multi-Tree
documents. The multi-tree model compensates for the lack
of semantic relatedness among features using taxonomic
relations that exist among the features of two entities. In this
model the similarity weight is propagated from leaf nodes to
the root of the multi-tree. The multi-tree model was evaluated
by using the WordSimilarity-353 and Rubenstein and
Goodenough datasets against human judgement and the results show
that the multi-tree method outperforms VSM in terms of
correlation and the average accuracy against human judgement
for similarity of pairs of words. The results for
WikiRelate and WLM, two previous systems which used Wikipedia
Categories or WordNet to perform the task of similarity
between two words using the same dataset, are shown in Table
2. WikiRelate and WLM were briefly described in Section 2.</p>
        <p>The results in Table 2 show that the method proposed in
this paper outperforms two of the other competitors namely
WikiRelate and WLM by more than 20% and 3%
respectively. However, ESA is the first ranked system using
machine learning techniques as the basis of a semantic
interpreter that is part of a system that maps fragments of
natural language text into a weighted sequence of Wikipedia
concepts ordered by their relevance to the input. Regarding ESA,
it is clear that augmenting the text representation and
constructing a weighted list of concepts provides benefits that
link analysis alone does not completely replace.
Goodenough
WordSim-353</p>
        <p>W-average</p>
        <p>Another potential model extension is in using a more
sophisticated function such as Pearsons product-moment
correlation or cosine similarity instead of the simple average
function in equation 6.</p>
        <p>A potential application of this model is in recommender
systems, which concentrate on similarity between two
products or two people. Referring once again to the example
described in Section 3 and shown graphically in Figures 1, 2
and 3, the similarity of buying patterns can be established
using a semantic multi-tree approach. For the evaluation of
such systems, we need to construct a handcrafted taxonomy
of products plus annotation of the product dataset to the
taxonomy of products. Keyword search engines are also another
potential application of such systems instead of linear VSM.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Adomavicius and Tuzhilin</source>
          , 2005]
          <string-name>
            <given-names>G.</given-names>
            <surname>Adomavicius</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Tuzhilin</surname>
          </string-name>
          .
          <article-title>Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions</article-title>
          .
          <source>IEEE transactions on knowledge and data engineering</source>
          , pages
          <fpage>734</fpage>
          -
          <lpage>749</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Annates</source>
          , 2005]
          <string-name>
            <given-names>U.</given-names>
            <surname>Annates</surname>
          </string-name>
          .
          <article-title>Semantic tree methodhistorical perspective and applications Izabela BondeckaKrzykowska</article-title>
          . Annales Universitatis Mariae CurieSkłodowska: Informatica, page
          <volume>15</volume>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Finkelstein et al.,
          <year>2002</year>
          ]
          <string-name>
            <given-names>L.</given-names>
            <surname>Finkelstein</surname>
          </string-name>
          , E. Gabrilovich,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rivlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Solan</surname>
          </string-name>
          , G. Wolfman, and
          <string-name>
            <given-names>E.</given-names>
            <surname>Ruppin</surname>
          </string-name>
          .
          <article-title>Placing search in context: The concept revisited</article-title>
          .
          <source>ACM Transactions on Information Systems (TOIS)</source>
          ,
          <volume>20</volume>
          (
          <issue>1</issue>
          ):
          <fpage>116</fpage>
          -
          <lpage>131</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>[Gabrilovich and Markovitch</source>
          , 2007]
          <string-name>
            <given-names>E.</given-names>
            <surname>Gabrilovich</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Markovitch</surname>
          </string-name>
          .
          <article-title>Computing semantic relatedness using wikipedia-based explicit semantic analysis</article-title>
          .
          <source>In Proceedings of the 20th International Joint Conference on Artificial Intelligence</source>
          , pages
          <fpage>6</fpage>
          -
          <lpage>12</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Landauer et al.,
          <year>1998</year>
          ]
          <string-name>
            <given-names>T.K.</given-names>
            <surname>Landauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.W.</given-names>
            <surname>Foltz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Laham</surname>
          </string-name>
          .
          <article-title>An introduction to latent semantic analysis</article-title>
          .
          <source>Discourse processes</source>
          ,
          <volume>25</volume>
          (
          <issue>2</issue>
          ):
          <fpage>259</fpage>
          -
          <lpage>284</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[Larkey and Markman</source>
          , 2005]
          <string-name>
            <given-names>L.B.</given-names>
            <surname>Larkey</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.B.</given-names>
            <surname>Markman</surname>
          </string-name>
          .
          <article-title>Processes of similarity judgment</article-title>
          .
          <source>Cognitive Science: A Multidisciplinary Journal</source>
          ,
          <volume>29</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1061</fpage>
          -
          <lpage>1076</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>[Lee</surname>
          </string-name>
          et al.,
          <year>2008</year>
          ]
          <string-name>
            <given-names>W.N.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sundlass</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Musen</surname>
          </string-name>
          .
          <article-title>Comparison of ontology-based semanticsimilarity measures</article-title>
          .
          <source>In AMIA Annual Symposium Proceedings</source>
          , volume
          <year>2008</year>
          , page 384. American Medical Informatics Association,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Maguitman et al.,
          <year>2005</year>
          ]
          <string-name>
            <given-names>A.G.</given-names>
            <surname>Maguitman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Menczer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Roinestad</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Vespignani</surname>
          </string-name>
          .
          <article-title>Algorithmic detection of semantic similarity</article-title>
          .
          <source>In Proceedings of the 14th international conference on World Wide Web</source>
          , pages
          <fpage>107</fpage>
          -
          <lpage>116</lpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Pedersen et al.,
          <year>2005</year>
          ]
          <string-name>
            <given-names>S.P.T.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Patwardhan</surname>
          </string-name>
          .
          <article-title>Maximizing semantic relatedness to perform word sense disambiguation</article-title>
          ,
          <year>2005</year>
          . University of Minnesota Supercomputing Institute Research Report UMSI,
          <volume>25</volume>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Pesquita et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.O.</given-names>
            <surname>Falca</surname>
          </string-name>
          ˜o,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lord</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.M.</given-names>
            <surname>Couto</surname>
          </string-name>
          .
          <article-title>Semantic similarity in biomedical ontologies</article-title>
          .
          <source>PLoS Comput Biol</source>
          ,
          <volume>5</volume>
          (
          <issue>7</issue>
          ):e1000443,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <source>[Rodr´ıguez and Egenhofer</source>
          , 2003]
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Rodr</surname>
          </string-name>
          <article-title>´ıguez and</article-title>
          <string-name>
            <surname>M.J. Egenhofer</surname>
          </string-name>
          .
          <article-title>Determining semantic similarity among entity classes from different ontologies</article-title>
          .
          <source>IEEE transactions on knowledge and data engineering</source>
          , pages
          <fpage>442</fpage>
          -
          <lpage>456</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <source>[Rubenstein and Goodenough</source>
          , 1965]
          <string-name>
            <given-names>H.</given-names>
            <surname>Rubenstein</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.B.</given-names>
            <surname>Goodenough</surname>
          </string-name>
          .
          <article-title>Contextual correlates of synonymy</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>8</volume>
          (
          <issue>10</issue>
          ):
          <fpage>627</fpage>
          -
          <lpage>633</lpage>
          ,
          <year>1965</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [
          <string-name>
            <surname>Ruiz-Casado</surname>
          </string-name>
          et al.,
          <year>2005</year>
          ]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruiz-Casado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Alfonseca</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Castells</surname>
          </string-name>
          .
          <article-title>Automatic assignment of wikipedia encyclopedic entries to wordnet synsets</article-title>
          .
          <source>Advances in Web Intelligence</source>
          , pages
          <fpage>380</fpage>
          -
          <lpage>386</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>[Salton and McGill</source>
          , 1983]
          <string-name>
            <given-names>G.</given-names>
            <surname>Salton and M.J. McGill</surname>
          </string-name>
          .
          <article-title>Introduction to modern information retrieval</article-title>
          . New York,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Salton et al.,
          <year>1975</year>
          ]
          <string-name>
            <given-names>G.</given-names>
            <surname>Salton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.S.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>A vector space model for automatic indexing</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>18</volume>
          (
          <issue>11</issue>
          ):
          <fpage>613</fpage>
          -
          <lpage>620</lpage>
          ,
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>[Schickel-Zuber and Faltings</source>
          , 2007]
          <string-name>
            <given-names>V.</given-names>
            <surname>Schickel-Zuber</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Faltings</surname>
          </string-name>
          .
          <article-title>Oss: a semantic similarity function based on hierarchical ontologies</article-title>
          .
          <source>In Proceedings of the 20th international joint conference on Artifical intelligence</source>
          , pages
          <fpage>551</fpage>
          -
          <lpage>556</lpage>
          . Morgan Kaufmann Publishers Inc.,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <source>[Strube and Ponzetto</source>
          , 2006]
          <string-name>
            <given-names>M.</given-names>
            <surname>Strube</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.P.</given-names>
            <surname>Ponzetto. WikiRelate</surname>
          </string-name>
          !
          <article-title>Computing semantic relatedness using Wikipedia</article-title>
          .
          <source>In Proceedings of the National Conference on Artificial Intelligence</source>
          , volume
          <volume>21</volume>
          , page 1419. Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press;
          <year>1999</year>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>[Tversky and Shafir</source>
          , 2004]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tversky</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Shafir</surname>
          </string-name>
          . Preference, belief, and
          <article-title>similarity: selected writings</article-title>
          . The MIT Press,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <source>[Witten and Milne</source>
          , 2008]
          <string-name>
            <given-names>I.H.</given-names>
            <surname>Witten</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Milne</surname>
          </string-name>
          .
          <article-title>An effective, low-cost measure of semantic relatedness obtained from Wikipedia links</article-title>
          .
          <source>In Proceeding of AAAI Workshop on Wikipedia and Artificial Intelligence: an Evolving Synergy</source>
          , AAAI Press, Chicago, USA, pages
          <fpage>25</fpage>
          -
          <lpage>30</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>