<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving Ontology Recommendation and Reuse in WebCORE by Collaborative Assessments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Iván Cantador</string-name>
          <email>ivan.cantador@uam.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Miriam Fernández</string-name>
          <email>miriam.fernandez@uam.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pablo Castells</string-name>
          <email>pablo.castells@uam.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Escuela Politécnica Superior Universidad Autónoma de Madrid Campus de Cantoblanco</institution>
          ,
          <addr-line>28049, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this work, we present an extension of CORE [2], a tool for Collaborative Ontology Reuse and Evaluation. The system receives an informal description of a specific semantic domain and determines which ontologies from a repository are the most appropriate to describe the given domain. For this task, the environment is divided into three modules. The first component receives the problem description as a set of terms and allows the user to refine and enlarge it using WordNet. The second module applies multiple automatic criteria to evaluate the ontologies of the repository and determines which ones best fit the problem description. A ranked list of ontologies is returned for each criterion, and the lists are combined by means of rank fusion techniques. Finally, the third component uses manual user evaluations in order to incorporate a human, collaborative assessment of the ontologies.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology evaluation</kwd>
        <kwd>ontology reuse</kwd>
        <kwd>collaborative filtering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The Web can be considered as a live entity that grows and evolves
fast over time. The amount of content stored and shared on the
web is increasing quickly and continuously. The global body of
multimedia resources on the Internet is undergoing a significant
growth, reaching a presence comparable to that of traditional text
contents. This growth results in well-known difficulties, such as
finding and properly managing the vast amount of sparse
information available.
To overcome these limitations, the so-called “Semantic Web”
trend has emerged with the aim of helping machines to process
information, enabling browsers or other software agents to
automatically find, share and combine information in consistent
ways. At the core of these new technologies, ontologies are
envisioned as key elements to represent knowledge that can be
understood, used and shared among distributed applications and
machines. However, ontological knowledge mining and
development are difficult and costly tasks that require major
engineering efforts. In this context, ontology reuse becomes an
essential need in order to exploit past and current efforts and
achievements. Novel tools, such as ontology search engines [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], have recently been developed and represent an important
first step towards automatically assessing and retrieving
ontologies that satisfy user queries and requests. However,
ontology reuse demands additional effort to address the special
needs and requirements of ontology engineers and practitioners. It is
necessary to evaluate and measure specific ontology features,
such as lexical vocabulary, relations [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], restrictions, consistency,
correctness, etc., before making an adequate selection. Some of
these features can be measured automatically, but others require a
human judgment to be assessed.
      </p>
      <p>
        The Web 2.0 is arising as a new trend where people collaborate and
share their knowledge to successfully achieve their goals. Following
this aspiration, the aim of this research is to enhance ontology
retrieval and recommendation, combining automatic evaluation
techniques with explicit users’ opinions and experiences. This work
follows a previous approach for Collaborative Ontology Reuse and
Evaluation over controlled repositories, named CORE [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The tool
has been enhanced and adapted to the Web. Novel technologies,
such as AJAX1, have been incorporated into the system for the
design and implementation of the user interface. The tool has also
been improved to overcome previous limitations, such as handling
large numbers of ontologies, and its collaborative capabilities
have been extended.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. SYSTEM ARCHITECTURE</title>
      <p>
WebCORE is a web application for Collaborative Ontology Reuse
and Evaluation. A user logs into the system via a web browser
and, thanks to AJAX technology and the Google Web Toolkit2,
dynamically describes a problem domain, searches for ontologies
related to this domain, obtains relevant ontologies ranked by
several lexical, taxonomic and collaborative criteria, and
manually evaluates the ontologies he likes or dislikes most.
In this section, we describe the server-side architecture of
WebCORE. Figure 1 shows an overview of the system. We
distinguish three different modules. The first one, the left module,
receives the problem description (Golden Standard) as full text
or as a set of initial terms, which can be extended by the user
using
WordNet [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The second one, represented in the centre of the
figure, allows the user to select a set of ontology evaluation
techniques to recover the ontologies closest to the given Golden
Standard. Finally, the third one, on the right of the figure, is a
collaborative module that re-ranks the list of recovered ontologies,
taking into consideration previous evaluations of the users.
      </p>
      <p>1 Garrett, J. J. (2005). AJAX: A New Approach to Web
Applications. http://www.adaptivepath.com/
2 Google Web Toolkit, http://code.google.com/webtoolkit/</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Golden Standard Definition</title>
      <p>
        The first phase of our ontology recommender system is the Golden
Standard definition. The user describes a domain of interest
specifying a set of relevant terms that will be searched through the
concepts (classes or instances) of the ontologies stored in the
system. These terms can be obtained automatically by the internal
Natural Language Processing (NLP) module, which uses a repository
of documents related to the specific domain the user is interested
in. The NLP module accesses the repository of documents and
returns a list of pairs (lexical entry, part of speech) that
roughly represents the domain of the problem. Alternatively, the
list of initial (root) terms can be specified manually. The module
also allows the user to expand the root terms using WordNet [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and some of the relations it provides: hypernym, hyponym
and synonym. The new terms added to the Golden Standard through
these relations can themselves be extended again, so new terms can
be added iteratively to the problem definition.
      </p>
      <p>The final representation of the Golden Standard is defined as a
set of terms T = (LG, POS, LGP, R, Z), where:
• LG is the set of lexical entries defined for the Golden
Standard.
• POS corresponds to the different parts of speech considered by
WordNet: noun, adjective, verb and adverb.
• LGP is the set of lexical entries of the Golden Standard from
which terms have been extended (the lexical parents; the empty
string for root terms).
• R is the set of relations between terms of the Golden Standard:
synonym, hypernym, hyponym and root (if a term has not been
obtained by expansion, but is one of the initial terms).
• Z is an integer that represents the depth, i.e., the distance of
a term to the root term from which it has been derived.
Example: T1 = (“genetics”, NOUN, “”, ROOT, 0). T1 is one of the
root terms of the Golden Standard. Its lexical entry is
“genetics”; its part of speech is “noun”; it has not been expanded
from any other term, so its lexical parent is the empty string;
its relation is “root”; and its depth is 0.</p>
      <p>Figure 2 shows the interface of the Golden Standard Definition
phase. On the left side of the screen, the current list of root
terms is shown. The user can manually insert new root terms into
this list by giving their lexical entries and selecting their
parts of speech. As new terms are added, the final Golden Standard
definition is immediately updated: the final list of (root and
expanded) terms that represent the domain of the problem is shown
at the bottom of the figure. The user can also perform term
expansion using WordNet. He selects one of the terms from the
Golden Standard definition and the system shows him all its
meanings contained in WordNet (top of the figure). After he has
chosen one of them, the system presents him three different lists
with the synonyms, hyponyms and hypernyms of the term. The user
can then select one or more elements of these lists and add them
to the expanded term list. For each expansion, the depth of the
new term is increased by one unit.</p>
      <p>In the problem definition phase, a collaborative component has
been added to the system (right side of Figure 2). This component
reads the term currently selected by the user and searches for all
the stored problem definitions that contain it. For each of these
problem definitions, the rest of their terms and the number of
problems in which they appear are retrieved and shown in the web
browser. With this simple strategy the user is suggested the most
popular terms, which could help him to better describe the domain
he is interested in.</p>
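The suggestion strategy above amounts to counting co-occurring terms across stored problem definitions. A minimal sketch, assuming a simple in-memory list of definitions (the paper does not describe WebCORE's actual storage layer):

```python
from collections import Counter

# Hypothetical store of previously saved problem definitions,
# each represented as a set of terms.
stored_definitions = [
    {"genetics", "gene", "heredity"},
    {"genetics", "dna", "gene"},
    {"restaurant", "dish", "menu"},
]

def suggest_terms(selected: str) -> list:
    """Return the terms that co-occur with `selected` in stored
    problem definitions, together with the number of definitions
    in which they appear, most popular first."""
    counts = Counter()
    for definition in stored_definitions:
        if selected in definition:
            counts.update(definition - {selected})
    return counts.most_common()
```

For the sample data, selecting “genetics” ranks “gene” first, since it appears in both definitions that contain the selected term.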
    </sec>
    <sec id="sec-4">
      <title>2.2 Automatic Ontology Recommendation</title>
      <p>
        Once the user has selected the most appropriate set of terms to
describe the problem domain, the tool performs the processes of
ontology retrieval and ranking. Our approach to ontology retrieval
can be seen as an evolution of classic keyword-based retrieval
techniques [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where textual documents are replaced by
ontologies.
      </p>
      <p>The queries supported by our model are expressed using the terms
selected during the Golden Standard definition phase. In classic
keyword-based vector-space models for information retrieval,
each query keyword is assigned a weight that represents the
importance of the concept in the information need expressed by
the query. Analogously, in our system, the terms included in the
Golden Standard are weighted, using the depth measure to
indicate the relative interest of the user for each of the terms to be
explicitly mentioned in the ontologies.</p>
      <p>
        To carry out the retrieval process, we focus on the lexical level,
recovering those ontologies that contain a subset of the terms
expressed by the user during the Golden Standard definition. To
compute the term matching, two different options are available
within the tool: search for exact matches or search for matches
based on the Levenshtein distance between two terms.
Furthermore, the tool also offers two different search spaces, the
ontologies and the corresponding knowledge bases.
Let T be the set of all terms defined in the Golden Standard
definition phase. Let di be the depth measure associated with each
term ti ∈ T. Let q be the query vector extracted from the Golden
Standard definition, and let wi be the weight associated with each
of these terms, where for each ti ∈ T, wi ∈ [0, 1]. The weight wi
is calculated as:
w_i = 1 / (d_i + 1)
This measure gives more relevance to the terms explicitly expressed
by the user, and less importance to those ones extended or derived
from previously selected terms. An interesting future work could be
to enhance and refine the query, e.g. based on terms popularity, or
other more complex strategies as terms frequency analysis.
The search engine computes a semantic similarity value between
the query and each ontology as follows. We represent each
ontology with a vector oj ∈ O, where oji is the mean of the term ti
similarities with all the matched entities in the ontology if any
matching exists, and zero otherwise. The components oji are
calculated as:
o_ji = ( Σ_{m ∈ M_ji} w(m_ji) / |M_ji| ) / ( Σ_{m ∈ M_i} w(m_i) / |M_i| )
      </p>
      <p>
where Mji is the set of matches of the term ti in the ontology
oj, w(mji) represents the similarity between the term ti and each
entity of the ontology oj that matches it, Mi is the set of
matches of the term ti within all the ontologies, and w(mi)
represents the weights of each of these matches.</p>
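The component computation can be sketched as follows. The match weights would come from exact or Levenshtein-based term matching, and the function name is a hypothetical illustration of the formula as reconstructed above:

```python
def ontology_component(weights_in_oj: list, weights_repo: list) -> float:
    """Sketch of o_ji: the mean match weight of term t_i in ontology
    o_j, normalised by the mean match weight of t_i over the whole
    repository; zero when the term has no match in o_j."""
    if not weights_in_oj:
        return 0.0
    mean_oj = sum(weights_in_oj) / len(weights_in_oj)
    mean_repo = sum(weights_repo) / len(weights_repo)
    return mean_oj / mean_repo

# “acid” matches “acid” (1.0) and “amino acid” (0.5) in o_j, while the
# repository-wide matches of the term weigh [1.0, 0.5, 1.0]:
o_ji = ontology_component([1.0, 0.5], [1.0, 0.5, 1.0])
```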
      <p>
For example, if we define in the Golden Standard a term “acid”,
this term may match several different entities in the same
ontology, such as “acid”, “amino acid”, etc. In order to establish
the appropriate weight in the ontology vector, o_ji, the goal is
to count the matches of a term in the whole repository of
ontologies and give more relevance to those ontologies that have
matched that specific term more times.
Each component o_ji contains specific information about the
similarity between the ontology and the corresponding term ti. To
compute the final similarity between the query vector q and the
ontology vector oj, the vector-space model calculates the cosine
measure between both vectors. However, if we followed the
traditional model, we would only be considering the difference
between the query and the ontology vectors according to the angle
they form, without taking their magnitudes into account. To
overcome this limitation, the cosine measure has been replaced by
the simple dot product. Hence, the similarity between an ontology
oj and the query q is simply computed as:
sim(q, o_j) = q · o_j
If the knowledge in the ontology is incomplete, the ontology
ranking algorithm performs very poorly. Queries will return fewer
results than expected, and relevant ontologies will not be
retrieved, or will get a much lower similarity value than they
should. For instance, if there are ontologies about “restaurants”,
and “dishes” are expressed as instances in the corresponding
Knowledge Base (KB), a user searching for ontologies in this
domain may also be interested in the instances and literals
contained in the KB. To cope with this issue, our ranking model
combines the similarity obtained from the terms that belong to the
ontology with the similarity obtained from the terms that belong
to the KB, using the adaptation of the vector space model
explained above. The user can select a value v_i ∈ [1, 5] for each
kind of search, and this value is then mapped to a corresponding
value s_i = v_i / 5. Following this idea, the final score is
computed as:
      </p>
      <p>s_o × sim(q, o) + s_kb × sim(q, kb)</p>
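The pieces of the ranking model above (depth-based term weights, dot-product similarity, and the combined score) fit together as in this sketch; the function names and the default v values are illustrative assumptions, not WebCORE's code:

```python
def term_weight(depth: int) -> float:
    """w_i = 1 / (d_i + 1): root terms get weight 1, expanded terms less."""
    return 1.0 / (depth + 1)

def dot(q: list, o: list) -> float:
    """Dot-product similarity, used instead of the cosine so that
    vector magnitudes are taken into account."""
    return sum(qi * oi for qi, oi in zip(q, o))

def final_score(q, onto_vec, kb_vec, v_onto=5, v_kb=3):
    """Combine ontology- and KB-level similarities; the user-selected
    values v in [1, 5] are mapped to s = v / 5."""
    return (v_onto / 5) * dot(q, onto_vec) + (v_kb / 5) * dot(q, kb_vec)

q = [term_weight(0), term_weight(1)]          # a root term and a depth-1 term
score = final_score(q, [0.9, 0.2], [0.4, 0.0])
```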
    </sec>
    <sec id="sec-5">
      <title>2.3 Collaborative Ontology Evaluation</title>
      <p>
The third and last phase of the system comprises a novel
ontology recommendation algorithm that exploits the advantages
of Collaborative Filtering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], exploring the manual evaluations
stored in the system to rank the set of ontologies that best fulfils
the user’s interests.
      </p>
      <p>In WebCORE, user evaluations are represented as a set of five
different criteria and their respective values, manually determined
by the users who made the evaluations: correctness, readability,
flexibility, level of formality and type of model.</p>
      <p>The above criteria can have discrete numeric or non-numeric
values. The user’s interests are expressed as a subset of these
criteria and their respective values, which act as thresholds or
restrictions to be satisfied by the user evaluations. Thus, a
numeric criterion is satisfied if an evaluation value is equal to
or greater than the corresponding interest threshold, while a
non-numeric criterion is satisfied only when the evaluation
exactly matches the given value (i.e., in a Boolean, yes/no
manner).</p>
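The satisfaction rule just described can be sketched as a small predicate. The dict-based representation and the sample criterion values are hypothetical illustrations of the five criteria named above:

```python
def satisfies(evaluation: dict, interests: dict) -> bool:
    """Sketch of the satisfaction rule: every numeric threshold in the
    user's interests must be met (>=), and every non-numeric value
    must match exactly (Boolean, yes/no manner)."""
    for criterion, threshold in interests.items():
        value = evaluation.get(criterion)
        if isinstance(threshold, (int, float)):
            if not isinstance(value, (int, float)) or value < threshold:
                return False
        elif value != threshold:
            return False
    return True

evaluation = {"correctness": 4, "readability": 3, "type_of_model": "frames"}
interests = {"correctness": 3, "type_of_model": "frames"}
```

Here `satisfies(evaluation, interests)` holds: the correctness value 4 meets the threshold 3, and the type of model matches exactly.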
      <p>According to both types of user evaluation and interest
criteria, numeric and Boolean, the recommendation algorithm
measures the degree to which each user restriction is satisfied by
the evaluations, and recommends a ranked ontology list according
to similarity measures between the thresholds and the
collaborative evaluations.</p>
      <p>Figure 4 shows all the previous definitions and ideas, locating
them in the graphical interface of the system. On the left side of
the screen, the user introduces the thresholds for the
recommendations and obtains the final collaborative ontology
ranking. On the right side, the user adds new evaluations for the
ontologies and checks evaluations given by the rest of the users.</p>
    </sec>
    <sec id="sec-6">
      <title>3. EXPERIMENTS</title>
      <p>In this section, we present some early experiments that attempt
to measure: (a) the gain in efficiency and effectiveness, and (b)
the increase in user satisfaction obtained with the use of our
system when searching for ontologies within a specific domain.
The scenario of the experiments was the following. A repository of
thirty ontologies was considered, and eighteen subjects
participated in the evaluations. They were Computer Science Ph.D.
students of our department, all of them with some expertise in the
modeling and exploitation of ontologies. They were asked to search
and evaluate ontologies with WebCORE in three different tasks. For
each task and each student, one of the following problem domains
was selected: family, genetics and restaurants. In the repository,
there were six different ontologies related to each of the above
domains, and twelve ontologies describing other, unrelated
knowledge areas. No information about the domains and the existing
ontologies was given to the students.</p>
      <p>Tasks 1 and 2 were performed without the help of the
collaborative modules of the system, i.e., the term recommender of
the problem definition phase and the collaborative ranking of the
user evaluation phase. After all users had finished those ontology
searches and evaluations, task 3 was done with the collaborative
components activated. For each task and each student, we measured
the time spent, and the number of ontologies retrieved and
selected (‘reused’). We also asked the users about their
satisfaction (on a 1-5 rating scale) with each of the selected
ontologies and with the collaborative modules.
Tables 1 and 2 contain a summary of the obtained results. Note
that the measures of task 1 are not shown. We decided not to
consider them for evaluation purposes because we regard the first
task as a learning process in the use of the tool, and its
execution times and numbers of selected ontologies as skewed,
non-objective measures.</p>
      <p>To evaluate the enhancements in terms of efficiency and
effectiveness, we present in Table 1 the average number of reused
ontologies and the average execution times for tasks 2 and 3. The
results show a significant improvement when the collaborative
modules of the system were activated. In all cases, the students
made use of the terms and evaluations suggested by others,
accelerating the processes of problem definition and relevant
ontology retrieval.
[Table 1: number of reused ontologies and execution times for
tasks 2 and 3.]
On the other hand, Table 2 shows the average degrees of
satisfaction reported by the users with the retrieved ontologies
and the collaborative modules. Again, the results evidence the
positive effects of our approach.</p>
    </sec>
    <sec id="sec-7">
      <title>4. CONCLUSIONS AND FUTURE WORK</title>
      <p>In this paper, a web application for ontology evaluation and reuse
has been presented. The novel aspects of our proposal include the
use of WordNet to help users to define the Golden Standard; a
new ontology retrieval technique based on traditional Information
Retrieval models; rank fusion techniques to combine different
ontology evaluation measures; and two collaborative modules:
one that suggests the most popular terms for a given domain, and
one that recommends lists of ontologies with a multi-criteria
strategy that takes into account user opinions about ontology
features that can only be assessed by humans.</p>
    </sec>
    <sec id="sec-8">
      <title>5. ACKNOWLEDGMENTS</title>
      <p>This research was supported by the Spanish Ministry of Science
and Education (TIN2005-06885 and FPU program).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Adomavicius</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tuzhilin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>17</volume>
          (
          <issue>6</issue>
          ):
          <fpage>734</fpage>
          -
          <lpage>749</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Fernández</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cantador</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Castells</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>CORE: A Tool for Collaborative Ontology Reuse and Evaluation</article-title>
          .
          <source>Proceedings of the 4th Int. Workshop on Evaluation of Ontologies for the Web (EON'06)</source>
          ,
          <source>at the 15th Int. World Wide Web Conference (WWW'06)</source>
          . Edinburgh, UK,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Maedche</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Staab</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Measuring similarity between ontologies</article-title>
          .
          <source>Proceedings of the 13th European Conference on Knowledge Acquisition and Management (EKAW 2002)</source>
          . Madrid, Spain,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>G. A.</given-names>
          </string-name>
          :
          <article-title>WordNet: A lexical database for English. New horizons in commercial and industrial Artificial Intelligence</article-title>
          .
          <source>Communications of the Association for Computing Machinery</source>
          ,
          <volume>38</volume>
          (
          <issue>11</issue>
          ):
          <fpage>39</fpage>
          -
          <lpage>41</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>McGill</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Introduction to Modern Information Retrieval</article-title>
          .
          McGraw-Hill, New York,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          Swoogle: Semantic Web Search Engine.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>