Improving Ontology Recommendation and Reuse in
            WebCORE by Collaborative Assessments
                                    Iván Cantador, Miriam Fernández, Pablo Castells
                                                   Escuela Politécnica Superior
                                                 Universidad Autónoma de Madrid
                                            Campus de Cantoblanco, 28049, Madrid, Spain
                            {ivan.cantador, miriam.fernandez, pablo.castells}@uam.es

ABSTRACT                                                                   achievements. Novel tools have been recently developed, such as
In this work, we present an extension of CORE [2], a tool for              ontology search engines [6], which represent an important first step
Collaborative Ontology Reuse and Evaluation. The system receives           towards automatically assessing and retrieving ontologies which
an informal description of a specific semantic domain and                  satisfy user queries and requests. However, ontology reuse
determines which ontologies from a repository are the most                 demands additional efforts to address special needs and
appropriate to describe the given domain. For this task, the               requirements from ontology engineers and practitioners. It is
environment is divided into three modules. The first component             necessary to evaluate and measure specific ontology features,
receives the problem description as a set of terms, and allows the         such as lexical vocabulary, relations [3], restrictions, consistency,
user to refine and enlarge it using WordNet. The second module             correctness, etc., before making an adequate selection. Some of
applies multiple automatic criteria to evaluate the ontologies of the      these features can be measured automatically, but others require a
repository, and determines which ones fit best the problem                 human judgment to be assessed.
description. A ranked list of ontologies is returned for each criterion,   The Web 2.0 is arising as a new trend where people collaborate and
and the lists are combined by means of rank fusion techniques.             share their knowledge to successfully achieve their goals. Following
Finally, the third component uses manual user evaluations in order         this aspiration, the aim of this research is to enhance ontology
to incorporate a human, collaborative assessment of the ontologies.        retrieval and recommendation, combining automatic evaluation
                                                                           techniques with explicit users’ opinions and experiences. This work
Categories and Subject Descriptors                                         follows a previous approach for Collaborative Ontology Reuse and
                                                                           Evaluation over controlled repositories, named CORE [2]. The tool
H.3.3 [Information Storage and Retrieval]: Information Search
                                                                           has been enhanced and adapted to the Web. Novel technologies,
and Retrieval – information filtering, retrieval models, selection
                                                                           such as AJAX1, have been incorporated to the system for the design
process.
                                                                           and implementation of the user interface. It has also been improved
                                                                           to overcome previous limitations, such as handling large numbers of
General Terms                                                              ontologies. The collaborative capabilities have also been extended.
Algorithms, Measurement, Human Factors.
                                                                           2. SYSTEM ARCHITECTURE
Keywords                                                                   WebCORE is a web application for Collaborative Ontology Reuse
Ontology evaluation, ontology reuse, collaborative filtering.              and Evaluation. A user logs into the system via a web browser,
                                                                           and, thanks to AJAX technology and the Google Web Toolkit2,
                                                                           dynamically describes a problem domain, searches for ontologies
1. INTRODUCTION                                                            related to this domain, obtains relevant ontologies ranked by
The Web can be considered as a live entity that grows and evolves          several lexical, taxonomic and collaborative criteria, and evaluates
fast over time. The amount of content stored and shared on the             by himself those ontologies that he likes or dislikes most.
web is increasing quickly and continuously. The global body of
multimedia resources on the Internet is undergoing a significant           In this section, we describe the server-side architecture of
growth, reaching a presence comparable to that of traditional text         WebCORE. Figure 1 shows an overview of the system. We
contents. The consequences of this enlargement result in well              distinguish three different modules. The first one, the left module,
known difficulties and problems, such as finding and properly              receives the problem description (Golden Standard) as a full text
managing all the existing amount of sparse information.                    or as a set of initial terms, that can be extended by the user using
                                                                           WordNet [4]. The second one, represented in the centre of the
To overcome these limitations the so-called “Semantic Web”                 figure, allows the user to select a set of ontology evaluation
trend has emerged with the aim of helping machines to process              techniques to recover the ontologies closest to the given Golden
information, enabling browsers or other software agents to                 Standard. Finally, the third one, on the right of the figure, is a
automatically find, share and combine information in consistent            collaborative module that re-ranks the list of recovered ontologies,
ways. At the core of these new technologies, ontologies are                taking into consideration previous evaluations of the users.
envisioned as key elements to represent knowledge that can be
understood, used and shared among distributed applications and
machines. However, ontological knowledge mining and
development are difficult and costly tasks that require major
engineering efforts. In this context, ontology reuse becomes an
essential need in order to exploit past and current efforts and
                                                                           1 Garrett, J. J. (2005). AJAX: A New Approach to Web Applications. In http://www.adaptivepath.com/
                                                                           2 Google Web Toolkit, http://code.google.com/webtoolkit/
                                                                           immediately updated: the final list of (root and expanded) terms that
                                                                           represent the domain of the problem is shown in the bottom of the
                                                                           figure. The user can also make term expansion using WordNet. He
                                                                           selects one of the terms from the Golden Standard definition and the
                                                                           system shows him all its meanings contained in WordNet (top of the
                                                                           figure). After he has chosen one of them, the system presents him
                                                                           three different lists with the synonyms, hyponyms and hypernyms
                                                                           of the term. The user can then select one or more elements of these
                                                                           lists and add them to the expanded term list. For each expansion, the
                                                                           depth of the new term is increased by one unit.
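                                                                           The term representation and the expansion step can be
                                                                           sketched as follows. This is a minimal illustration, not
                                                                           the WebCORE implementation: the names GoldenTerm, root_term
                                                                           and expand are hypothetical, no WordNet lookup is performed,
                                                                           and we assume that an expanded term inherits the part of
                                                                           speech of the term it was derived from.

```python
from dataclasses import dataclass

# Relations that can produce a new term by expansion (from the paper).
EXPANSION_RELATIONS = {"synonym", "hypernym", "hyponym"}


@dataclass(frozen=True)
class GoldenTerm:
    """One Golden Standard term T = (LG, POS, LGP, R, Z)."""
    lexical_entry: str  # LG: lexical entry of the term
    pos: str            # POS: "noun", "adjective", "verb" or "adverb"
    parent: str         # LGP: entry it was expanded from ("" for a root term)
    relation: str       # R: "root", "synonym", "hypernym" or "hyponym"
    depth: int          # Z: distance to the root term it derives from


def root_term(entry: str, pos: str) -> GoldenTerm:
    """Create an initial (root) term: empty parent, relation root, depth 0."""
    return GoldenTerm(entry, pos, "", "root", 0)


def expand(parent: GoldenTerm, entry: str, relation: str) -> GoldenTerm:
    """Derive a new term from `parent`; the depth grows by one unit.

    Assumption: the new term keeps the part of speech of its parent.
    """
    if relation not in EXPANSION_RELATIONS:
        raise ValueError(f"unknown expansion relation: {relation}")
    return GoldenTerm(entry, parent.pos, parent.lexical_entry,
                      relation, parent.depth + 1)


# The example term T1 from the paper, plus an illustrative expansion
# that a user might pick from the synonym list.
t1 = root_term("genetics", "noun")
t2 = expand(t1, "genetic science", "synonym")
```

                                                                           Here t2 records "genetics" as its lexical parent and has
                                                                           depth 1; expanding t2 again would yield a term of depth 2.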
                                                                           In the problem definition phase a collaborative component has
                                                                           been added to the system (right side of Figure 2). This component
                                                                           reads the term currently selected by the user, and searches for all
                                                                           the stored problem definitions that contain it. For each of these
                                                                           problem definitions, the rest of their terms and the number of
                                                                           problems in which they appear are retrieved and shown in the web
                                                                           browser. With this simple strategy the user is suggested the most
                                                                           popular terms, a fact that could help him to better describe the
                  Figure 1. WebCORE architecture                           domain in which he is interested.
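
The collaborative term suggestion described above amounts to a co-occurrence count over stored problem definitions. The sketch below is only illustrative: the function name suggest_terms and the list-of-lists storage format are assumptions, and we assume that each suggested term is counted only over the definitions that contain the selected term.

```python
from collections import Counter


def suggest_terms(selected, stored_definitions):
    """Return (term, count) pairs for terms co-occurring with `selected`.

    `stored_definitions` is a list of problem definitions, each given as a
    collection of lexical entries (hypothetical storage format).
    The result is sorted by decreasing count, then alphabetically.
    """
    counts = Counter()
    for definition in stored_definitions:
        terms = set(definition)
        if selected in terms:
            # Count every other term once per problem definition.
            counts.update(terms - {selected})
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative stored problem definitions.
stored = [
    ["family", "parent", "child"],
    ["family", "parent", "marriage"],
    ["genetics", "gene"],
]
suggestions = suggest_terms("family", stored)
```

With these example definitions, "parent" is suggested first because it appears in both problems that mention "family".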

2.1 Golden Standard Definition
The first phase of our ontology recommender system is the Golden
Standard definition. The user describes a domain of interest
specifying a set of relevant terms that will be searched through the
concepts (classes or instances) of the ontologies stored in the
system. These terms can automatically be obtained by the internal
Natural Language Processing (NLP) module, which uses a
repository of documents related to the specific domain in which the
user is interested. This NLP module accesses the repository of
documents, and returns a list of pairs (lexical entry, part of speech)
that roughly represents the domain of the problem. Alternatively,
the list of initial (root) terms can be manually specified. The
module also allows the user to expand the root terms using
WordNet [4] and some of the relations it provides: hypernym,
hyponym and synonym. The new terms added to the Golden
Standard using these relations might also be extended again, and
                                                                                    Figure 2. WebCORE problem definition phase
new terms can iteratively be added to the problem definition.
    The final representation of the Golden Standard is defined as a        2.2 Automatic Ontology Recommendation
set of terms T (LG, POS, LGP, R, Z) where:                                 Once the user has selected the most appropriate set of terms to
   • LG is the set of lexical entries defined for the Golden               describe the problem domain, the tool performs the processes of
     Standard.                                                             ontology retrieval and ranking. Our approach to ontology retrieval
                                                                           can be seen as an evolution of classic keyword-based retrieval
   • POS corresponds to the different Parts Of Speech considered
                                                                           techniques [5], where textual documents are replaced by
     by WordNet: noun, adjective, verb and adverb.
                                                                           ontologies.
   • LGP is the set of lexical entries of the Golden Standard that
                                                                           The queries supported by our model are expressed using the terms
     have been extended.
                                                                           selected during the Golden Standard definition phase. In classic
   • R is the set of relations between terms of the Golden Standard:       keyword-based vector-space models for information retrieval,
     synonym, hypernym, hyponym and root (if a term has not been           each query keyword is assigned a weight that represents the
     obtained by expansion, but is one of the initial terms).              importance of the concept in the information need expressed by
                                                                           the query. Analogously, in our system, the terms included in the
   • Z is an integer number that represents the depth or distance
                                                                           Golden Standard are weighted, using the depth measure to
     of a term to the root term from which it has been derived.
                                                                           indicate the relative interest of the user for each of the terms to be
Example: T1 = (“genetics”, NOUN, “”, ROOT, 0). T1 is one of the            explicitly mentioned in the ontologies.
root terms of the Golden Standard. The lexical entry that it
                                                                           To carry out the retrieval process, we focus on the lexical level,
represents is “genetics”, its part of speech is “noun”, it has not
                                                                           recovering those ontologies that contain a subset of the terms
been expanded from any other term so its lexical parent is the
                                                                           expressed by the user during the Golden Standard definition. To
empty string, its relation is “root”, and its depth is 0.
                                                                           compute the term matching, two different options are available
Figure 2 shows the interface of the Golden Standard Definition             within the tool: search for exact matches or search for matches
phase. In the left side of the screen, the current list of root terms is   based on the Levenshtein distance between two terms.
shown. The user can manually insert new root terms to this list            Furthermore, the tool also offers two different search spaces, the
giving their lexical entries and selecting their parts of speech.          ontologies and the corresponding knowledge bases.
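
The two term matching options (exact and Levenshtein-based) can be illustrated with the following sketch. The paper does not state how an edit distance is turned into a similarity value, so the normalisation to [0, 1], the max_distance cut-off and the case-insensitive comparison are all assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match_similarity(term: str, entity: str, fuzzy: bool = False,
                     max_distance: int = 2) -> float:
    """Similarity in [0, 1] between a Golden Standard term and an entity name.

    Exact mode returns 1.0 only for identical (case-insensitive) strings.
    Fuzzy mode accepts names within `max_distance` edits (hypothetical
    threshold) and scales the score by the longer string length.
    """
    term, entity = term.lower(), entity.lower()
    if term == entity:
        return 1.0
    if not fuzzy:
        return 0.0
    distance = levenshtein(term, entity)
    if distance > max_distance:
        return 0.0
    return 1.0 - distance / max(len(term), len(entity))
```

For instance, "gene" and "genes" do not match in exact mode, but obtain a non-zero similarity in fuzzy mode because they differ by a single edit.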
Adding new terms, the final Golden Standard definition is
Figure 3 shows the system recommendation interface. On the left          Each component oji contains specific information about the
side the user can select the matching methodology (fuzzy or              similarity between the ontology and the corresponding term ti. To
exact), the search spaces (ontology entities and knowledge base          compute the final similarity between the query vector q and the
entities), and the weight or importance given to each of the             ontology vector oj, the vectorial model calculates the cosine
previously selected search spaces. In the right part the user can        measure between both vectors. However, if we follow the
visualize the ontology and navigate across it. Finally, the middle       traditional model, we will only be considering the difference
of the interface presents the list of ontologies selected for the user   between the query and the ontology vectors according to the angle
to be evaluated during the collaborative evaluation phase.               they form, but not taking into account their dimensions. To
                                                                         overcome this limitation, the cosine measure has been replaced by
                                                                         the simple dot product. Hence, the similarity measure between an
                                                                         ontology oj and the query q is simply computed as follows:
                                                                         $$sim(q, o_j) = q \cdot o_j$$
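
                                                                         A small numeric sketch may help to see why the dot
                                                                         product was preferred. It assumes the depth-based term
                                                                         weights wi = 1 / (di + 1) of the Golden Standard; the
                                                                         function names, the dictionary representation of the
                                                                         vectors and the component values are illustrative only.

```python
def depth_weight(depth: int) -> float:
    """Query weight of a term: 1 / (depth + 1), so root terms weigh 1.0."""
    return 1.0 / (depth + 1)


def similarity(query: dict, ontology: dict) -> float:
    """Dot product between a query vector and an ontology vector.

    Both vectors are dictionaries keyed by term; terms that are missing
    from the ontology vector contribute zero.
    """
    return sum(weight * ontology.get(term, 0.0)
               for term, weight in query.items())


# Illustrative Golden Standard: one root term and two expanded terms.
depths = {"genetics": 0, "gene": 1, "chromosome": 2}
q = {term: depth_weight(depth) for term, depth in depths.items()}

# Two hypothetical ontology vectors pointing in the same direction;
# o2 has half the magnitude of o1.
o1 = {"genetics": 0.5, "gene": 0.4}
o2 = {"genetics": 0.25, "gene": 0.2}

sim1 = similarity(q, o1)
sim2 = similarity(q, o2)
```

                                                                         Since o2 is a scaled copy of o1, both form the same
                                                                         angle with q and the cosine measure could not separate
                                                                         them, whereas the dot product ranks o1 above o2.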
                                                                         If the knowledge in the ontology is incomplete, the ontology
                                                                         ranking algorithm performs very poorly. Queries will return fewer
                                                                         results than expected, and the relevant ontologies will not be retrieved,
                                                                         or will get a much lower similarity value than they should. For
                                                                         instance, if there are ontologies about “restaurants”, and “dishes”
                                                                         are expressed as instances in the corresponding Knowledge Base
                                                                         (KB), a user searching for ontologies in this domain may be also
                                                                         interested in the instances and literals contained in the KB. To
                                                                         cope with this issue, our ranking model combines the similarity
                                                                         obtained from the terms that belong to the ontology with the
                                                                         similarity obtained from the terms that belong to the KB using the
      Figure 3. WebCORE system recommendation phase                      adaptation of the vector space model explained before. The user
Let T be the set of all terms defined in the Golden Standard             can select a value vi ∈ [1, 5] for each kind of search, and this
definition phase. Let di be the depth measure associated with each       value is then mapped to a corresponding value si = vi / 5. Following
term ti ∈ T. Let q be the query vector extracted from the Golden         this idea, the final score is computed as:
Standard definition, and let wi be the weight associated with each of
these terms, where for each ti ∈ T, wi ∈ [0,1]. Then, the weight wi      $$s_O \times sim(q, o) + s_{kb} \times sim(q, kb)$$
is calculated as:
                     $$w_i = \frac{1}{d_i + 1}$$
                                                                         2.3 Collaborative Ontology Evaluation
                                                                         The third and last phase of the system is composed of a novel
                                                                         ontology recommendation algorithm that exploits the advantages
This measure gives more relevance to the terms explicitly expressed      of Collaborative Filtering [1], exploring the manual evaluations
by the user, and less importance to those ones extended or derived       stored in the system to rank the set of ontologies that best fulfils
from previously selected terms. An interesting line of future work       the user’s interests.
would be to enhance and refine the query, e.g. based on term             In WebCORE, user evaluations are represented as a set of five
popularity, or on more complex strategies such as term frequency analysis.   different criteria and their respective values, manually determined
The search engine computes a semantic similarity value between           by the users who made the evaluations: correctness, readability,
the query and each ontology as follows. We represent each                flexibility, level of formality and type of model.
ontology with a vector oj ∈ O, where oji is the mean of the term ti      The above criteria can have discrete numeric or non-numeric
similarities with all the matched entities in the ontology if any        values. The user’s interests are expressed like a subset of these
matching exists, and zero otherwise. The components oji are              criteria, and their respective values, meaning thresholds or
calculated as:                                                           restrictions to be satisfied by user evaluations. Thus, a numeric
          $$o_{ji} = \frac{\frac{1}{|M_{ji}|} \sum_{M_{ji}} w(m_{ji})}{\sum_{M_i} w(m_i)}$$
                                                                         criterion will be satisfied if an evaluation value is equal or greater
                                                                         than that expressed by its interest threshold, while a non-numeric
                                                                         criterion will be satisfied only when the evaluation is exactly the
                                                                         given threshold (i.e. in a Boolean or yes/no manner).
where Mji is the set of matches of the term ti in the ontology           According to both types of user evaluation and interest criteria,
oj, w(mji) represents the similarities between the term ti and the       numeric and Boolean, the recommendation algorithm will
entities of the ontology oj that match it, Mi is the set of              measure the degree to which each user restriction is satisfied by
matches of the term ti within all the ontologies and w(mi)               the evaluations, and will recommend a ranked ontology list
represents the weights of each of these matches.                         according to similarity measures between the thresholds and the
For example, if we define in the Golden Standard a term “acid”,          collaborative evaluations.
this term may return several matches in the same ontology with           Figure 4 shows all the previous definitions and ideas, locating
different entities as: “acid”, “amino acid”, etc. In order to            them in the graphical interface of the system. On the left side of
establish the appropriate weight in the ontology vector, oji, the        the screen, the user introduces the thresholds for the
goal is to compute the number of matches of one term in the              recommendations and obtains the final collaborative ontology
whole repository of ontologies and give more relevance to those          ranking. On the right side, the user adds new evaluations for the
ontologies that have matched that specific term more times.              ontologies and checks evaluations given by the rest of the users.
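
The threshold rules of the collaborative evaluation phase can be sketched as follows. The paper does not give the exact similarity measure used for the final ranking, so this example assumes a simple one: the fraction of (evaluation, criterion) pairs that satisfy the user's thresholds. All function names, criterion values and ontology names are hypothetical.

```python
from numbers import Number


def satisfies(value, threshold) -> bool:
    """Numeric criteria need value >= threshold; others need equality."""
    if isinstance(value, Number) and isinstance(threshold, Number):
        return value >= threshold
    return value == threshold


def satisfaction_degree(evaluations, interests) -> float:
    """Fraction of evaluated criteria that meet the user's thresholds.

    `evaluations` is a list of dicts (one per user evaluation) and
    `interests` maps a subset of criteria to thresholds. This averaging
    is an assumption, not the measure used by WebCORE.
    """
    checks = [satisfies(evaluation[criterion], threshold)
              for evaluation in evaluations
              for criterion, threshold in interests.items()
              if criterion in evaluation]
    return sum(checks) / len(checks) if checks else 0.0


def rank_ontologies(evaluations_by_ontology, interests):
    """Rank ontologies by decreasing satisfaction degree, then by name."""
    scores = {name: satisfaction_degree(evaluations, interests)
              for name, evaluations in evaluations_by_ontology.items()}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative interests: one numeric and one non-numeric criterion.
interests = {"correctness": 3, "formality": "formal"}
stored_evaluations = {
    "family.owl": [{"correctness": 4, "formality": "formal"},
                   {"correctness": 2, "formality": "formal"}],
    "genetics.owl": [{"correctness": 5, "formality": "informal"}],
}
ranking = rank_ontologies(stored_evaluations, interests)
```

In this example "family.owl" satisfies three of its four checks and "genetics.owl" one of its two, so "family.owl" is ranked first.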
                                                                       Table 1. Average number of reused ontologies and execution times (in
                                                                       minutes) for tasks 2 and 3
                                                                                               Task 2 (without          Task 3 (with             %
                                                                                            collaborative modules)  collaborative modules)  improvement
                                                                       # reused ontologies          3.45                    4.35               26.08
                                                                       execution time               9.3                     7.1                23.8

                                                                       On the other hand, Table 2 shows the average degrees of
                                                                       satisfaction reported by the users with the retrieved ontologies
                                                                       and the collaborative modules. Again, the results support
                                                                       the usefulness of our approach.

                                                                       Table 2. Average satisfaction values (1-5 rating scale) for ontologies
         Figure 4. WebCORE user evaluation phase                       reused in tasks 2 and 3, collaborative recommendations and rankings
                                                                                                    %           Initial term     Final ontology
                                                                        Task 2      Task 3     improvement    recommendation       ranking
                                                                         3.34        3.56         6.58             4.7               4.4
3. EXPERIMENTS
In this section, we present some early experiments that attempt to
measure: a) the gain in efficiency and effectiveness, and b) the
increase in users’ satisfaction obtained with the use of our          4. CONCLUSIONS AND FUTURE WORK
system when searching ontologies within a specific domain.             In this paper, a web application for ontology evaluation and reuse
The scenario of the experiments was the following. A repository        has been presented. The novel aspects of our proposal include the
of thirty ontologies was considered and eighteen subjects              use of WordNet to help users to define the Golden Standard; a
participated in the evaluations. They were Computer Science            new ontology retrieval technique based on traditional Information
Ph.D. students of our department, all of them with some expertise      Retrieval models; rank fusion techniques to combine different
in modeling and exploitation of ontologies. They were asked to         ontology evaluation measures; and two collaborative modules:
search and evaluate ontologies with WebCORE in three different         one that suggests the most popular terms for a given domain, and
tasks. For each task and each student, one of the following            one that recommends lists of ontologies with a multi-criteria
problem domains was selected: family, genetics and restaurant.         strategy that takes into account user opinions about ontology
                                                                       features that can only be assessed by humans.
In the repository, there were six different ontologies related to
each of the above domains, and twelve ontologies describing other      5. ACKNOWLEDGMENTS
unrelated knowledge areas. No information about the domains            This research was supported by the Spanish Ministry of Science
and the existent ontologies was given to the students.                 and Education (TIN2005-06885 and FPU program).
Tasks 1 and 2 were performed first without the help of the
collaborative modules of the system, i.e., the term recommender
                                                                       6. REFERENCES
of the problem definition phase and the collaborative ranking of       [1] Adomavicius, G., and Tuzhilin, A.: Toward the Next
the user evaluation phase. After all users finished the previous           Generation of Recommender Systems: A Survey of the State-
ontology searches and evaluations, task 3 was done with the                of-the-Art and Possible Extensions. IEEE Transactions on
collaborative components activated. For each task and each                 Knowledge and Data Engineering 17(6): 734-749, 2005.
student, we measured the time expended, and the number of              [2] Fernández, M., Cantador, I., and Castells, P. CORE: A Tool
ontologies retrieved and selected (‘reused’). We also asked the            for Collaborative Ontology Reuse and Evaluation.
users about their satisfaction (in a 1-5 rating scale) about each of       Proceedings of the 4th Int. Workshop on Evaluation of
the selected ontologies and the collaborative modules.                     Ontologies for the Web (EON’06), at the 15th Int. World
Tables 1 and 2 contain a summary of the obtained results. Note             Wide Web Conference (WWW’06). Edinburgh, UK, 2006.
that measures of task 1 are not shown. We have decided not to          [3] Maedche, A., and Staab, S.: Measuring similarity between
consider them for evaluation purposes because we regard the first          ontologies. Proceedings of the 13th European Conference on
task as a learning process in the use of the tool, and its execution       Knowledge Acquisition and Management (EKAW 2002).
times and number of selected ontologies as skewed,                         Madrid, Spain, 2002.
non-objective measures.
                                                                       [4] Miller, G. A.: WordNet: A lexical database for English. New
To evaluate the enhancements in terms of efficiency and                    horizons in commercial and industrial Artificial Intelligence.
effectiveness, we present in Table 1 the average number of reused          Communications of the Association for Computing
ontologies and the average execution times for task 2 and 3. The           Machinery, 38(11): 39-41, 1995.
results show a significant improvement when the collaborative
                                                                       [5] Salton, G., and McGill, M.: Introduction to Modern
modules of the system were activated. In all the cases, the
                                                                           Information Retrieval. McGraw-Hill, New York, 1983.
students made use of the terms and evaluations suggested by
others, accelerating the processes of problem definition and           [6] Swoogle - Semantic Web Search Engine.
relevant ontology retrieval.                                               http://swoogle.umbc.edu