=Paper=
{{Paper
|id=None
|storemode=property
|title=GrOnto: A Granular Ontology for Diversifying Search Results
|pdfUrl=https://ceur-ws.org/Vol-560/paper14.pdf
|volume=Vol-560
|dblpUrl=https://dblp.org/rec/conf/iir/CalegariP10
}}
==GrOnto: A Granular Ontology for Diversifying Search Results==
GrOnto: a GRanular ONTOlogy for
Diversifying Search Results
Silvia Calegari
Gabriella Pasi
University of Milano-Bicocca
V.le Sarca 336/14, 20126
Milano, Italy
{calegari,pasi}@disco.unimib.it
ABSTRACT
Results diversification is an approach used in the literature to cover the possible interpretations of the results produced by query evaluation. For diversifying search results we propose the GrOnto model. This model is based on a normalized granular view of an ontology: GrOnto associates each result with the suitable topical granules in order to categorize it based on the granular information.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - information filtering, search process
Appears in the Proceedings of the 1st Italian Information Retrieval Workshop (IIR’10), January 27–28, 2010, Padova, Italy. http://ims.dei.unipd.it/websites/iir10/index.html Copyright owned by the authors.
1. INTRODUCTION
In recent years, Web search engines have become the de facto access point to the information available on the Internet. Usually people specify their information needs by writing queries with a limited number of terms (usually 2-3 terms per query). However, short queries are very difficult to disambiguate: a term may have several interpretations. One of the problems related to term disambiguation is how to diversify the results produced as an answer to an ambiguous query. Results diversification is a research topic that has attracted several researchers in recent years. The focus is on how to produce a set of diversified results that cover the different possible interpretations of the query. Result diversification has been recognized as a very important topic in Information Retrieval; the basic idea is that “the relevance of a set of documents depends not only on the individual relevance of its members, but also on how they relate to one another” [3]. The key aspect is that the relevance of a document also has to consider the semantics expressed by the terms it contains. “The focus is on how to diversify search results making explicit use of knowledge about the topics the query or the documents may refer to” [1].
In a recent research work, a taxonomy of information is used to model the user’s request [1]. The idea is to assign both query and documents to one or more categories of the taxonomy. The taxonomy adopted is the one provided by the ODP (Open Directory Project, http://dmoz.org) ontology. Furthermore, it is assumed that usage statistics have been collected on the distribution of user intents over the categories [6]. The aim of this approach is to minimize the risk of user dissatisfaction by computing a quality value for each document retrieved in response to a query as a combination of relevance and diversity.
In this paper a method for diversifying the results produced in response to a query is proposed. We do not use a statistical approach to diversify the results; instead, our method makes use of the semantic support offered by a granular view of an ontology [2] with the aim of producing a granular taxonomy of the results. By this method the information is classified at different topical levels (from a general topic to a specific topic).
In a granular ontology the concepts and instances are classified into granules. A granule is a chunk of knowledge made of different objects “drawn together by indistinguishability, similarity, proximity or functionality” [12]. A level is just the collection of granules of similar nature, and a granular information is a pyramidal information structure with different levels of clarification.
The paper is organized as follows. In Section 2 an overview of the use of ontologies in Information Retrieval is presented. In Section 3 the definition of a normalized granular view of an ontology is reported. The approach proposed in this work, named GrOnto, for diversifying search results is defined in Section 4. Finally, in Section 5 some conclusions and future work are stated.
2. THE USE OF ONTOLOGIES IN INFORMATION RETRIEVAL
In the last decades ontologies have been used in different areas of research in Computer Science, among which Information Retrieval, where they have been involved in several applications with different aims. For example, ontologies have been used in distributed environments, for re-ranking the results to better satisfy the user’s needs, to provide conceptual indexing, and to disambiguate the user’s query.
In distributed environments, significant works are SemreX [7] and Semantic Link Network (SLN) [13]. SemreX is a recent project that implements a multi-layer overlay network to map semantically correlated documents to clustered groups of neighbors. This semantic mapping is obtained by considering the ACM Topic Ontology. In SLN, an ontology has
been built as a self-organized semantic data model by defining semantic nodes, semantic links among nodes, and a set of relational reasoning rules, where each node identifies a resource.
In order to re-rank the results obtained after a search on the Web, a user’s profile is generally used. In the literature different strategies have been defined in order to build a user’s profile by adopting the semantic support of an ontology. For example, in [4] a user profile is built by considering past queries, and it is represented as a weighted graph by extracting the related terms from the ODP ontology.
In the conceptual indexing field of research, WordNet (http://wordnet.princeton.edu/) synsets are used as terms for the representation of the documents. The concept detection phase consists in extracting concepts from documents that correspond to synsets in WordNet. In [8] the authors proposed some procedures to identify the correct sense of a word.
In this paper we are interested in the last field of research, where the problem of disambiguation of the query is taken into account. Short queries are very difficult to disambiguate. Two main problems may arise: word synonymy (i.e., two words with the same meaning), and word polysemy (i.e., one word with multiple meanings). In the literature several strategies have been proposed in order to find a solution to this problem. Ontologies have also been involved in this field, with the goal of providing a semantic support for reducing the ambiguity of the query. One way is to analyse the structure of the ontology to expand the terms written in the query with new terms that carry related meanings. The use of an ontology reduces the possible (mis)interpretation of a query, but it requires tuning a query term to the right level in the hierarchy. Not only the IS-A relationship is used to discover the suitable words [11], but also other important relationships such as synonymy, meronymy and hypernymy are taken into account. For example, in [9] the relationships considered are hyperonymy and synset. For each term written in the query, a set of its synsets in WordNet is identified.
As reported in the Introduction of this paper, results diversification is another strategy that can be adopted to solve the problem of ambiguous queries. We are interested in the situation where it is necessary to identify the different interpretations of a user’s query. The focus is to produce a set of diversified results that best cover these interpretations. One of the pioneering works on diversification is that of Carbonell and Goldstein [3]. In their work, the diversification is obtained through the use of two similarity functions: one for measuring the similarity of the documents, and the other one for measuring the similarity between each document and a query. In more recent works a new approach has been explored to categorize both queries and documents by the use of a taxonomy [1, 14]. In these papers the taxonomy adopted is the one of the ODP ontology. The taxonomy is set by the IS-A relationship among categories; in fact, in this context each concept of the ODP ontology represents a specific category.
In our paper we propose a method to diversify search results with the adoption of a new granular view of an ontology. Whereas in the previous works ([1, 14]) the taxonomy has been used only as a vocabulary for identifying the categories for queries and documents, we now consider an innovative ontology framework with a semantic expressiveness (i.e., instances and their properties) richer than the ODP ontology.
3. GRANULAR VIEW OF AN ONTOLOGY
The proposed method is based on the concept of a granular view (or granular perspective) of an ontology, which has been defined in [2]. Given a domain ontology, the idea is to analyse the instances and their properties in order to discover new semantic associations among them. These semantic associations can be defined with the application of a rough methodology. The objective is to re-organize the ontology in a new taxonomy obtained after the analysis of the property values assigned to the instances.
The rough structure used is known as Information Table [10]. For a domain ontology, an Information Table is induced as the structure:
$$\langle I, P, Val(I), F \rangle$$
where $I$ is the set of the instances, $P$ is the set of the properties, $Val(I)$ is the set of all the values assumed by the properties $P$, and $F$ is the function that assigns to a pair $(i, p)$ the value assumed by the instance $i \in I$ on the property $p \in P$. Thus, we can say that two instances are similar if they have the same values only for some properties. Formally, let $D \subseteq P$, then given two instances $i_1, i_2 \in I$, $i_1$ is similar to $i_2$ with respect to $D$ and $\varepsilon$, with $\varepsilon \in [0, 1]$, iff
$$\frac{|\{d_j \in D : F(i_1, d_j) = F(i_2, d_j)\}|}{|D|} \geq \varepsilon \qquad (1)$$
This relation says that two instances are similar if they have at least $\varepsilon |D|$ properties with the same value. For example, if we consider a Wine Ontology then a possible set of properties is $P := \{Location, Color, Sugar, Flavor, Body\}$. $D$ is a subset of $P$ defined as $D := \{Sugar, Flavor, Body\}$. In this case two instances belong to the same granule if they have at least $|D| - 1$ properties with the same value, i.e. $\varepsilon := \frac{|D| - 1}{|D|} := \frac{2}{3}$. For example, Longridge Merlot and Marietta Zinfandel belong to the same granule by having two properties with the same value, i.e. (flavor == moderate) and (sugar == dry).
In [2] the instances are classified into granules at different levels of clarification. A key aspect is how to choose the granular levels from the non-granular ontology. The idea is to cluster the instances into granules by considering their similarity, i.e. by analysing the values of their properties (see Equation 1).
The granular view of an ontology is defined by following 3 steps. In order to clarify the construction of the new ontology, we refer to a very simple example. In this example, let us consider a small Wine Ontology which has 4 instances, and the set $P$ of properties previously defined.
First step: definition of the tabular version of the ontology. In this table the rows are the instances and the columns are all the properties defined in the ontology. The selected instances and properties are the ones defined only by the IS-A relationships of the ontology domain. Table 1 reports the instances and the properties with their values of the small Wine Ontology analysed in this work.
Second step: definition of the granular levels. As previously stated, the granular levels are chosen by analysing the property values of the instances.
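The similarity relation of Equation 1 can be illustrated with a minimal Python sketch over the Information Table of the small Wine Ontology. The names `WINE_TABLE`, `similar`, `D` and `EPS` are hypothetical and introduced only for illustration; the property values are the ones reported in Table 1, and treating "Undefined" as an ordinary value is an assumption of this sketch, not something stated in [2].

```python
from fractions import Fraction

# Information Table of the small Wine Ontology (values as reported in Table 1).
# F(i, p) is modelled as WINE_TABLE[i][p]. Keeping "Undefined" as an ordinary
# value is an assumption of this sketch.
PROPERTIES = ["Color", "Sugar", "Flavor", "Body", "Location"]
WINE_TABLE = {
    "Longridge Merlot": dict(zip(
        PROPERTIES, ["Red", "Dry", "Moderate", "Light", "Undefined"])),
    "Marietta Zinfandel": dict(zip(
        PROPERTIES, ["Red", "Dry", "Moderate", "Medium", "Undefined"])),
    "Lane Tanner Pinot Noir": dict(zip(
        PROPERTIES, ["Red", "Dry", "Delicate", "Light", "Undefined"])),
    "Chateau-D-Ychem": dict(zip(
        PROPERTIES, ["Undefined"] * 4 + ["Bordeaux region"])),
}


def similar(table, i1, i2, D, eps):
    """Equation 1: i1 and i2 agree on at least a fraction eps of the properties in D."""
    agreeing = sum(1 for d in D if table[i1][d] == table[i2][d])
    # Fraction keeps the comparison exact (no floating-point rounding at 2/3).
    return Fraction(agreeing, len(D)) >= eps


# The subset D and the threshold (|D| - 1) / |D| = 2/3 used in the text.
D = ["Sugar", "Flavor", "Body"]
EPS = Fraction(len(D) - 1, len(D))

if __name__ == "__main__":
    names = list(WINE_TABLE)
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            verdict = similar(WINE_TABLE, names[a], names[b], D, EPS)
            print(names[a], "~", names[b], ":", verdict)
```

With these values, Longridge Merlot is similar to both Marietta Zinfandel and Lane Tanner Pinot Noir, while the latter two agree only on Sugar and therefore do not satisfy the relation with respect to each other.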
Table 1: A tabular version for the small Wine Ontology
Instances | Color | Sugar | Flavor | Body | Location
Longridge Merlot | Red | Dry | Moderate | Light | Undefined
Marietta Zinfandel | Red | Dry | Moderate | Medium | Undefined
Lane Tanner Pinot Noir | Red | Dry | Delicate | Light | Undefined
Chateau-D-Ychem | Undefined | Undefined | Undefined | Undefined | Bordeaux region
The tabular representation is used as support for this step. Thus, from the set of properties $P$ two disjoint sets of granules are induced: $D_1 := \{Color, Flavor, Body, Sugar\}$ and $D_2 := \{Location\}$. For $D_2$, Location is the only property, so it belongs to the first level, with the instance Chateau-D-Ychem at the second granular level. For $D_1$, instead, the first granular level has to be chosen among the properties that belong to $D_1$. Also in this case we have to analyze the property values assumed by the set of instances, and we can observe that the first granular level can be chosen arbitrarily between Color and Sugar, since each of them assumes the same value for all the instances. For this ontology, without loss of generality, we can consider Color at the first granular level, and for the next level the similarity relation (i.e., Equation 1) can be applied to the $D_1$ set (without the property Color). In this illustrative example $\varepsilon := \frac{2}{3}$, that is, two instances belong to the same granule if they have at least two out of three properties with the same value. Figure 1 depicts the granular classification obtained, where the circles are the property values and the squares are the instances.
[Figure 1: A granular view of a small Wine Ontology after the application of the rough methodology. Granular levels 0 to 3. Level 1: Color = Red; Location = Bordeaux region. Level 2: granule A (Flavor=Moderate and (Body=Light or Body=Medium) and Sugar=Dry); granule B (Body=Light and (Flavor=Delicate or Flavor=Moderate) and Sugar=Dry); Chateau-D-Ychem. Level 3: Marietta Zinfandel; Longridge Merlot; Lane Tanner Pinot Noir.]
The third step is to solve the problem of redundancy of the information. Let us consider two granules $G_i$ and $G_j$ at the same granular level: $G_i$ is redundant with respect to $G_j$ iff $G_j \supseteq G_i$. In [2] a normalisation process has been defined in order to obtain a normal form of the granular perspective. For example, if we examine the same example of Figure 1, we can observe that $G_A$ and $G_B$ belong to the same granular level, and that $G_A \supseteq G_B$. Indeed, the instances Longridge Merlot and Lane Tanner Pinot Noir are completely included into $G_B$ but they belong to $G_A$. In this normalisation process the granular subclass $G_B$ inherits all the common instances from the granular superclass $G_A$ (see Figure 2).
[Figure 2: The granular view of the small Wine Ontology after the application of the normalisation process. Granular levels 0 to 4. Level 1: Color = Red; Location = Bordeaux region. Level 2: granule A; Chateau-D-Ychem. Level 3: granule B; Marietta Zinfandel. Level 4: Longridge Merlot; Lane Tanner Pinot Noir.]
4. THE PROPOSED MODEL
When using a search engine a user formulates a query in order to retrieve the documents relevant to her/his information needs. In most cases the user writes short queries that are difficult to disambiguate. In fact, in many user queries a query term could be interpreted with different meanings. We propose a solution to diversify search results that aims to increase the effectiveness of the system by reducing the ambiguity in the interpretation of results. As proposed in [1] we adopt a taxonomy of information where both queries and results may belong to more than one category. In particular we use the taxonomy corresponding to a normalized granular view of an ontology (see Section 3). The idea is to associate each result with the suitable topical granules.
Generally, in search engines the evaluation of a user’s query produces an ordered list of results. For diversifying search results the GrOnto model (see Figure 3) takes as input a ranked list of results, and the granular ontology to categorize each result. In other words, the normalized granular view of the ontology is used to apply a filtering on the search results. As reported in Section 1, in a granular ontology the granules are organized at different levels of clarification. Thus the categorization of each result is performed by locating in the ontology the right granules with which it may be associated. Figure 4 shows the general structure of the approach, where the list of results (left-hand side of Figure 4) is re-organized by the filtering strategy (right-hand side of Figure 4) based on the granular ontology structure. By applying the categorization process (explained here below), we obtain a representation of the results which reflects the classification into topics corresponding to the granular levels of the adopted ontology. Each retrieved document is associated with one or more granules of the ontology by a procedure explained here below.
[Figure 3: A simple schema of the GrOnto model. A query is submitted to the search engine; the list of results and the normalized granular view of an ontology are the inputs of the GrOnto model.]
[Figure 4: A Web search after the application of the GrOnto model. Left: list of results (1 Result, 2 Result, ..., 100 Result). Centre: categorization process based on the normalized granular view of an ontology. Right: list of results by following the hierarchical granulation (granule 1, granule 2, ...).]
As an example, let us consider the same vocabulary and structure of the Wine Ontology described in Section 3. The related set of concepts is $O := \{$Red, Bordeaux region, Chateau-D-Ychem, Marietta Zinfandel, Longridge Merlot, Lane Tanner Pinot Noir$\}$. During a search session a user is interested in finding, for instance, information about red wines and she/he writes the following short query q:=“red wines in France”, and a list of results is displayed. The association of each result with granules of the granular ontology is obtained in two steps. Here below the process undertaken to categorize a search result is explained. We present these two steps in order to categorize the first result; the same procedure is applied to the other search results.
Step 1: “Formal representation of each result”. In order to formally represent the content of a result $R_i$ proposed in response to a query, we assume that results are described by Title and Snippet. The $i$-th result $R_i$ is then associated with a set of terms, $Res_i$, extracted from the textual information, i.e. $Res_i := Title_i \cup Snippet_i$ where $Title_i$ and $Snippet_i$ are sets of terms included in the vocabulary of the granular ontology.
Thus, by analysing the first result $R_1$, we have: Title:=“Wines of France-A guide to French wines” and Snippet:=“Discover the wines of France, their varieties, history and regions;. . . Lane Tanner Pinot Noir is a very famous red wine produced in. . . ”. From these two short texts, by considering the set $O$, we obtain that $Res_1 := \{$Lane Tanner Pinot Noir, Red$\}$, i.e. $Title_1 := \emptyset := Title \cap O$ and $Snippet_1 := \{$Lane Tanner Pinot Noir, Red$\} := Snippet \cap O$.
Step 2: “Association of each result $R_i$ with granules of the granular tree”. The output of Step 1 is a set of terms of the vocabulary $O$, named $Res_i$, for each retrieved document $R_i$. An element of $Res_i$ is a granule of the ontology, and to this granule we can associate the $i$-th result. Thus, for each granule the following structure is defined: $\langle Results_j, card_{TOT_j} \rangle$, where $Results_j$ is the set of the search results associated with the $j$-th granule, i.e. $Results_j := \{R_i \mid granule_j \in Res_i\}$, and $card_{TOT_j}$ is the cardinality of all the results associated with the $j$-th granule. This means that
$$card_{TOT_j} := \left| Results_j \cup \left( \bigcup_{child=0}^{n} Results_{child} \right) \right|$$
i.e., the cardinality of all the results identified with the $j$-th granule together with the results associated with all its $n$ sub-granules (children nodes).
By considering the same example of Step 1, the first result $R_1$ has been formally represented as $Res_1 := \{$Lane Tanner Pinot Noir, Red$\}$, so that the selected granules are Lane Tanner Pinot Noir and Red. Figure 5 depicts the situation after the application of Step 2, where the structure assigned to $granule_1$ is $\langle Results_1 := \{R_1\}, 1 \rangle$, whereas for $granule_8$ it is $\langle Results_8 := \{R_1\}, 1 \rangle$. Thus, the first result $R_1$ has been categorized with two topics (granules) at different levels of clarification.
[Figure 5: Example of the structure assigned to each granule identified with a result. The granules of the normalized Wine Ontology are numbered 0 to 8 across granular levels 0 to 4: 1 is Color = Red, 2 is Location = Bordeaux region, 3 is granule A, 4 is Chateau-D-Ychem, 5 is Marietta Zinfandel, 6 is granule B, 7 is Longridge Merlot, 8 is Lane Tanner Pinot Noir. Granules 1 and 8 are annotated with < {R1}, 1 >.]
GrOnto on the Web.
Figure 6 depicts a prototype interface for the GrOntoS system. We have taken inspiration from Clusty (http://clusty.com/), where the web-page structure is split into three parts: 1) a text area where the user can formulate her/his request by using the Yahoo! Search engine, 2) a profile used to visualize the portion of the normalized granular view of the ontology involved by the specific query, and 3) a web-page area devoted to the visualization of the results. In particular only the results categorized with a granule of the ontology are displayed
one by one. Figure 6 reports a simple example where the small Wine Ontology of Section 3 is used to classify all the results obtained, for example, after the evaluation of the query q:=“red wines in France”. A user can use the portion of the granular ontology to navigate the results by considering the categorization provided by the granular levels. In fact, by clicking on an item of the portion of the granular ontology, all its results will be visualised. Furthermore, each item is enriched with the cardinality of the results associated with its topic; in this way the user is directed towards the most numerous category.
[Figure 6: The interface model of the GrOnto model. Top: query box using YAHOO! Search. Left: portion of the granular ontology with cardinalities: Red (40), granule A (12), Marietta Zinfandel (6), granule B (10), Longridge Merlot (3), Lane Tanner Pinot Noir (5), Bordeaux Region (14), ... Right: list of results (5) for the granule “Lane Tanner Pinot Noir”.]
5. CONCLUSIONS
In this paper we have studied the problem of diversification of search results to disambiguate the user’s query in a given domain of knowledge represented by a granular ontology. We have proposed a model, named GrOnto, based on a semantic support for associating search results with one or more categories. A normalized granular view of an ontology is the semantic framework adopted in order to cover all the possible meanings of a result. Generally, after the evaluation of a user’s query an ordered list of results is obtained. GrOnto takes as input this list and the granular ontology, and thanks to the adoption of a filtering strategy a taxonomic organization of the results is achieved.
We are implementing the GrOnto model through a simple web service by adopting the representational state transfer (REST) paradigm [5].
The continuation of this research activity will address the problem of applying the GrOnto approach to personalized ontologies, where the user interests will be represented by means of a granular ontology. To this aim we are also investigating the problem of defining personalized granular ontologies.
6. REFERENCES
[1] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In WSDM ’09: Proceedings of the Second ACM International Conference on Web Search and Data Mining, pages 5–14, New York, NY, USA, 2009. ACM.
[2] S. Calegari and D. Ciucci. Granular computing applied to ontologies. International Journal of Approximate Reasoning, 2009. In press.
[3] J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Research and Development in Information Retrieval, pages 335–336, 1998.
[4] M. Daoud, L. Tamine-Lechani, M. Boughanem, and B. Chebaro. A session based personalized search using an ontological user profile. In S. Y. Shin and S. Ossowski, editors, SAC, pages 1732–1736. ACM, 2009.
[5] R. T. Fielding and R. N. Taylor. Principled design of the modern web architecture. In ICSE ’00: Proceedings of the 22nd International Conference on Software Engineering, pages 407–416, New York, NY, USA, 2000. ACM.
[6] A. Fuxman, P. Tsaparas, K. Achan, and R. Agrawal. Using the wisdom of the crowds for keyword generation. In WWW ’08: Proceedings of the 17th International Conference on World Wide Web, pages 61–70, New York, NY, USA, 2008. ACM.
[7] H. Jin and H. Chen. SemreX: Efficient search in a semantic overlay for literature retrieval. Future Generation Computer Systems, 24(6):475–488, 2008.
[8] R. Mihalcea and D. I. Moldovan. Semantic indexing using WordNet senses. In Proceedings of ACL Workshop on IR & NLP, pages 35–45, 2000.
[9] R. Navigli and P. Velardi. An analysis of ontology-based query expansion strategies. In Workshop on Adaptive Text Extraction and Mining (Cavtat Dubrovnik, Croatia, Sept 23), 2003.
[10] Z. Pawlak. Information systems - theoretical foundations. Information Systems, 6:205–218, 1981.
[11] E. M. Voorhees. Query expansion using lexical-semantic relations. In SIGIR ’94: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 61–69, New York, NY, USA, 1994. Springer-Verlag New York, Inc.
[12] L. Zadeh. Is there a need for fuzzy logic? Information Sciences, 178:2751–2779, 2008.
[13] H. Zhuge. Communities and emerging semantics in semantic link network: Discovery and learning. IEEE Transactions on Knowledge and Data Engineering, 21(6):785–799, 2009.
[14] C.-N. Ziegler, S. M. McNee, J. A. Konstan, and G. Lausen. Improving recommendation lists through topic diversification. In WWW ’05: Proceedings of the 14th International Conference on World Wide Web, pages 22–32, New York, NY, USA, 2005. ACM.
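As a supplementary illustration of the two categorization steps of Section 4, the following Python sketch reproduces the worked example for the first result. All function and variable names (`matched_terms`, `represent`, `associate`, `card_tot`, `CHILDREN`) are hypothetical; matching vocabulary terms as case-insensitive whole phrases is an assumption, since the paper does not specify how terms are extracted, and the parent/child edges are an assumed reading of the normalized granular view. The cardinality follows the formula of Step 2 literally, i.e. it merges the results of a granule with those of its direct children only.

```python
import re

# Vocabulary O of the small Wine Ontology (Section 4).
O = ["Red", "Bordeaux region", "Chateau-D-Ychem",
     "Marietta Zinfandel", "Longridge Merlot", "Lane Tanner Pinot Noir"]

# Assumed parent -> children edges of the normalized granular view.
CHILDREN = {
    "Red": ["A"],
    "A": ["B", "Marietta Zinfandel"],
    "B": ["Longridge Merlot", "Lane Tanner Pinot Noir"],
    "Bordeaux region": ["Chateau-D-Ychem"],
}


def matched_terms(text, vocabulary):
    """Terms of the vocabulary occurring in text as whole phrases (assumed matching rule)."""
    low = text.lower()
    return {t for t in vocabulary
            if re.search(r"\b" + re.escape(t.lower()) + r"\b", low)}


def represent(title, snippet, vocabulary):
    """Step 1: Res_i := Title_i U Snippet_i, both restricted to the vocabulary."""
    return matched_terms(title, vocabulary) | matched_terms(snippet, vocabulary)


def associate(res_by_result):
    """Step 2: Results_j := {R_i | granule_j in Res_i}, as a dict granule -> result ids."""
    by_granule = {}
    for result_id, res in res_by_result.items():
        for granule in res:
            by_granule.setdefault(granule, set()).add(result_id)
    return by_granule


def card_tot(granule, by_granule, children):
    """Step 2: size of Results_j merged with the results of its direct children."""
    merged = set(by_granule.get(granule, set()))
    for child in children.get(granule, []):
        merged |= by_granule.get(child, set())
    return len(merged)


TITLE = "Wines of France-A guide to French wines"
SNIPPET = ("Discover the wines of France, their varieties, history and regions; "
           "... Lane Tanner Pinot Noir is a very famous red wine produced in ...")

RES1 = represent(TITLE, SNIPPET, O)
BY_GRANULE = associate({"R1": RES1})

if __name__ == "__main__":
    print(sorted(RES1))
    for granule in ("Red", "Lane Tanner Pinot Noir"):
        print(granule, sorted(BY_GRANULE[granule]),
              card_tot(granule, BY_GRANULE, CHILDREN))
```

If the counts shown in the interface are meant to include results from all descendants rather than direct children only, `card_tot` would need to recurse over `CHILDREN`; the paper's formula leaves this open, so the sketch keeps to the literal reading.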