Semantic and Bayesian Profiling Services for Textual Resource Retrieval

Eufemia Tinelli — Department of Computer Science, University of Bari, Bari 70126, Italy — Email: tinelli@di.uniba.it
Pierpaolo Basile — Department of Computer Science, University of Bari, Bari 70126, Italy — Email: basilepp@di.uniba.it
Eugenio Di Sciascio — Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, Bari 70126, Italy — Email: disciascio@poliba.it
Giovanni Semeraro — Department of Computer Science, University of Bari, Bari 70126, Italy — Email: semeraro@di.uniba.it

Abstract— This paper presents an integrated approach to textual resource retrieval which combines logical inference services with user profiles, in which a structured representation of the user interests is maintained. Learning is performed on documents which have been disambiguated by exploiting the WordNet lexical database, in an attempt to discover concepts describing user interests. The proposed approach relies on several additional features compared to classical lexical knowledge systems, including: structured user recommendation, numeric value management, definition of strict and negotiable constraints, and keywords to retrieve potentially interesting resources w.r.t. both the user request and the profile.

I. INTRODUCTION

The main goal of this paper is to propose a strategy to design advanced semantic search engines based on the idea of combining semantic matchmaking with Bayesian text categorization. By means of formal ontologies, modeled using OWL [24], the knowledge on a specific domain is modeled and exploited in order to make the implicit knowledge explicit, and to reason on it by means of the formal semantics expressed in OWL. On the other hand, a content-based recommender, which is able to learn user profiles from disambiguated documents, is used for customized search. The recommender exploits lexical knowledge in the linguistic ontology WordNet [26].

The success of a retrieval system also relies strongly on query formulation and ranking functions. Especially for an ontology-based system, the query language has to be very simple for the end user but, at the same time, expressive enough to capture the real user needs and to retrieve only what the user is really looking for. In this paper we present a system able both to help the user during the query formulation process via an intensional navigation of the ontology, and to return relevant resources via a ranking function exploiting both the ontology-related semantics of the query and the user profile managed by the content-based recommender.

Hence, the system suggests interesting items to the user by taking into account three elements: user profiles, semantic item descriptions and lexical item descriptions.

The rest of the paper is structured as follows: the next section outlines the work which mainly inspired this paper. In Section III a brief summary of semantic-based matchmaking in Description Logics is presented, together with a Naïve Bayes method for user profiling. The description of the framework architecture and of a domain reference ontology, together with an example query satisfying user needs, is presented in Section IV, while conclusions close the paper.

II. RELATED WORK

Recent years have witnessed a growing interest towards profiling-based resource retrieval. Among the most relevant systems adopting a Bayesian classifier we cite LIBRA [22], which produces content-based book recommendations by exploiting product descriptions obtained from Amazon.com Web pages. Documents are represented by using keywords and are subdivided into slots, each one corresponding to a specific section of the document, such as authors, title and abstract. SiteIF [20] exploits a sense-based representation to build a user profile as a semantic network whose nodes represent senses of the words in documents requested by the user. OntoSeek [14] explores the role of linguistic ontologies in knowledge-based retrieval systems. AMAYA [27] delivers context-aware recommendations, which are based on provided feedback, context data, and an ontology-based content categorization scheme. The former system relies on the user profile to provide predictions/recommendations about items interesting for the end user, whereas ITR [11] uses content-based filtering algorithms.

Our reference system is ITem Recommender (ITR), whose strategy is to shift from a keyword-based document representation to a sense-based one, in order to integrate lexical knowledge in the indexing step of training documents. Several methods have been proposed to accomplish this task. In [21] it is proposed to include WordNet information at the feature level by expanding each word in the training set with all its synonyms, including those available for each sense, in order to avoid a word sense disambiguation (WSD) process [28]. This approach has shown a decrease of effectiveness in the obtained classifier, mostly due to the word ambiguity problem. In [28] it is pointed out that some kind of disambiguation is required in any case. Subsequent works [3], [32] show that embedding WSD in document classification tasks can improve classification accuracy.

We do not present here related work on semantic matchmaking (the interested reader is referred to [9]) but only a framework of semantic-enabled e-marketplaces aimed at fully exploiting the semantics of supply/demand descriptions in B2C and C2C e-marketplaces [13].
Main features of this framework are the following: full exploitation of non-standard inferences for explanation services in the query-retrieval-refinement loop; semantic-based ranking in the request answering; a fully graphical and usable interface, which requires no prior knowledge of any logic principles, though fully exploiting them in the back-office.

Besides, various example-based search tools have been developed for improving search and visualization, such as SmartClient [33]. SmartClient uses constraint satisfaction techniques, allows users to refine (critique) preference values specified in the first step of the search, and supports trade-off analysis among different attributes: e.g., when looking for an apartment a user can make a compromise between distance and rent (more distant, less expensive). A candidate/critiques model has also been presented in [19], which allows users to refine the candidate solutions proposed. Here, preferences are elicited incrementally by analyzing critiques through subsequent iterations. It is an Automated Travel Assistant (ATA) for planning airline travels and, similarly to SmartClient, ATA exploits CSP techniques: preferences are described using soft constraints defined on the values of attributes. AptDecision [31] is a tool supporting the elicitation of preferences in the real estate domain: by browsing the domain, users can discover new features of interest and, through their refinement of apartment features, agents can build a profile of their preferences using learning techniques. FindMe [6] uses case-based reasoning as a way of recommending products in e-commerce catalogs. FindMe, and its enhanced version the Wasabi Personal Shopper [4], combines instance-based browsing and tweaking by difference. Different FindMe-like systems have been developed in various domains. Among systems based on FindMe the most renowned is Entrée [5], a restaurant recommender, which allows users to refine a query on the basis of the results displayed, so it is possible to choose a restaurant less expensive or closer than the restaurant shown after the first query.

Recently, there has been a growing interest toward systems supporting semantics exploitation, in different domains. In [15] an application is presented which improves traditional web searching using semantic web technologies: two Semantic Search applications are presented, running on an application framework called TAP, which provides a set of simple mechanisms for sites to publish data onto the Semantic Web and for applications to consume these data via a query interface called GetData. The results provided by the system are then compared with traditional Google text search results. Story Fountain [23] is an ontology-based tool which provides a guided exploration of digital stories, using a reasoning engine for the selection and organization of resources. Story Fountain supports six different exploration facilities to aid users engaged in the exploration process. The system is being used by the tour guides at Bletchley Park. The approach has been further investigated in [8]. An intelligent query interface exploiting an ontology-based search engine is presented in [7]; the system enables access to data sources through an integrated ontology and supports a user in formulating a query even in the case of ignorance of the vocabulary of the underlying information system.

III. BASIC SERVICES AND ALGORITHMS

A close relation exists between OWL and Description Logics. In fact, the formal semantics of the OWL DL sub-language is grounded in Description Logics theoretical studies. We assume the reader to be familiar with the basics of Description Logics and with two standard inference services provided by a DL reasoner: Subsumption and Satisfiability [2].

Given a query Q and an item to be retrieved I, the following match classes can be identified with respect to an ontology T (see [12], [18], [25]):

• exact — T ⊨ Q ≡ I. I is semantically equivalent to Q. All the characteristics expressed in Q are present in I, and I does not expose any additional characteristic with respect to Q.
• full — T ⊨ I ⊑ Q. I is more specific than Q. All the characteristics expressed in Q are provided by I, and I also exposes other characteristics, neither required by Q nor in conflict with the ones in Q.
• plug-in — T ⊨ Q ⊑ I. Q is more specific than I. All the characteristics expressed in I are provided by Q, and Q also requires other characteristics, neither exposed by I nor in conflict with the ones in I.
• potential — T ⊭ I ⊓ Q ⊑ ⊥. Q is compatible with I. Nothing in Q is logically in conflict with anything in I.
• partial — T ⊨ I ⊓ Q ⊑ ⊥. Q is not compatible with I. Something in Q is logically in conflict with some characteristic in I.

With respect to the above classification, in case of a potential match a similarity measure is needed to understand "how potentially" I satisfies Q. The semantic similarity between a query and an item to be retrieved can be computed with the aid of the algorithm rankPotential [12]. Starting from the unfolded version (i.e., normalized with respect to the reference ontology) of both the query and the item description, the algorithm quantifies how many pieces of information requested in the query are missing in the item description. In order to understand the approach, we consider the following trivial example, where the ontology is just a simple taxonomy¹:

T = { B ⊑ A,  C ⊑ B ⊓ E,  D ≡ A ⊓ F }

¹ For the sake of simplicity, in this example we do not consider roles, even if rankPotential is able to deal with them for ALN ontologies.
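A minimal sketch of these match classes and of the rankPotential count follows, under the simplifying assumption (sufficient for the purely conjunctive example above) that unfolded descriptions are sets of atomic concept names; the function names and the explicit disjointness table are ours, not the system's — real matchmaking delegates satisfiability and subsumption to a DL reasoner.

```python
# Sketch only: unfolded descriptions approximated as sets of atomic concepts.
# `disjoint` lists concept pairs declared disjoint in T (our assumption).
def classify_match(Q, I, disjoint=frozenset()):
    if any((a, b) in disjoint or (b, a) in disjoint for a in Q for b in I):
        return "partial"        # T |= I AND Q subsumed by bottom
    if Q == I:
        return "exact"          # T |= Q equivalent to I
    if Q <= I:
        return "full"           # I more specific: every concept of Q is in I
    if I <= Q:
        return "plug-in"        # Q more specific than I
    return "potential"          # compatible, but some of Q is missing in I

def rank_potential(I, Q):
    # number of concepts requested in Q that are missing in I;
    # with the top concept modeled as the empty set, rank_potential(top, Q) = |Q|
    return len(Q - I)
```

On the unfolded example discussed below (Q = {C, B, A, E, F}, I = {E, B, A, G}) this yields a potential match with two missing concepts, hence the normalized score 1 − 2/5.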
With respect to the previous T, consider the query Q = C ⊓ D and the item I = E ⊓ B ⊓ G. Referring to the above classification, we see that Q and I are a potential match. Unfolding both Q and I w.r.t. T, we obtain Q = C ⊓ B ⊓ A ⊓ E ⊓ F and I = E ⊓ B ⊓ A ⊓ G. Since the third axiom in the ontology is an equivalence axiom, we rewrite D as A ⊓ F instead of expanding it as done for B and C. Once we have the unfolded versions of Q and I, we can say that two pieces of information, {C, F}, are missing in I in order to completely satisfy Q (and thus reach a full match). Since the maximum number of missing pieces of information is equal to the length of the unfolded Q, in this case five, we assign a normalized semantic similarity score of (1 − 2/5) to the previous example match. In the most general case, given an ontology T and two concepts Q and I, the semantic similarity score is computed by the following formula:

rank = 1 − rankPotential(I, Q) / rankPotential(⊤, Q)    (1)

where ⊤ is the most generic concept in every DL ontology. Obviously, the previous score is equal to 1 only in the case of a full match.

On the other hand, we consider the problem of learning user profiles as a binary text categorization task [29]. Each document has to be classified as interesting or not with respect to the user preferences. Therefore, the set of categories is C = {c+, c−}, where c+ is the positive class (user-likes) and c− the negative one (user-dislikes). We present a method able to learn sense-based profiles by exploiting an indexing procedure based on WordNet. We extend the classical bag-of-words (BOW) model [29] to a model in which the senses (meanings) corresponding to the words in the documents are considered as features.

The goal of the WSD algorithm is to associate the appropriate sense s to a word w in document d, by exploiting its context C (a set of words that precede and follow w). The sense s is selected from a predefined set of possibilities, usually known as the sense inventory, which in our algorithm is obtained from WordNet [26]. The basic building block of WordNet is the SYNSET (SYNonym SET), a set of words with synonymous meanings which represents a specific sense of a word. The text in d is processed in two basic phases: the document is first tokenized and then, after removing stopwords, part-of-speech (POS) ambiguities are solved for each token. Reduction to lemmas is performed, followed by synset identification with WSD: w is disambiguated by determining the degree of semantic similarity among the candidate synsets for w and those of each word in C. The proper synset assigned to w is the one with the highest similarity with respect to its context of use. The semantic similarity measure adopted is the Leacock-Chodorow measure [17]: the similarity between synsets a and b is inversely proportional to the distance between them in the WordNet is-a hierarchy, measured by the number of hops in the shortest path from a to b.

The algorithm starts by defining the context C of w as the set of words in the same slot of w having the same POS as w; then it identifies both the sense inventory X_w for w and the sense inventory X_j for each word w_j in C. The sense inventory T for the whole context C is given by the union of all the X_j. After this step, we measure the similarity of each candidate sense s_i ∈ X_w to each sense s_h ∈ T, and the sense assigned to w is the one with the highest similarity score. Each document is mapped into a list of WordNet synsets following three steps:

1) each monosemous word w in a slot of d is mapped into the corresponding WordNet synset;
2) for each pair of words ⟨noun,noun⟩ or ⟨adjective,noun⟩, a search in WordNet is made to verify if at least one synset exists for the bigram ⟨w1, w2⟩. In the positive case, the algorithm is applied to the bigram, otherwise it is applied separately to w1 and w2; in both cases all the words in the slot are used as the context C of the word(s) to be disambiguated;
3) each polysemous unigram w is disambiguated by the algorithm, using all the words in the slot as the context C of w.

A new version of the WSD algorithm has recently been produced [30].

The WSD procedure is used to obtain a synset-based vector space representation that we call Bag-Of-Synsets (BOS). In this model, a synset vector, instead of a word vector, corresponds to a document. Each document is represented by a set of slots; each slot is a textual field corresponding to a specific feature of the document, in an attempt to take into account also the structure of documents.

Formally, assume that we have a collection of N documents, each document being subdivided into M slots. Let m be the index of the slot; for n = 1, 2, ..., N, the n-th document is reduced to M bags of synsets, one for each slot:

d_n^m = ⟨t_{n1}^m, t_{n2}^m, ..., t_{nD_{nm}}^m⟩

where t_{nk}^m is the k-th synset in slot s_m of document d_n and D_{nm} is the total number of synsets appearing in the m-th slot of document d_n. For all n, k and m, t_{nk}^m ∈ V_m, which is the vocabulary for the slot s_m (the set of all different synsets found in slot s_m). Document d_n is finally represented in the vector space by M synset-frequency vectors:

f_n^m = ⟨w_{n1}^m, w_{n2}^m, ..., w_{nD_{nm}}^m⟩

where w_{nk}^m is the weight of the synset t_{nk}^m in the slot s_m of document d_n and can be computed in different ways: it can simply be the number of times synset t_k appears in slot s_m, or a more complex TF-IDF score. Our hypothesis is that the proposed indexing procedure helps to obtain profiles able to recommend documents semantically closer to the user interests.
As a strategy to learn user profiles on BOS-indexed documents, ITem Recommender (ITR) uses a Naïve Bayes text categorization algorithm to build profiles as binary classifiers (user-likes vs. user-dislikes). The induced probabilistic model estimates the a posteriori probability, P(c_j | d_i), of document d_i belonging to class c_j as follows:

P(c_j | d_i) = P(c_j) · ∏_{t_k ∈ d_i} P(t_k | c_j)^{N(d_i, t_k)}    (2)

where N(d_i, t_k) is the number of times token t_k occurs in document d_i. In ITR, each document is encoded as a vector of BOS, one for each slot. Therefore, equation (2) becomes:

P(c_j | d_i) = (P(c_j) / P(d_i)) · ∏_{m=1}^{|S|} ∏_{k=1}^{|b_{im}|} P(t_k | c_j, s_m)^{n_{kim}}    (3)

where S = {s_1, s_2, ..., s_|S|} is the set of slots, b_{im} is the BOS in the slot s_m of d_i, and n_{kim} is the number of occurrences of token t_k in b_{im}. Training is performed on BOS-represented documents, thus tokens are WordNet synsets, and the induced model relies on synset frequencies. To calculate (3), the system has to estimate P(c_j) and P(t_k | c_j, s_m) in the training phase. The documents used to train the system are rated on a discrete scale from 1 to MAX, where MAX is the maximum rating that can be assigned to a document. According to an idea proposed in [22], each training document d_i is labeled with two scores, a "user-likes" score w_+^i and a "user-dislikes" score w_−^i, obtained from the original rating r:

w_+^i = (r − 1) / (MAX − 1);    w_−^i = 1 − w_+^i    (4)

The scores in (4) are exploited for weighting the occurrences of tokens in the documents and for estimating their probabilities from the training set TR. The prior probabilities of the classes are computed according to the following equation:

P̂(c_j) = (∑_{i=1}^{|TR|} w_j^i + 1) / (|TR| + 2)    (5)

Witten-Bell smoothing [34] is adopted to compute P(t_k | c_j, s_m), taking into account that documents are structured into slots and that token occurrences are weighted using the scores in equation (4):

P̂(t_k | c_j, s_m) = N(t_k, c_j, s_m) / (V_{c_j} + ∑_i N(t_i, c_j, s_m))    if N(t_k, c_j, s_m) ≠ 0
P̂(t_k | c_j, s_m) = V_{c_j} / ((V_{c_j} + ∑_i N(t_i, c_j, s_m)) · (V − V_{c_j}))    otherwise    (6)

where N(t_k, c_j, s_m) is the count of the weighted occurrences of token t_k in the slot s_m in the training data for class c_j, V_{c_j} is the total number of unique tokens in class c_j, and V is the total number of unique tokens across all classes. N(t_k, c_j, s_m) is computed as follows:

N(t_k, c_j, s_m) = ∑_{i=1}^{|TR|} w_j^i · n_{kim}    (7)

In (7), n_{kim} is the number of occurrences of token t_k in slot s_m of document d_i. The sum of all N(t_k, c_j, s_m) in the denominator of equation (6) denotes the total weighted length of the slot s_m in class c_j. In other words, P̂(t_k | c_j, s_m) is estimated as the ratio between the weighted occurrences of t_k in slot s_m of class c_j and the total weighted length of the slot. The final outcome of the learning process is a probabilistic model used to classify a new document in the class c+ or c−. This model is the user profile, which includes those tokens that turn out to be most indicative of the user preferences, according to the value of the conditional probabilities in (6).
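A compact sketch of the training equations (4)-(7); the function names are ours and the counts used below are toy values, not data from the system.

```python
# Sketch of equations (4)-(7); names and toy values are ours.
def rating_weights(r, max_rating):
    # eq. (4): map a discrete rating r in [1, MAX] to the two class scores
    w_pos = (r - 1) / (max_rating - 1)
    return w_pos, 1.0 - w_pos

def prior(class_weights):
    # eq. (5): smoothed class prior from the per-document scores w_j^i
    return (sum(class_weights) + 1) / (len(class_weights) + 2)

def weighted_counts(docs):
    # eq. (7): N(t_k, c_j, s_m) for one class and slot;
    # docs = [(w_j^i, {token: n_kim}), ...]
    counts = {}
    for w, toks in docs:
        for tok, n in toks.items():
            counts[tok] = counts.get(tok, 0.0) + w * n
    return counts

def witten_bell(counts, v_class, v_all):
    # eq. (6): Witten-Bell-smoothed estimate of P(t_k | c_j, s_m)
    total = sum(counts.values())
    def p(tok):
        n = counts.get(tok, 0.0)
        if n:
            return n / (v_class + total)
        return v_class / ((v_class + total) * (v_all - v_class))
    return p
```

For instance, a document rated 5 on a 1-5 scale contributes with full weight to the user-likes counts and zero weight to user-dislikes.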
IV. SYSTEM FEATURES AND ARCHITECTURE

Based on the previous techniques, we built a system (see Fig. 1) enabling users to perform:

• semantic searching, by selecting ontology classes and properties;
• personalized searching, based on user profiles and item information;
• semantic-personalized searching, obtained by combining the two types of searching.

Fig. 1. Ontology-based recommender system architecture

In our approach, an item is represented by means of the following data:

• item information — defined by two sets of information. The first set is composed of intrinsic information; for example, in a bibliographic research scenario, intrinsic information could be: item identification number, authors, title, abstract, slot types. The second set is composed of information produced by the classifier during the training step: the bag of synsets obtained by the WSD process on each slot;
• user profile — learned by the ITR system, as described in the previous section;
• item semantic description — an OWL description referring to the reference ontology.

All the above information is stored in a repository, while the reference vocabulary adopted to define the semantic profile is the WordNet lexical database. The recommender system architecture is composed of several modules; each one has a specific role and instantiates a part of the repository. The Interface Module allows users to define semantic item descriptions and user requests. It provides a GUI to browse the hierarchy of concepts and to outline the properties of the selected class. This feature of the GUI supports a user in defining requests as descriptions which are logically consistent w.r.t. the reference ontology.
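The three kinds of item data listed above can be pictured with a simple record; all field names here are illustrative, not taken from the actual implementation.

```python
from dataclasses import dataclass, field

# Illustrative record of the item data described above (names are ours).
@dataclass
class Item:
    item_id: str
    intrinsic: dict                            # e.g. title, authors, abstract
    bos: dict = field(default_factory=dict)    # slot name -> {synset: weight}
    owl_description: str = ""                  # DL concept w.r.t. the ontology

paper = Item("it01",
             {"title": "Semantic matchmaking", "authors": ["A. Author"]},
             bos={"title": {"semantic#a1": 1, "matchmaking#n1": 1}},
             owl_description="inProceeding")
```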
The user request is split into two main parts:

• full: in the full part of the request, the user sets the constraints she wants to be satisfied by (in full match with) the retrieved items;
• potential: here the user sets her preferences, i.e., her wishful options. The ontology-based score is computed by measuring "how many" of these constraints are satisfied by a retrieved item.

We stress the fact that, in order to fulfill user needs, the search process can be an iterative one. On the basis of the returned results, a user is able to refine the previous request by exchanging a full constraint for a potential one and vice versa. Our recommender system considers user query features as full constraints by default, and as potential ones only whenever explicitly stated by the user [10].

The Profile Engine Module prepares the item information by performing WSD on the textual descriptions of the items to be recommended. Mainly built on the ITR system, it performs the training step on the disambiguated text in order to infer the user profiles which will be exploited in the recommendation process. Each inferred user profile is a binary classifier able to categorize an item as interesting or not interesting according to the classification score of the class user-likes.

The Match Engine Module implements the matchmaking and ranking algorithms. It compares the query with the descriptions of items referring to the same OWL ontology. The reasoner is not embedded within the application, so the Ontology Manager communicates with the Matchmaker via a DIG 1.1 interface over HTTP. The Match Engine is used to manage full and potential constraints: when the user sets a request characteristic as full, she wants to be sure that the full features are explicitly mentioned in the description of the items to be retrieved. The Match Engine evaluates a full or potential match in the following way:

• all potential constraints — if the user sets all request features as potential ones, then the returned items will be those potentially satisfying the user request. According to the Open World Assumption, returned items may have additional or missing features w.r.t. the request, but no features in logical conflict with any in the request. In this case the module runs a potential match;
• all full constraints — if the user sets all request features as full ones, then the returned items have to express explicitly at least all the requested features. Of course, this does not prevent the item description from including also non-requested features. In this case the module evaluates a full match;
• mixed — in this case the module first computes a full match considering a temporary request composed of the full features only. The set of items returned in this step could also contain features in logical conflict with some feature in the potential part of the request, so the module then runs a potential match with the potential part of the request to discard such results.

Results returned by the Profile Engine are pairs ⟨item identifier, relevance rate⟩, whilst the ones returned by the Match Engine are pairs ⟨item identifier, rank value⟩, according to equations (3) and (1) respectively.

The Recommender Manager handles the communication and synchronization between the Profile and the Match Engine. This module is designed to deal with two issues: result ordering and synchronization. In web-based retrieval systems, ranked lists are surely preferable to unordered sets of items. Nevertheless, they become less effective and usable as the complexity of the item description, together with the possible user preferences, increases. At least two implementations of the Recommender Manager are possible:

1) the Recommender Manager operates in two steps. In the first step, this module receives the user requests (OWL descriptions translated into DIG); then it activates the Profile and the Match Engine. In the second step, the Recommender Manager fetches the results for the user request from the two previous modules, then it computes a unique score by adding the match rank and the classifier relevance rate. The new ranked results are sent to the GUI;
2) the Match Engine works as a filter producing a subset of items which is sent to the Profile Engine. The Profile Engine uses only this subset to answer the user request.

Our recommender system implements the first approach and computes the result score according to the following simple formula: score = α · relevance + β · rank, where α and β are numeric coefficients. As an initial attempt, we set both coefficients to 0.5. The evaluation of several experimental tests, and the usage of different reference ontologies, could require changing these coefficient values.

As a real scenario, we propose queries formulated in terms of an ontology which models the bibliographic research domain [16].
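The score combination of the first implementation can be sketched as follows; the function and variable names are ours, not the system's.

```python
# Sketch of score = alpha * relevance + beta * rank (names are ours).
def combine_results(relevance, rank, alpha=0.5, beta=0.5):
    # relevance: Profile Engine pairs; rank: Match Engine pairs (as dicts)
    shared = relevance.keys() & rank.keys()
    scored = {i: alpha * relevance[i] + beta * rank[i] for i in shared}
    # return a ranked list of (item identifier, combined score) pairs
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

With α = β = 0.5 the two engines contribute equally; tuning the coefficients shifts the balance between profile relevance and ontology-based rank.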
The ontology is defined by the following classes (Figure 2):

• Item — with subclasses such as Article, inProceeding, Book;
• Event — with subclasses such as Conference, Workshop, Meeting;
• Topic — the topic hierarchy is defined by specific research topics; for example, the Computer Science terms of the ACM Topic Hierarchy [1] may be used;
• Author — the author hierarchy is defined by several professional figures such as AcademicStaff, Manager, PhDstudent;
• Organization — with subclasses such as University, Enterprise, Institute;
• Project — the project hierarchy is defined by specific research projects, adding properties such as financedBy.

Fig. 2. Reference Ontology Hierarchy

Besides, the item concept can be defined by the following properties: aboutProject (with the project class as range), hasPublicationYear, presentedAt (with the event class as range), hasAuthor (with the author class as range), developedBy (with the organization class as range), and hasTopic (with the topic class as range). Obviously, this domain ontology can be extended with specific journal item properties such as hasVolume and hasMonth, and with specific book item properties such as hasPublisher and hasEdition.

Finally, in order to retrieve only interesting items, several disjointness axioms are defined. For example, Book is disjoint with Article, inProceeding, inBook, inCollection and Proceedings, while inProceeding is disjoint with Book, Thesis, Booklet and Manual.

In this scenario a user can propose queries such as the combination of (a) "I'm looking for an inProceeding item published after 2004, developed in a research project by both an Enterprise and a University" and (b) "I am interested in items with matchmaking as a keyword in the title, including my profile". In the previous request, (a) is the semantic query for the matchmaker, while (b) is the profile-based query for the classifier. According to the domain ontology, the semantic query (a) is translated into the following DL description: inProceeding ⊓ (≥ 2004 hasPublicationYear) ⊓ ∀aboutProject.(ResearchProject ⊓ ∀developedBy.(Enterprise ⊓ University)).
V. CONCLUSION

In this paper, we have described a strategy to design an advanced semantic search engine able to combine logic-based matchmaking with Bayesian text categorization. WordNet has been exploited in order to enhance the text categorization performed by a Bayesian approach for automated user profile learning. Combining probabilistic and logic-based similarity measures, we have shown how to compute a score representing the match degree between a query and an item description. The score takes into account both the ontology-based query and the user profile. An initial prototype implementing the proposed approach has been developed and presented in the paper. Currently, we are performing experiments on large datasets in order to validate the proposed approach.

REFERENCES

[1] ACM. Top Two Levels of The ACM Computing Classification System. www.acm.org/class/1998/overview.html, 1998.
[2] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider. The Description Logic Handbook. Cambridge University Press, 2002.
[3] S. Bloehdorn and A. Hotho. Boosting for text classification with semantic features. In Proc. of 10th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Mining for and from the Semantic Web Workshop, pages 70–87, 2004.
[4] R. Burke, M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes, and M. Sartin. Integrating knowledge-based and collaborative-filtering recommender systems. In Proceedings of the Workshop on AI and Electronic Commerce, AAAI 99, Orlando, Florida, 1999.
[5] R. Burke. Knowledge-based recommender systems. In A. Kent, editor, Encyclopedia of Library and Information Systems, volume 69. New York, 2000.
[6] R. D. Burke, K. J. Hammond, and B. C. Young. The FindMe approach to assisted browsing. IEEE Expert, 12(4):32–40, 1997.
[7] T. Catarci, P. Dongilli, T. Di Mascio, E. Franconi, G. Santucci, and S. Tessaris. An ontology based visual tool for query formulation support. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI '04), pages 308–312, 2004.
[8] T. Collins, P. Mulholland, and Z. Zdráhal. Semantic browsing of digital collections. In Proc. of 4th International Semantic Web Conference (ISWC 2005), pages 127–141, 2005.
[9] S. Colucci, T. Di Noia, E. Di Sciascio, F. M. Donini, and M. Mongiello. Concept abduction and contraction for semantic-based discovery of matches and negotiation spaces in an e-marketplace. Electronic Commerce Research and Applications, 4(4):345–361, 2005.
[10] S. Colucci, T. Di Noia, E. Di Sciascio, F. M. Donini, and A. Ragone. Knowledge elicitation for query refinement in a semantic-enabled e-marketplace. In 7th International Conference on Electronic Commerce (ICEC 05), pages 685–691. ACM Press, 2005.
[11] M. Degemmis, P. Lops, and P. Basile. An intelligent personalized service for conference participants. In 16th International Symposium on Methodologies for Intelligent Systems (ISMIS'06), 2006.
[12] T. Di Noia, E. Di Sciascio, F. M. Donini, and M. Mongiello. A system for principled matchmaking in an electronic marketplace. International Journal of Electronic Commerce, 8(4):9–37, 2004.
[13] E. Di Sciascio, S. Colucci, T. Di Noia, F. M. Donini, A. Ragone, and R. Rizzi. A semantic-based fully visual application for matchmaking and query refinement in B2C e-marketplaces. In 8th International Conference on Electronic Commerce (ICEC 06), pages 174–184. ACM Press, 2006.
[14] N. Guarino, C. Masolo, and G. Vetere. Content-based access to the web. IEEE Intelligent Systems, 14(3):70–80, 1999.
[15] R. V. Guha, R. McCool, and E. Miller. Semantic search. In Proceedings of the Twelfth International World Wide Web Conference, WWW2003, pages 700–709, 2003.
[16] P. Haase, J. Broekstra, M. Ehrig, M. Menken, P. Mika, M. Plechawski, P. Pyszlak, B. Schnizler, R. Siebes, S. Staab, and C. Tempich. Bibster — a semantics-based bibliographic peer-to-peer system. In the International Semantic Web Conference (ISWC2004), 2004.
[17] C. Leacock and M. Chodorow. Combining local context and WordNet similarity for word sense identification, pages 305–332. In C. Fellbaum (Ed.), MIT Press, 1998.
[18] L. Li and I. Horrocks. A software framework for matchmaking based on semantic web technology. International Journal of Electronic Commerce, 8(4), 2004.
[19] G. Linden, S. Hanks, and N. Lesh. Interactive assessment of user preference models: The automated travel assistant. In Proceedings of the Sixth International Conference on User Modeling, pages 67–78, Vienna, 1997.
[20] B. Magnini and C. Strapparava. Improving user modelling with content-based techniques. In Proc. 8th Int. Conf. User Modeling, pages 74–83. Springer, 2001.
[21] G. Miller. WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 1990. (Special Issue).
[22] R. J. Mooney and L. Roy. Content-based book recommending using learning for text categorization. In Proceedings of the 5th ACM Conference on Digital Libraries, pages 195–204, San Antonio, US, 2000. ACM Press, New York, US.
[23] P. Mulholland, T. Collins, and Z. Zdráhal. Story Fountain: intelligent support for story research and exploration. In Proc. of Intelligent User Interfaces Conf., pages 62–69, 2004.
[24] OWL. Web Ontology Language. www.w3.org/TR/owl-features/, 2004.
[25] M. Paolucci, T. Kawamura, T. R. Payne, and K. Sycara. Semantic matching of web services capabilities. In Proc. of International Semantic Web Conference (ISWC 2002), number 2342 in LNCS, 2002.
[26] Princeton University. WordNet. http://wordnet.princeton.edu/, 2005.
[27] C. Räck, S. Arbanowski, and S. Steglich. Context-aware, ontology-based recommendations. In International Symposium on Applications and the Internet Workshops (SAINTW'06), 2005.
[28] S. Scott and S. Matwin. Text classification using WordNet hypernyms. In COLING-ACL Workshop on Usage of WordNet in NLP Systems, pages 45–51, 1998.
[29] F. Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 2002.
[30] G. Semeraro, M. Degemmis, P. Lops, and P. Basile. Combining learning and word sense disambiguation for intelligent user profiling. In Twentieth International Joint Conference on Artificial Intelligence (IJCAI 2007), Hyderabad, India, 2007 (to appear).
[31] S. Shearin and H. Lieberman. Intelligent profiling by example. In Proceedings of the International Conference on Intelligent User Interfaces (IUI 2001), pages 145–151. ACM Press, 2001.
[32] M. Theobald, R. Schenkel, and G. Weikum. Exploiting structure, annotation, and ontological knowledge for automatic classification of XML data. In Proceedings of International Workshop on Web and Databases, pages 1–6, 2004.
[33] M. Torrens, B. Faltings, and P. Pu. SmartClients: Constraint satisfaction as a paradigm for scaleable intelligent information systems. Constraints, 7(1):49–69, 2002.
[34] I. H. Witten and T. C. Bell. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37(4), 1991.