Semantic and Bayesian Profiling Services for Textual Resource Retrieval

Eufemia Tinelli — Department of Computer Science, University of Bari, Bari 70126, Italy — Email: tinelli@di.uniba.it
Pierpaolo Basile — Department of Computer Science, University of Bari, Bari 70126, Italy — Email: basilepp@di.uniba.it
Eugenio Di Sciascio — Dipartimento di Elettrotecnica ed Elettronica, Politecnico di Bari, Bari 70126, Italy — Email: disciascio@poliba.it
Giovanni Semeraro — Department of Computer Science, University of Bari, Bari 70126, Italy — Email: semeraro@di.uniba.it

Abstract— This paper presents an integrated approach to textual resource retrieval which combines logical inference services with user profiles, in which a structured representation of the user interests is maintained. Learning is performed on documents which have been disambiguated by exploiting the WordNet lexical database, in an attempt to discover concepts describing user interests. The proposed approach relies on several additional features compared to classical lexical knowledge systems, including: structured user recommendation, numeric value management, definition of strict and negotiable constraints, and keywords to retrieve potentially interesting resources w.r.t. both the user request and the profile.

I. INTRODUCTION

The main goal of this paper is to propose a strategy to design advanced semantic search engines based on the idea of combining semantic matchmaking with Bayesian text categorization. By means of formal ontologies, modeled using OWL [24], the knowledge on a specific domain is modeled and exploited in order to make the implicit knowledge explicit, and to reason on it by means of the formal semantics expressed in OWL. On the other hand, a content-based recommender, which is able to learn user profiles from disambiguated documents, is used for customized search. The recommender exploits lexical knowledge in the linguistic ontology WordNet [26].

The success of a retrieval system also relies strongly on query formulation and ranking functions. Especially for an ontology-based system, the query language has to be very simple for the end user but, at the same time, expressive enough to capture the real user needs and to retrieve only what the user is really looking for. In this paper we present a system able both to help the user during the query formulation process via an intensional navigation of the ontology, and to return relevant resources via a ranking function exploiting both the ontology-related semantics of the query and the user profile managed by the content-based recommender.

Hence, the system suggests interesting items to the user by taking into account three elements: user profiles, semantic item descriptions and lexical item descriptions.

The rest of the paper is structured as follows: the next section outlines the work which mainly inspired this paper. In Section III a brief summary of semantic-based matchmaking in Description Logics is presented, together with a Naïve Bayes method for user profiling. The description of the framework architecture and of a domain reference ontology, together with an example query satisfying user needs, is presented in Section IV, while conclusions close the paper.

II. RELATED WORK

Recent years have witnessed a growing interest towards profiling-based resource retrieval. Among the most relevant systems adopting a Bayesian classifier we cite LIBRA [22], which produces content-based book recommendations by exploiting product descriptions obtained from Amazon.com Web pages. Documents are represented by using keywords and are subdivided into slots, each one corresponding to a specific section of the document, such as authors, title and abstract. SiteIF [20] exploits a sense-based representation to build a user profile as a semantic network whose nodes represent senses of the words in documents requested by the user. OntoSeek [14] explores the role of linguistic ontologies in knowledge-based retrieval systems. AMAYA [27] delivers context-aware recommendations, which are based on provided feedback, context data, and an ontology-based content categorization scheme. The former system relies on the user profile to provide predictions/recommendations about items interesting for the end user, whereas ITR [11] uses content-based filtering algorithms.

Our reference system is ITem Recommender (ITR), whose strategy is to shift from a keyword-based document representation to a sense-based one, in order to integrate lexical knowledge in the indexing step of training documents. Several methods have been proposed to accomplish this task. In [21] it is proposed to include WordNet information at the feature level by expanding each word in the training set with all its synonyms, including those available for each sense, in order to avoid a word sense disambiguation (WSD) process [28]. This approach has shown a decrease of effectiveness in the obtained classifier, mostly due to the word ambiguity problem. In [28] it is pointed out that some kind of disambiguation is required in any case. Subsequent works [3], [32] show that embedding WSD in document classification tasks can improve classification accuracy.

We do not present here related work on semantic matchmaking (the interested reader is referred to [9]) but only a framework of semantic-enabled e-marketplaces aimed at fully exploiting the semantics of supply/demand descriptions in B2C and C2C e-marketplaces [13].
Main features of this framework are the following: full exploitation of non-standard inferences for explanation services in the query-retrieval-refinement loop; semantic-based ranking in the request answering; a fully graphical and usable interface, which requires no prior knowledge of any logic principles, though fully exploiting them in the back-office.

Besides, various example-based search tools have been developed for improving search and visualization, such as SmartClient [33]. SmartClient uses constraint satisfaction techniques, allows users to refine (critique) preference values specified in the first step of the search, and supports trade-off analysis among different attributes: e.g., when looking for an apartment a user can make a compromise between distance and rent (more distant, less expensive). A candidate/critiques model has also been presented in [19], which allows users to refine the candidate solutions proposed. Here, preferences are elicited incrementally by analyzing critiques through subsequent iterations. It is an Automated Travel Assistant (ATA) for planning airline travels and, similarly to SmartClient, ATA exploits CSP techniques: preferences are described using soft constraints defined on the values of attributes. AptDecision [31] is a tool supporting the elicitation of preferences in the real estate domain: by browsing the domain, users can discover new features of interest and, through their refinement of apartment features, agents can build a profile of their preferences using learning techniques. FindMe [6] uses case-based reasoning as a way of recommending products in e-commerce catalogs. FindMe, and its enhanced version the Wasabi Personal Shopper [4], combines instance-based browsing and tweaking by difference. Different FindMe-like systems have been developed in various domains. Among systems based on FindMe the most renowned is Entrée [5], a restaurant recommender, which allows users to refine a query on the basis of the results displayed, so it is possible to choose a restaurant less expensive or closer than the restaurant shown after the first query.

Recently, there has been a growing interest toward systems supporting semantics exploitation, in different domains. In [15] an application is presented which improves traditional web searching using semantic web technologies: two Semantic Search applications are presented, running on an application framework called TAP, which provides a set of simple mechanisms for sites to publish data onto the Semantic Web and for applications to consume these data via a query interface called GetData. The results provided by the system are then compared with traditional Google text search results. Story Fountain [23] is an ontology-based tool which provides a guided exploration of digital stories, using a reasoning engine for the selection and organization of resources. Story Fountain supports six different exploration facilities to aid users engaged in the exploration process. The system is being used by the tour guides at Bletchley Park. The approach has been further investigated in [8]. An intelligent query interface exploiting an ontology-based search engine is presented in [7]; the system enables access to data sources through an integrated ontology and supports a user in formulating a query even in the case of ignorance of the vocabulary of the underlying information system.

III. BASIC SERVICES AND ALGORITHMS

A close relation exists between OWL and Description Logics. In fact, the formal semantics of the OWL DL sub-language is grounded in Description Logics theoretical studies. We assume the reader to be familiar with the basics of Description Logics and with two standard inference services provided by a DL reasoner: Subsumption and Satisfiability [2].

Given a query Q and an item to be retrieved I, the following match classes can be identified with respect to an ontology T (see [12], [18], [25]):

• exact — T ⊨ Q ≡ I. I is semantically equivalent to Q. All the characteristics expressed in Q are present in I, and I does not expose any additional characteristic with respect to Q.
• full — T ⊨ I ⊑ Q. I is more specific than Q. All the characteristics expressed in Q are provided by I, and I also exposes other characteristics, neither required by Q nor in conflict with the ones in Q.
• plug-in — T ⊨ Q ⊑ I. Q is more specific than I. All the characteristics expressed in I are provided by Q, and Q also requires other characteristics, neither exposed by I nor in conflict with the ones in I.
• potential — T ⊭ I ⊓ Q ⊑ ⊥. Q is compatible with I. Nothing in Q is logically in conflict with anything in I.
• partial — T ⊨ I ⊓ Q ⊑ ⊥. Q is not compatible with I. Something in Q is logically in conflict with some characteristic in I.

With respect to the above classification, in case of a potential match a similarity measure is needed to understand "how potentially" I satisfies Q. The semantic similarity between a query and an item to be retrieved can be computed with the aid of the algorithm rankPotential [12]. Starting from the unfolded version (i.e., normalized with respect to the reference ontology) of both the query and the item description, the algorithm quantifies how many pieces of information requested in the query are missing in the item description. In order to understand the approach, we consider the following trivial example, where the ontology is just a simple taxonomy¹:

T = { B ⊑ A,  C ⊑ B ⊓ E,  D ≡ A ⊓ F }

¹ For the sake of simplicity, in this example we do not consider roles, even if rankPotential is able to deal with them for ALN ontologies.
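A minimal sketch of these match classes and of the rankPotential count follows, under the simplifying assumption (sufficient for the purely conjunctive example above) that unfolded descriptions are sets of atomic concept names; the function names and the explicit disjointness table are ours, not the system's — real matchmaking delegates satisfiability and subsumption to a DL reasoner.

```python
# Sketch only: unfolded descriptions approximated as sets of atomic concepts.
# `disjoint` lists concept pairs declared disjoint in T (our assumption).
def classify_match(Q, I, disjoint=frozenset()):
    if any((a, b) in disjoint or (b, a) in disjoint for a in Q for b in I):
        return "partial"        # T |= I AND Q subsumed by bottom
    if Q == I:
        return "exact"          # T |= Q equivalent to I
    if Q <= I:
        return "full"           # I more specific: every concept of Q is in I
    if I <= Q:
        return "plug-in"        # Q more specific than I
    return "potential"          # compatible, but some of Q is missing in I

def rank_potential(I, Q):
    # number of concepts requested in Q that are missing in I;
    # with the top concept modeled as the empty set, rank_potential(top, Q) = |Q|
    return len(Q - I)
```

On the unfolded example discussed below (Q = {C, B, A, E, F}, I = {E, B, A, G}) this yields a potential match with two missing concepts, hence the normalized score 1 − 2/5.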
With respect to the previous T, consider the query Q = C ⊓ D and the item I = E ⊓ B ⊓ G. Referring to the above classification, we see that Q and I are a potential match. Unfolding both Q and I w.r.t. T, we obtain Q = C ⊓ B ⊓ A ⊓ E ⊓ F and I = E ⊓ B ⊓ A ⊓ G. Since the third axiom in the ontology is an equivalence axiom, we rewrite D as A ⊓ F instead of expanding it as done for B and C. Once we have the unfolded versions of Q and I, we can say that two pieces of information, {C, F}, are missing in I in order to completely satisfy Q (and thus reach a full match). Since the maximum number of missing pieces of information is equal to the length of the unfolded Q, in this case five, we assign a normalized semantic similarity score of (1 − 2/5) to the previous example match. In the most general case, given an ontology T and two concepts Q and I, the semantic similarity score is computed by the following formula:

rank = 1 − rankPotential(I, Q) / rankPotential(⊤, Q)    (1)

where ⊤ is the most generic concept in every DL ontology. Obviously, the previous score is equal to 1 only in the case of a full match.

On the other hand, we consider the problem of learning user profiles as a binary text categorization task [29]. Each document has to be classified as interesting or not with respect to the user preferences. Therefore, the set of categories is C = {c+, c−}, where c+ is the positive class (user-likes) and c− the negative one (user-dislikes). We present a method able to learn sense-based profiles by exploiting an indexing procedure based on WordNet. We extend the classical bag-of-words (BOW) model [29] to a model in which the senses (meanings) corresponding to the words in the documents are considered as features.

The goal of the WSD algorithm is to associate the appropriate sense s to a word w in document d, by exploiting its context C (a set of words that precede and follow w). The sense s is selected from a predefined set of possibilities, usually known as the sense inventory, which in our algorithm is obtained from WordNet [26]. The basic building block of WordNet is the SYNSET (SYNonym SET), a set of words with synonymous meanings which represents a specific sense of a word. The text in d is processed in two basic phases: the document is first tokenized and then, after removing stopwords, part-of-speech (POS) ambiguities are solved for each token. Reduction to lemmas is performed, followed by synset identification with WSD: w is disambiguated by determining the degree of semantic similarity among the candidate synsets for w and those of each word in C. The proper synset assigned to w is the one with the highest similarity with respect to its context of use. The semantic similarity measure adopted is the Leacock-Chodorow measure [17]: the similarity between synsets a and b is inversely proportional to the distance between them in the WordNet is-a hierarchy, measured by the number of hops in the shortest path from a to b.

The algorithm starts by defining the context C of w as the set of words in the same slot of w having the same POS as w; then it identifies both the sense inventory X_w for w and the sense inventory X_j for each word w_j in C. The sense inventory T for the whole context C is given by the union of all the X_j. After this step, we measure the similarity of each candidate sense s_i ∈ X_w to each sense s_h ∈ T, and the sense assigned to w is the one with the highest similarity score. Each document is mapped into a list of WordNet synsets following three steps:

1) each monosemous word w in a slot of d is mapped into the corresponding WordNet synset;
2) for each pair of words ⟨noun,noun⟩ or ⟨adjective,noun⟩, a search in WordNet is made to verify if at least one synset exists for the bigram ⟨w1, w2⟩. In the positive case, the algorithm is applied to the bigram, otherwise it is applied separately to w1 and w2; in both cases all the words in the slot are used as the context C of the word(s) to be disambiguated;
3) each polysemous unigram w is disambiguated by the algorithm, using all the words in the slot as the context C of w.

A new version of the WSD algorithm has recently been produced [30].

The WSD procedure is used to obtain a synset-based vector space representation that we call Bag-Of-Synsets (BOS). In this model, a synset vector, instead of a word vector, corresponds to a document. Each document is represented by a set of slots; each slot is a textual field corresponding to a specific feature of the document, in an attempt to take into account also the structure of documents.

Formally, assume that we have a collection of N documents, each document being subdivided into M slots. Let m be the index of the slot; for n = 1, 2, ..., N, the n-th document is reduced to M bags of synsets, one for each slot:

d_n^m = ⟨t_{n1}^m, t_{n2}^m, ..., t_{nD_{nm}}^m⟩

where t_{nk}^m is the k-th synset in slot s_m of document d_n and D_{nm} is the total number of synsets appearing in the m-th slot of document d_n. For all n, k and m, t_{nk}^m ∈ V_m, which is the vocabulary for the slot s_m (the set of all different synsets found in slot s_m). Document d_n is finally represented in the vector space by M synset-frequency vectors:

f_n^m = ⟨w_{n1}^m, w_{n2}^m, ..., w_{nD_{nm}}^m⟩

where w_{nk}^m is the weight of the synset t_{nk}^m in the slot s_m of document d_n and can be computed in different ways: it can simply be the number of times synset t_k appears in slot s_m, or a more complex TF-IDF score. Our hypothesis is that the proposed indexing procedure helps to obtain profiles able to recommend documents semantically closer to the user interests.
As a strategy to learn user profiles on BOS-indexed documents, ITem Recommender (ITR) uses a Naïve Bayes text categorization algorithm to build profiles as binary classifiers (user-likes vs. user-dislikes). The induced probabilistic model estimates the a posteriori probability, P(c_j | d_i), of document d_i belonging to class c_j as follows:

P(c_j | d_i) = P(c_j) · ∏_{t_k ∈ d_i} P(t_k | c_j)^{N(d_i, t_k)}    (2)

where N(d_i, t_k) is the number of times token t_k occurs in document d_i. In ITR, each document is encoded as a vector of BOS, one for each slot. Therefore, equation (2) becomes:

P(c_j | d_i) = (P(c_j) / P(d_i)) · ∏_{m=1}^{|S|} ∏_{k=1}^{|b_{im}|} P(t_k | c_j, s_m)^{n_{kim}}    (3)

where S = {s_1, s_2, ..., s_|S|} is the set of slots, b_{im} is the BOS in the slot s_m of d_i, and n_{kim} is the number of occurrences of token t_k in b_{im}. Training is performed on BOS-represented documents, thus tokens are WordNet synsets, and the induced model relies on synset frequencies. To calculate (3), the system has to estimate P(c_j) and P(t_k | c_j, s_m) in the training phase. The documents used to train the system are rated on a discrete scale from 1 to MAX, where MAX is the maximum rating that can be assigned to a document. According to an idea proposed in [22], each training document d_i is labeled with two scores, a "user-likes" score w_+^i and a "user-dislikes" score w_−^i, obtained from the original rating r:

w_+^i = (r − 1) / (MAX − 1);    w_−^i = 1 − w_+^i    (4)

The scores in (4) are exploited for weighting the occurrences of tokens in the documents and for estimating their probabilities from the training set TR. The prior probabilities of the classes are computed according to the following equation:

P̂(c_j) = (∑_{i=1}^{|TR|} w_j^i + 1) / (|TR| + 2)    (5)

Witten-Bell smoothing [34] is adopted to compute P(t_k | c_j, s_m), taking into account that documents are structured into slots and that token occurrences are weighted using the scores in equation (4):

P̂(t_k | c_j, s_m) = N(t_k, c_j, s_m) / (V_{c_j} + ∑_i N(t_i, c_j, s_m))    if N(t_k, c_j, s_m) ≠ 0
P̂(t_k | c_j, s_m) = V_{c_j} / ((V_{c_j} + ∑_i N(t_i, c_j, s_m)) · (V − V_{c_j}))    otherwise    (6)

where N(t_k, c_j, s_m) is the count of the weighted occurrences of token t_k in the slot s_m in the training data for class c_j, V_{c_j} is the total number of unique tokens in class c_j, and V is the total number of unique tokens across all classes. N(t_k, c_j, s_m) is computed as follows:

N(t_k, c_j, s_m) = ∑_{i=1}^{|TR|} w_j^i · n_{kim}    (7)

In (7), n_{kim} is the number of occurrences of token t_k in slot s_m of document d_i. The sum of all N(t_k, c_j, s_m) in the denominator of equation (6) denotes the total weighted length of the slot s_m in class c_j. In other words, P̂(t_k | c_j, s_m) is estimated as the ratio between the weighted occurrences of t_k in slot s_m of class c_j and the total weighted length of the slot. The final outcome of the learning process is a probabilistic model used to classify a new document in the class c+ or c−. This model is the user profile, which includes those tokens that turn out to be most indicative of the user preferences, according to the value of the conditional probabilities in (6).
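A compact sketch of the training equations (4)-(7); the function names are ours and the counts used below are toy values, not data from the system.

```python
# Sketch of equations (4)-(7); names and toy values are ours.
def rating_weights(r, max_rating):
    # eq. (4): map a discrete rating r in [1, MAX] to the two class scores
    w_pos = (r - 1) / (max_rating - 1)
    return w_pos, 1.0 - w_pos

def prior(class_weights):
    # eq. (5): smoothed class prior from the per-document scores w_j^i
    return (sum(class_weights) + 1) / (len(class_weights) + 2)

def weighted_counts(docs):
    # eq. (7): N(t_k, c_j, s_m) for one class and slot;
    # docs = [(w_j^i, {token: n_kim}), ...]
    counts = {}
    for w, toks in docs:
        for tok, n in toks.items():
            counts[tok] = counts.get(tok, 0.0) + w * n
    return counts

def witten_bell(counts, v_class, v_all):
    # eq. (6): Witten-Bell-smoothed estimate of P(t_k | c_j, s_m)
    total = sum(counts.values())
    def p(tok):
        n = counts.get(tok, 0.0)
        if n:
            return n / (v_class + total)
        return v_class / ((v_class + total) * (v_all - v_class))
    return p
```

For instance, a document rated 5 on a 1-5 scale contributes with full weight to the user-likes counts and zero weight to user-dislikes.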
IV. SYSTEM FEATURES AND ARCHITECTURE

Based on the previous techniques, we built a system (see Fig. 1) enabling users to perform:

• semantic searching, by selecting ontology classes and properties;
• personalized searching, based on user profiles and item information;
• semantic-personalized searching, obtained by combining the two types of searching.

Fig. 1. Ontology-based recommender system architecture

In our approach, an item is represented by means of the following data:

• item information — defined by two sets of information. The first set is composed of intrinsic information; for example, in a bibliographic research scenario, intrinsic information could be: item identification number, authors, title, abstract, slot types. The second set is composed of information produced by the classifier during the training step: the bag of synsets obtained by the WSD process on each slot;
• user profile — learned by the ITR system, as described in the previous section;
• item semantic description — an OWL description referring to the reference ontology.

All the above information is stored in a repository, while the reference vocabulary adopted to define the semantic profile is the WordNet lexical database. The recommender system architecture is composed of several modules; each one has a specific role and instantiates a part of the repository. The Interface Module allows users to define semantic item descriptions and user requests. It provides a GUI to browse the hierarchy of concepts and to outline the properties of the selected class. This feature of the GUI supports a user in defining requests as descriptions which are logically consistent w.r.t. the reference ontology.
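The three kinds of item data listed above can be pictured with a simple record; all field names here are illustrative, not taken from the actual implementation.

```python
from dataclasses import dataclass, field

# Illustrative record of the item data described above (names are ours).
@dataclass
class Item:
    item_id: str
    intrinsic: dict                            # e.g. title, authors, abstract
    bos: dict = field(default_factory=dict)    # slot name -> {synset: weight}
    owl_description: str = ""                  # DL concept w.r.t. the ontology

paper = Item("it01",
             {"title": "Semantic matchmaking", "authors": ["A. Author"]},
             bos={"title": {"semantic#a1": 1, "matchmaking#n1": 1}},
             owl_description="inProceeding")
```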
The user request is split into two main parts:

• full: in the full part of the request, the user sets the constraints she wants to be satisfied by (in full match with) the retrieved items;
• potential: here the user sets her preferences, i.e., her wishful options. The ontology-based score is computed by measuring "how many" of these constraints are satisfied by a retrieved item.

We stress the fact that, in order to fulfill user needs, the search process can be an iterative one. On the basis of the returned results, a user is able to refine the previous request by exchanging a full constraint for a potential one and vice versa. Our recommender system considers user query features as full constraints by default, and as potential ones only whenever explicitly stated by the user [10].

The Profile Engine Module prepares the item information by performing WSD on the textual descriptions of the items to be recommended. Mainly built on the ITR system, it performs the training step on the disambiguated text in order to infer the user profiles which will be exploited in the recommendation process. Each inferred user profile is a binary classifier able to categorize an item as interesting or not interesting according to the classification score of the class user-likes.

The Match Engine Module implements the matchmaking and ranking algorithms. It compares the query with the descriptions of items referring to the same OWL ontology. The reasoner is not embedded within the application, so the Ontology Manager communicates with the Matchmaker via a DIG 1.1 interface over HTTP. The Match Engine is used to manage full and potential constraints: when the user sets a request characteristic as full, she wants to be sure that the full features are explicitly mentioned in the description of the items to be retrieved. The Match Engine evaluates a full or potential match in the following way:

• all potential constraints — if the user sets all request features as potential ones, then the returned items will be those potentially satisfying the user request. According to the Open World Assumption, returned items may have additional or missing features w.r.t. the request, but no features in logical conflict with any in the request. In this case the module runs a potential match;
• all full constraints — if the user sets all request features as full ones, then the returned items have to express explicitly at least all the requested features. Of course, this does not prevent the item description from including also non-requested features. In this case the module evaluates a full match;
• mixed — in this case the module first computes a full match considering a temporary request composed of the full features only. The set of items returned in this step could also contain features in logical conflict with some feature in the potential part of the request, so the module then runs a potential match with the potential part of the request to discard such results.

Results returned by the Profile Engine are pairs ⟨item identifier, relevance rate⟩, whilst the ones returned by the Match Engine are pairs ⟨item identifier, rank value⟩, according to equations (3) and (1) respectively.

The Recommender Manager handles the communication and synchronization between the Profile and the Match Engine. This module is designed to deal with two issues: result ordering and synchronization. In web-based retrieval systems, ranked lists are surely preferable to unordered sets of items. Nevertheless, they become less effective and usable as the complexity of the item description, together with the possible user preferences, increases. At least two implementations of the Recommender Manager are possible:

1) the Recommender Manager operates in two steps. In the first step, this module receives the user requests (OWL descriptions translated into DIG); then it activates the Profile and the Match Engine. In the second step, the Recommender Manager fetches the results for the user request from the two previous modules, then it computes a unique score by adding the match rank and the classifier relevance rate. The new ranked results are sent to the GUI;
2) the Match Engine works as a filter producing a subset of items which is sent to the Profile Engine. The Profile Engine uses only this subset to answer the user request.

Our recommender system implements the first approach and computes the result score according to the following simple formula: score = α · relevance + β · rank, where α and β are numeric coefficients. As an initial attempt, we set both coefficients to 0.5. The evaluation of several experimental tests, and the usage of different reference ontologies, could require changing these coefficient values.

As a real scenario, we propose queries formulated in terms of an ontology which models the bibliographic research domain [16].
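The score combination of the first implementation can be sketched as follows; the function and variable names are ours, not the system's.

```python
# Sketch of score = alpha * relevance + beta * rank (names are ours).
def combine_results(relevance, rank, alpha=0.5, beta=0.5):
    # relevance: Profile Engine pairs; rank: Match Engine pairs (as dicts)
    shared = relevance.keys() & rank.keys()
    scored = {i: alpha * relevance[i] + beta * rank[i] for i in shared}
    # return a ranked list of (item identifier, combined score) pairs
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

With α = β = 0.5 the two engines contribute equally; tuning the coefficients shifts the balance between profile relevance and ontology-based rank.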
The ontology is defined by the following classes (Figure 2):

• Item — with subclasses such as Article, inProceeding, Book;
• Event — with subclasses such as Conference, Workshop, Meeting;
• Topic — the topic hierarchy is defined by specific research topics; for example, the Computer Science terms of the ACM Topic Hierarchy [1] may be used;
• Author — the author hierarchy is defined by several professional figures such as AcademicStaff, Manager, PhDstudent;
• Organization — with subclasses such as University, Enterprise, Institute;
• Project — the project hierarchy is defined by specific research projects, adding properties such as financedBy.

Fig. 2. Reference Ontology Hierarchy

Besides, the item concept can be defined by the following properties: aboutProject (with the project class as range), hasPublicationYear, presentedAt (with the event class as range), hasAuthor (with the author class as range), developedBy (with the organization class as range), and hasTopic (with the topic class as range). Obviously, this domain ontology can be extended with specific journal item properties such as hasVolume and hasMonth, and with specific book item properties such as hasPublisher and hasEdition.

Finally, in order to retrieve only interesting items, several disjointness axioms are defined. For example, Book is disjoint with Article, inProceeding, inBook, inCollection and Proceedings, while inProceeding is disjoint with Book, Thesis, Booklet and Manual.

In this scenario a user can propose queries such as the combination of (a) "I'm looking for an inProceeding item published after 2004, developed in a research project by both an Enterprise and a University" and (b) "I am interested in items with matchmaking as a keyword in the title, including my profile". In the previous request, (a) is the semantic query for the matchmaker, while (b) is the profile-based query for the classifier. According to the domain ontology, the semantic query (a) is translated into the following DL description: inProceeding ⊓ (≥ 2004 hasPublicationYear) ⊓ ∀aboutProject.(ResearchProject ⊓ ∀developedBy.(Enterprise ⊓ University)).
V. CONCLUSION

In this paper, we have described a strategy to design an advanced semantic search engine able to combine logic-based matchmaking with Bayesian text categorization. WordNet has been exploited in order to enhance the text categorization performed by a Bayesian approach for automated user profile learning. Combining probabilistic and logic-based similarity measures, we have shown how to compute a score representing the match degree between a query and an item description. The score takes into account both the ontology-based query and the user profile. An initial prototype implementing the proposed approach has been developed and presented in the paper. Currently, we are performing experiments on large datasets in order to validate the proposed approach.

REFERENCES

[1] ACM. Top Two Levels of The ACM Computing Classification System. www.acm.org/class/1998/overview.html, 1998.
[2] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider. The Description Logic Handbook. Cambridge University Press, 2002.
[3] S. Bloehdorn and A. Hotho. Boosting for text classification with semantic features. In Proc. of 10th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Mining for and from the Semantic Web Workshop, pages 70–87, 2004.
[4] R. Burke, M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes, and M. Sartin. Integrating knowledge-based and collaborative-filtering recommender systems. In Proceedings of the Workshop on AI and Electronic Commerce, AAAI 99, Orlando, Florida, 1999.
[5] R. Burke. Knowledge-based recommender systems. In A. Kent, editor, Encyclopedia of Library and Information Systems, volume 69. New York, 2000.
[6] R. D. Burke, K. J. Hammond, and B. C. Young. The FindMe approach to assisted browsing. IEEE Expert, 12(4):32–40, 1997.
[7] T. Catarci, P. Dongilli, T. Di Mascio, E. Franconi, G. Santucci, and S. Tessaris. An ontology based visual tool for query formulation support. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI '04), pages 308–312, 2004.
[8] T. Collins, P. Mulholland, and Z. Zdráhal. Semantic browsing of digital collections. In Proc. of 4th International Semantic Web Conference (ISWC 2005), pages 127–141, 2005.
[9] S. Colucci, T. Di Noia, E. Di Sciascio, F. M. Donini, and M. Mongiello. Concept abduction and contraction for semantic-based discovery of matches and negotiation spaces in an e-marketplace. Electronic Commerce Research and Applications, 4(4):345–361, 2005.
[10] S. Colucci, T. Di Noia, E. Di Sciascio, F. M. Donini, and A. Ragone. Knowledge elicitation for query refinement in a semantic-enabled e-marketplace. In 7th International Conference on Electronic Commerce (ICEC 05), pages 685–691. ACM Press, 2005.
[11] M. Degemmis, P. Lops, and P. Basile. An intelligent personalized service for conference participants. In 16th International Symposium on Methodologies for Intelligent Systems (ISMIS'06), 2006.
[12] T. Di Noia, E. Di Sciascio, F. M. Donini, and M. Mongiello. A system for principled matchmaking in an electronic marketplace. International Journal of Electronic Commerce, 8(4):9–37, 2004.
[13] E. Di Sciascio, S. Colucci, T. Di Noia, F. M. Donini, A. Ragone, and R. Rizzi. A semantic-based fully visual application for matchmaking and query refinement in B2C e-marketplaces. In 8th International Conference on Electronic Commerce (ICEC 06), pages 174–184. ACM Press, 2006.
[14] N. Guarino, C. Masolo, and G. Vetere. Content-based access to the web. IEEE Intelligent Systems, 14(3):70–80, 1999.
[15] R. V. Guha, R. McCool, and E. Miller. Semantic search. In Proceedings of the Twelfth International World Wide Web Conference, WWW2003, pages 700–709, 2003.
[16] P. Haase, J. Broekstra, M. Ehrig, M. Menken, P. Mika, M. Plechawski, P. Pyszlak, B. Schnizler, R. Siebes, S. Staab, and C. Tempich. Bibster — a semantics-based bibliographic peer-to-peer system. In the International Semantic Web Conference (ISWC2004), 2004.
[17] C. Leacock and M. Chodorow. Combining local context and WordNet similarity for word sense identification, pages 305–332. In C. Fellbaum (Ed.), MIT Press, 1998.
[18] L. Li and I. Horrocks. A software framework for matchmaking based on semantic web technology. International Journal of Electronic Commerce, 8(4), 2004.
[19] G. Linden, S. Hanks, and N. Lesh. Interactive assessment of user preference models: The automated travel assistant. In Proceedings of the Sixth International Conference on User Modeling, pages 67–78, Vienna, 1997.
[20] B. Magnini and C. Strapparava. Improving user modelling with content-based techniques. In Proc. 8th Int. Conf. User Modeling, pages 74–83. Springer, 2001.
[21] G. Miller. WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 1990. (Special Issue).
[22] R. J. Mooney and L. Roy. Content-based book recommending using learning for text categorization. In Proceedings of the 5th ACM Conference on Digital Libraries, pages 195–204, San Antonio, US, 2000. ACM Press, New York, US.
[23] P. Mulholland, T. Collins, and Z. Zdráhal. Story Fountain: intelligent support for story research and exploration. In Proc. of Intelligent User Interfaces Conf., pages 62–69, 2004.
[24] OWL. Web Ontology Language. www.w3.org/TR/owl-features/, 2004.
[25] M. Paolucci, T. Kawamura, T. R. Payne, and K. Sycara. Semantic matching of web services capabilities. In Proc. of International Semantic Web Conference (ISWC 2002), number 2342 in LNCS, 2002.
[26] Princeton University. WordNet. http://wordnet.princeton.edu/, 2005.
[27] C. Räck, S. Arbanowski, and S. Steglich. Context-aware, ontology-based recommendations. In International Symposium on Applications and the Internet Workshops (SAINTW'06), 2005.
[28] S. Scott and S. Matwin. Text classification using WordNet hypernyms. In COLING-ACL Workshop on Usage of WordNet in NLP Systems, pages 45–51, 1998.
[29] F. Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 2002.
[30] G. Semeraro, M. Degemmis, P. Lops, and P. Basile. Combining learning and word sense disambiguation for intelligent user profiling. In Twentieth International Joint Conference on Artificial Intelligence (IJCAI 2007), Hyderabad, India, 2007 (to appear).
[31] S. Shearin and H. Lieberman. Intelligent profiling by example. In Proceedings of the International Conference on Intelligent User Interfaces (IUI 2001), pages 145–151. ACM Press, 2001.
[32] M. Theobald, R. Schenkel, and G. Weikum. Exploiting structure, annotation, and ontological knowledge for automatic classification of XML data. In Proceedings of International Workshop on Web and Databases, pages 1–6, 2004.
[33] M. Torrens, B. Faltings, and P. Pu. SmartClients: Constraint satisfaction as a paradigm for scaleable intelligent information systems. Constraints, 7(1):49–69, 2002.
[34] I. H. Witten and T. C. Bell. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37(4), 1991.