Supporting Consumers in Providing Meaningful Multi-Criteria Judgments

Friederike Klan
Institute of Computer Science
Friedrich-Schiller-University of Jena
friederike.klan@uni-jena.de

Birgitta König-Ries
Institute of Computer Science
Friedrich-Schiller-University of Jena
birgitta.koenig-ries@uni-jena.de

ABSTRACT
The huge amount of products and services that are available online makes it difficult for consumers to identify offers which are of interest to them. Semantic retrieval techniques for Web Services address this issue, but make the unrealistic assumption that offer descriptions describe a service's capabilities correctly and that service requests reflect a consumer's actual requirements. As a consequence, they might produce inaccurate results. Alternative retrieval techniques such as collaborative filtering (CF) mitigate those problems, but do not perform well in situations where consumer feedback is scarce. As a solution, we propose to combine both techniques. However, we argue that the multi-faceted nature of Web Services imposes special requirements on the underlying feedback mechanism that are only partially met by existing CF solutions. The focus of this paper is on how to elicit consumer feedback that can be effectively used in the context of Web Service retrieval and how to support users in that process. Our main contribution is an algorithm that suggests which service aspects should be judged by a consumer. The approach effectively adjusts to a user's ability and willingness to provide judgments and ensures that the provided feedback is meaningful and appropriate in the context of a certain service interaction.

Categories and Subject Descriptors
H.3.3 [Information Storage And Retrieval]: Information Search and Retrieval; H.3.5 [Information Storage And Retrieval]: On-line Information Services

General Terms
Algorithms, Human Factors, Measurement

Keywords
recommend what to judge, multi-criteria judgments, personalized feedback elicitation

Copyright is held by the author/owner(s). Workshop on the Practical Use of Recommender Systems, Algorithms and Technologies (PRSAT 2010), held in conjunction with RecSys 2010. September 30, 2010, Barcelona, Spain.

1. INTRODUCTION
The huge amount and heterogeneity of information, products and services that are available online makes it difficult for consumers to identify offers which are of interest to them. Hence, new techniques that support users in the product search and selection process are required. In the past decade, semantic technologies have been developed and leveraged to approach this issue [3]. They provide information with a well-defined and machine-comprehensible meaning and thus enable computers to support people in identifying relevant content. This idea is not restricted to information, but also applies to functionality provided via the web as services. Semantic Web Services (SWS) provide a specific functionality, semantically described in a machine-processable way, over a well-defined interface. Similarly, service requesters may semantically express their service requirements. Having both a semantic description of a consumer's needs and the published semantic descriptions of available Web Services, suitable service offers can be automatically discovered by comparing (matching) the given service request with the available offer descriptions. Services might then be automatically configured, composed and finally invoked over the web. Existing semantic matchmaking and service selection approaches evaluate the suitability of available service offers exclusively by comparing the published offer descriptions with a given request description. They implicitly assume that offer descriptions describe a service's capabilities correctly and that service requests reflect a consumer's actual requirements.
The first assumption might have been valid in a market with a small number of well-known and accredited companies. However, it is no longer true in today's market, where easy and cheap access to the Internet and the emergence of online marketplaces offering easy-to-set-up online storefronts enable virtually everyone to run an online shop accessible to millions of buyers. The situation becomes even more critical since, due to the huge number of offers, hard competition and a price war have arisen that might cause some providers to promise more than they are able to deliver. In our view, the assumption that service requests reflect a consumer's actual requirements is not realistic either. Though SWS approaches provide adequate means to semantically describe service needs, they require the user to do so at a formal, logic-based level that is not appropriate for the average service consumer in an e-commerce setting. As a result, SWS applications typically provide request templates for common service needs. Those templates are then adjusted to fit a consumer's requirements in a certain purchasing situation. Though the resulting service requests might be a good estimate of a consumer's service needs, they cannot exactly meet his true requirements. As a consequence, service discovery mechanisms that are purely based on the comparison of semantic request and offer descriptions might produce inaccurate results and thus lead to suboptimal service selection decisions.

To mitigate those problems, alternative retrieval techniques such as collaborative filtering [9] have been developed. Those techniques do not rely on explicit models of consumer requirements and product properties. They evaluate product ratings of neighboring users, i.e. those that have a similar taste, to recommend products or services that might be of interest to a potential consumer. Though collaborative filtering approaches are very effective in many domains, they lack the powerful knowledge representation and matchmaking capabilities provided by SWS and thus do not perform well in situations where feedback is scarce [9]. As a solution, we propose to combine both techniques. More specifically, we suggest performing retrieval based on semantic service descriptions and then using a collaborative feedback mechanism to verify and refine those results. We think that such a hybrid approach can benefit from the best of both worlds and thus has the potential to significantly improve retrieval quality. Combining semantic retrieval with collaborative feedback mechanisms is not new (see for example [8, 11]). However, we argue that simply re-using existing techniques, as done in other approaches, will not tap the full potential of this type of approach. This is because the multi-faceted nature and the peculiarities of SWS impose special requirements on the underlying feedback mechanism and in particular on the properties of the consumer feedback that is required. In this paper, we will analyze those requirements (Sect. 2) and will show that they are only partially met by existing collaborative filtering solutions (Sect. 3). The focus of this paper is on how to elicit consumer feedback that can be effectively used in the context of SWS retrieval and how to support users in that process (Sects. 4 and 5). Our main contribution is an algorithm that suggests which service aspects should be judged by a consumer (Sect. 6). The approach accounts for a user's ability and willingness to provide judgments and ensures that the provided feedback is meaningful and appropriate in the context of a certain service interaction. Our evaluation results show that the proposed procedure effectively adjusts to a consumer's personal judgment preferences and thus provides helpful support for the process of feedback elicitation (Sect. 7). A detailed discussion on how to effectively use consumer feedback to enhance SWS retrieval is published in [6].

2. REQUIREMENTS
Various collaborative filtering mechanisms that allow to retrieve products or services that are of interest to a consumer [9] have been proposed. Those mechanisms are very effective in many domains and seem very promising in the context of our work. However, we argue that the multi-faceted nature of SWS imposes special requirements on the underlying feedback mechanism that are only partially met by existing CF solutions. In the following, we will specify those requirements.

Consumer feedback is subjective, since it reflects a service's suitability as perceived through a certain consumer's eyes. Hence, feedback is biased by personal expectations and preferences about the invoked service. Moreover, feedback may refer to different services and to different request contexts. For example, a ticket booking service might have been used to buy group tickets for a school class or to buy a single ticket. The suitability of a service might differ depending on the request context, and hence so does the resulting feedback. Feedback mechanisms should account for those facts. To enable effective usage, feedback has to be meaningful, i.e., the expectations and the context underlying a judgment should be clear. In addition, it should be evident whether and how feedback made under one circumstance can be used to infer about a service's suitability in another situation.

We would also like to emphasize the necessity for feedback to be as detailed as possible, i.e. to comprise judgments referring to the various aspects of a service interaction.
This is for several reasons. Firstly, feedback judging the quality of a provided service as a whole is of limited significance, since as an aggregated judgment it provides no more than a rough estimate of a service's performance. Secondly, aggregated feedback tends to be inaccurate. This is because humans are bad at integrating information about different aspects, as they appear in a multi-faceted service interaction, in particular if those aspects are diverse and incomparable [2, 10]. Finally, it has been shown in [4] that using detailed consumer feedback allows to estimate user tastes more accurately and thus can significantly improve prediction accuracy. In the context of detailed, i.e. multi-criteria, consumer feedback, meaningful also means that the relationship between the different service aspects that might have been judged is clear and that all relevant aspects characterizing a certain service interaction have been judged. The latter is required because inferred judgments based on incomplete information might be incorrect.

Another problem we encounter is feedback scarcity. Given certain service requirements, a certain context and a particular service, feedback for exactly this set-up is rare and typically not available at all. Hence, scarce feedback has to be exploited effectively. In particular, service experiences related to different but similar contexts and those related to other but similar services have to be leveraged. However, unfolding the full potential of consumer feedback, particularly when using multi-aspect feedback, requires that users provide useful responses. To ensure this, the feedback elicitation process should be assisted.
In particular, care should be taken that elicited feedback is comprehensive and appropriate in the context of a certain service interaction. In addition, a consumer's willingness to provide feedback as well as his expertise in the service domain should be accounted for. This is important, since asking a consumer for a number of judgments he is not able and/or not willing to provide will result in no or bad quality feedback. Finally, it should also be ensured that all relevant information that is necessary for effectively exploiting consumer feedback is recorded. This should happen transparently for the user.

Since the types of service interactions to be judged and the kinds of users that provide feedback are diverse and not known in advance, even for a specific area of application, a hard-wired solution with predefined service aspects to judge is inappropriate. In fact, the process of feedback elicitation should be customizable and should be automatically configurable at runtime.

3. RELATED APPROACHES
Aspects such as feedback scarcity and the subjectivity of consumer feedback are typically addressed in existing collaborative filtering solutions [9]. Also, dealing with the context-dependent nature of judgments has been an issue (see e.g. [1]). However, existing solutions only partially address the question of how to effectively use judgments made in one context to infer about a service's suitability in another context. Multi-criteria feedback has been an issue in both academic [4] and commercial recommender systems. Typically, the set of aspects that might be judged by a consumer is either the same for all product types or specific per product category. However, in the first case, this set of aspects is either very generic, i.e. not product-specific, or not appropriate for all products. In the second case, the set has to be specified manually for each new product. Moreover, the single aspect ratings are typically supplementary in the sense that they do not have any influence on a product's overall rating.
Alternatively, some reviewing engines, such as those provided by Epinions¹ or Powerreviews², offer more flexible reviewing facilities based on tagging. Those systems allow consumers to create tags describing the pros and cons of a given product. These tags can then be reused by other users. Tagging provides a very intuitive and flexible mechanism that allows for product-specific judgments. However, the high flexibility of the approach comes at the cost of the judgments' meaningfulness. This is because tags do not have a clear semantics. In particular, the relationship between different tags is unknown, which makes them incomparable. Moreover, those systems do not ensure that all relevant aspects of a product or a service interaction are judged. To summarize our findings, more flexible and adaptive mechanisms to elicit and describe multi-criteria feedback are required. In particular, the question of how to describe this type of feedback meaningfully has hardly been considered. To the best of our knowledge, the issue of assisting consumers in providing comprehensive, appropriate and meaningful feedback has not been addressed at all. Also, aspects such as a consumer's ability and willingness to provide judgments for specific aspects have hardly been considered in existing solutions.

¹http://www.epinions.com
²http://www.powerreviews.com

4. SEMANTIC WEB SERVICE RETRIEVAL
As a basis for further discussion, we introduce the semantic service description language DSD (DIANE Service Description) [7] and its mechanisms for automatic semantic service matchmaking that underlie our approach. Similarly to other service description approaches, DSD is ontology-based and describes the functionality a service provides, as well as the functionality required by a service consumer, by means of the precondition(s) and the set of possible effect(s) of a service execution. In the service request depicted in Fig. 1, the desired effect is that a product is owned after service execution. A single effect corresponds to a particular service instance that can be executed. While service offer descriptions describe the individual service instances that are offered by a service provider, e.g. the set of mobile phones offered by a phone seller, service request descriptions declaratively characterize the set of service instances that is acceptable for a consumer. In the service request in Fig. 1, acceptable instances are mobile phones that are cheaper than 50$, are either silver or black, are of bar or slider style and are from either Nokia or Sony Ericsson.

[Figure 1: DSD service request. The request tree asks for an owned Product with attributes price (currency == usd, amount <= 50) and productType MobilePhone, the latter with subattributes battery, style in {bar, slider}, color in {silver, black} and phoneType (manufacturer in {nokia[1.0], sonyEricsson[0.8]}, model); the overall preference is aggregated as 0.3 * (battery mul style mul color) + 0.7 * (phoneType mul battery mul color). The dotted part of the tree marks a possible feedback structure (cf. Sect. 5).]

As can be seen in the example, DSD utilizes a specific mechanism to declaratively and hierarchically characterize (acceptable) sets of service effects: service effects are described by means of their attributes, such as price or color. Each attribute may be constrained by direct conditions on its values and by conditions on its subattributes. For instance, the attribute phoneType is constrained by a condition on its subattribute manufacturer, which indicates that only mobile phones from Nokia or Sony Ericsson are acceptable. The direct condition <= 50 on the price amount in Fig. 1 indicates that only prices lower than 50$ are acceptable. Attribute conditions induce a tree-like and increasingly fine-grained characterization of acceptable service effects. A DSD request does not only specify which service effects are acceptable, but also indicates to which degree they are acceptable. For this purpose, a preference value from [0, 1] is specified for each attribute value. The default is 1.0 (totally acceptable), but alternative values might be specified in the direct conditions of each attribute. For example, the preference value for the attribute manufacturer is 1.0 for Nokia phones and 0.8 for mobile phones from Sony Ericsson.

As demonstrated in [7], DSD service and request descriptions can be efficiently compared. Given a service request, the semantic matchmaker outputs an aggregated overall preference value ∈ [0, 1] for each available service offer description. This value is called the matching value and indicates how well a considered service offer fits the consumer's requirements encoded in the service request. Based on the matching values, the best fitting service offer is determined and invoked.
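To make the shape of such request trees concrete, the following Python sketch models request nodes with direct conditions and preference values and transcribes the request of Fig. 1 by hand. It is an illustration only: the class and field names (RequestNode, DirectCondition and so on) are ours and are not part of the DSD implementation described in [7].

    from dataclasses import dataclass, field

    @dataclass
    class DirectCondition:
        """A constraint on an attribute's values, e.g. <= 50 or in {...}."""
        kind: str                                    # "leq", "geq" or "in"
        bound: float = 0.0                           # bound for "leq"/"geq"
        allowed: dict = field(default_factory=dict)  # value -> preference in [0, 1]

    @dataclass
    class RequestNode:
        """One node of a request tree, describing a single service aspect."""
        ontological_type: str
        direct_conditions: list = field(default_factory=list)
        attributes: dict = field(default_factory=dict)  # name -> child RequestNode

    # The request of Fig. 1, transcribed by hand (preference defaults to 1.0).
    request = RequestNode("Product", attributes={
        "price": RequestNode("Price", attributes={
            "currency": RequestNode("Currency", [DirectCondition("in", allowed={"usd": 1.0})]),
            "amount": RequestNode("Double", [DirectCondition("leq", bound=50)]),
        }),
        "productType": RequestNode("MobilePhone", attributes={
            "battery": RequestNode("Battery"),
            "style": RequestNode("MobilePhoneStyle",
                                 [DirectCondition("in", allowed={"bar": 1.0, "slider": 1.0})]),
            "color": RequestNode("Color",
                                 [DirectCondition("in", allowed={"silver": 1.0, "black": 1.0})]),
            "phoneType": RequestNode("MobilePhoneType", attributes={
                "manufacturer": RequestNode("Company",
                    [DirectCondition("in", allowed={"nokia": 1.0, "sonyEricsson": 0.8})]),
                "model": RequestNode("Model"),
            }),
        }),
    })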
5. FEEDBACK ELICITATION
In the following, we will analyze what is required to make detailed consumer feedback meaningful, comprehensive and appropriate to characterize a certain service interaction. We will demonstrate how semantic service descriptions can be used to elicit feedback that fulfills those requirements. A detailed discussion on how to effectively use the elicited consumer feedback to enhance SWS retrieval is out of the scope of this paper and is published in [6].

What is required to make consumer feedback appropriate, comprehensive and meaningful.
We assume that a service request covers at least all service aspects that are important to the consumer. Potentially, all service aspects in a request description might be rated by a consumer. In order to be able to exploit these ratings, we need to make sure that they are meaningful (i.e., contain the rating context, e.g., which product a rating refers to) and comprehensive (i.e., contain all relevant aspects; a quality rating without information on whether the price was ok is not helpful). In addition, we need to know how different service aspects relate to each other (e.g., how can a rating about quality be derived from ratings on subaspects such as usability and battery capacity?). The challenging question is how to fulfill the identified requirements while still being flexible in the choice of the aspects to rate.

Creating appropriate, comprehensive and meaningful consumer feedback.
We propose the concept of a feedback structure to deal with that issue. A feedback structure is a subtree of the request tree whose leaves correspond to the aspects that may be rated by the user. Consider the example request depicted in Fig. 1. The dotted part of the tree indicates a possible feedback structure for that request, where the aspects price, battery, style, color and phoneType have to be rated by the consumer. Note that this structure contains all information that is necessary to effectively utilize the provided ratings. In particular, it encodes the context of a rating in terms of the path from the request root to the rated aspect, the other aspects that were judged and the hierarchical relationship between the considered aspects.

To assure that the provided feedback is comprehensive, the request subtrees rooted at the feedback structure's leaves should cover all leaves of the request tree. This guarantees that all service aspects considered in the request description are judged either directly or indirectly (via an aggregated rating) by the service consumer. The feedback structure depicted in Fig. 1 fulfills this requirement and thus is valid. Omitting, e.g., the aspect phoneType would result in an invalid structure. Note that we are still flexible in the choice of the attributes to be rated, e.g. we could allow the consumer to provide a single rating for productType instead of asking him to judge battery, style, color and phoneType separately. The feedback structure together with the consumer-provided ratings is propagated to other consumers and might be used to infer knowledge about a service's suitability for consumers with other service requirements (see [6] for details).
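To illustrate the validity condition, the following sketch (continuing the hypothetical RequestNode transcription from Sect. 4) represents a feedback structure by the paths of its leaves and checks that the request subtrees rooted at those leaves cover every leaf of the request tree; all function names are our own.

    def leaves_below(node, path=()):
        """Yield the paths of all leaf aspects in the subtree rooted at node."""
        if not node.attributes:
            yield path
        else:
            for name, child in node.attributes.items():
                yield from leaves_below(child, path + (name,))

    def node_at(root, path):
        """Follow a path of attribute names from the root to a request node."""
        node = root
        for name in path:
            node = node.attributes[name]
        return node

    def is_valid_feedback_structure(root, fs_leaves):
        """Valid iff the request subtrees rooted at the feedback structure's
        leaves together cover all leaves of the request tree."""
        covered = set()
        for leaf_path in fs_leaves:
            covered.update(leaves_below(node_at(root, leaf_path), leaf_path))
        return covered == set(leaves_below(root))

    # The structure of Fig. 1: rate price as a whole plus four product aspects.
    fs = [("price",),
          ("productType", "battery"), ("productType", "style"),
          ("productType", "color"), ("productType", "phoneType")]
    print(is_valid_feedback_structure(request, fs))       # True
    # Omitting phoneType leaves manufacturer/model uncovered -> invalid.
    print(is_valid_feedback_structure(request, fs[:-1]))  # False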
6. RECOMMENDING WHAT TO JUDGE
To ensure feedback quality, the feedback elicitation process should be assisted and should account for a consumer's judgment preferences, such as his willingness to provide ratings as well as his expertise in the considered service domain. However, those judgment preferences might differ from request to request: e.g., I might be an expert in judging the quality of personal computers, but I do not know that much about servers. As a consequence, I am willing and able to judge the quality of a purchased computer in the case of a PC, but not when purchasing a new server for our working group. This should be considered during feedback elicitation. To achieve this, we propose the following solution.

Assume that, given a certain service request, an appropriate service was selected and invoked and now its suitability has to be judged by the consumer. In a first step, we utilize the provided service request to determine possible feedback structures as defined in the previous section. Subsequently, the structure that is most suitable for the user, i.e. the one that, in the context of the given request, fits best to the consumer's personal abilities and judgment preferences, is selected and presented to the user. The required knowledge about the user's judgment requirements is learned from his behavior in previous judgment sessions. The presented feedback structure represents a careful compromise between the consumer's competing judgment requirements and might be adjusted to his actual judgment needs. This can be done by expanding and/or hiding subtrees of the presented structure. For example, in the structure depicted in Fig. 1, we might expand the leaf phoneType to judge its subaspects manufacturer and model. Finally, the user judges all leaf attributes of the structure, e.g. by providing a rating. Once the consumer submits his judgments, the system takes care of storing all relevant feedback information and session data for future recommendations. In particular, it records which and how many service aspects were judged by the consumer and which service request led to the judgment. The acquired information is used later on to identify suitable feedback structures in future judgment sessions.

6.1 Feedback structure suitability
Given a consumer's service request, typically many different feedback structures are possible. How, then, to measure the suitability of each feedback structure in order to identify the one that fits best to the user's personal abilities and willingness to provide judgments? We have to consider two aspects here: firstly, do the leaves of the feedback structure comprise those attributes that the consumer is able to judge, and secondly, is the consumer willing to judge all those aspects?

As a measure of a consumer's willingness and ability to judge a certain service aspect, we use the frequency with which the user judged this aspect in the past. We also consider the request context in which an aspect was judged. More specifically, we consider how similar the request that led to the past judgment is to our request. Let r be the service request that was posed by the consumer. Then the consumer's willingness and ability to judge service aspect a is determined by

    w_a(r) = Σ_{r' ∈ R_a} sim(r', r),

where R_a is the set of past service requests that led to a judgment of a. The value sim(r', r) indicates how similar the service requirements encoded in the past request r' are to those in the current request r. A detailed discussion on how to compute the semantic similarity of two requests is provided in Sect. 6.3.

The suitability s_attributes(fs, r) of a given feedback structure fs is determined by the consumer's willingness and ability to judge its leaf aspects A_fs. We propose to compute it as the sum of its leaf attributes' w_i-values:

    s_attributes(fs, r) = (Σ_{i ∈ A_fs} w_i(r)) / (Σ_{j ∈ A_r} w_j(r))    (1)

The term is normalized by dividing it by the sum of the w_j-values of all attributes j ∈ A_r that are contained in the given request r. Hence, s_attributes(fs, r) ∈ [0, 1].

To measure a consumer's willingness to judge k = |A_fs| leaf aspects A_fs, we compare how similar the past requests that also led to a judgment of k aspects are to the service request r posed by the consumer. More specifically, the suitability s_number(fs, r) of the feedback structure fs with respect to the number of service aspects that have to be judged is determined by

    s_number(fs, r) = sim(R_k, r),    (2)

where sim(R_k, r) is the mean request similarity of all past service requests that led to a judgment of k aspects. In cases where no previous request led to k service aspects being judged, s_number(fs, r) is determined as the mean of sim(R_k', r) and sim(R_k'', r), where k' is the largest k' < k for which a past request with k' judgments exists and k'' is the smallest k'' > k for which a past request with k'' judgments exists. In case k' resp. k'' does not exist, sim(R_k', r) resp. sim(R_k'', r) is assumed to be 1.0 resp. 0.0, i.e. by default feedback structures with a low number of service aspects to be judged are preferred. Assuming that sim(x, y) is a value from [0, 1], s_number(fs, r) is also from [0, 1]. The overall suitability s(fs, r) ∈ [0, 1] of a feedback structure fs in the context of the posed request r is

    s(fs, r) = α · s_attributes(fs, r) + β · s_number(fs, r).    (3)

The parameters α and β with α, β ∈ [0, 1] and α = 1 − β determine the influence of the terms s_attributes(fs, r) and s_number(fs, r), respectively. The values of α and β might vary from user to user. In Sect. 6.4, we will demonstrate how those values can be learned from a consumer's past judgment behavior.
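The following sketch implements Formulas 1-3 under the assumption that the judgment history is available as a list of (past request, set of judged aspects) pairs and that a request similarity function sim as defined in Sect. 6.3 is given; all names are illustrative, and we read A_r as the set of all aspects appearing in the request.

    def w(aspect, r, history, sim):
        """Willingness/ability to judge an aspect: summed similarity of all
        past requests that led to a judgment of that aspect (w_a in the text)."""
        return sum(sim(r_past, r) for r_past, judged in history if aspect in judged)

    def s_attributes(fs_leaves, request_aspects, r, history, sim):
        """Formula 1: suitability w.r.t. the selected aspects, normalized over
        all aspects contained in the request."""
        num = sum(w(a, r, history, sim) for a in fs_leaves)
        den = sum(w(a, r, history, sim) for a in request_aspects)
        return num / den if den > 0 else 0.0

    def s_number(k, r, history, sim):
        """Formula 2: mean similarity of past requests with exactly k judged
        aspects; interpolated from the nearest k', k'' if none exist."""
        def mean_sim(kk):
            sims = [sim(r_past, r) for r_past, judged in history if len(judged) == kk]
            return sum(sims) / len(sims) if sims else None
        exact = mean_sim(k)
        if exact is not None:
            return exact
        counts = sorted({len(judged) for _, judged in history})
        lower = [c for c in counts if c < k]
        upper = [c for c in counts if c > k]
        low = mean_sim(lower[-1]) if lower else 1.0   # default favours few aspects
        high = mean_sim(upper[0]) if upper else 0.0
        return (low + high) / 2

    def s(fs_leaves, request_aspects, r, history, sim, alpha=0.5):
        """Formula 3: overall suitability, with beta = 1 - alpha."""
        return (alpha * s_attributes(fs_leaves, request_aspects, r, history, sim)
                + (1 - alpha) * s_number(len(fs_leaves), r, history, sim))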
6.2 Determining possible feedback structures
For a given request, the number of possible feedback structures might be high, whereas the number of those that have the potential to be optimal (with respect to their suitability s(fs, r) for the user) is low. Hence, we require a way to determine potentially optimal feedback structures effectively, i.e. without having to construct all possible structures. In the following, we propose an algorithm that performs this task. It constructs potentially optimal feedback structures recursively and drops non-optimal partial structures as soon as possible. Fig. 2 shows how the algorithm works, exemplarily for the service request depicted in Fig. 1.

Each request node is associated with a list of entries, each corresponding to one of the feedback structures that are possible for the subtree rooted at that node. Let fs be one of those structures and let [a, b] be its corresponding entry. Then a is the number of aspects that have to be judged in fs and b is s_attributes(fs, r), where r is the request subtree rooted at the considered node. The algorithm works as follows. First, it initializes each request node's list with an entry [1, s_attributes(fs, r)], where fs is the feedback structure comprising only the node itself and r is the request subtree rooted at the considered node. For an example, consider Fig. 2. The initial entry in each list is highlighted. The number within each node indicates the value s_attributes(fs, r), which, for the sake of this example, is arbitrarily chosen. Starting from the request leaves (highlighted request nodes), the algorithm recursively computes lists for all parent nodes. Computing a node's list is done in three steps. First, the cross product C of the child nodes' entry sets is computed. For example, to determine the possible feedback structures for the product node (Fig. 2), we have to determine C = {[1, 0.2], [2, 0.1]} × {[1, 0.2], [4, 0.2], [5, 0.2]}, i.e. the cross product of the price and productType nodes' entry lists. Each element c of C gives rise to an entry [a, b] in the product node's list, i.e. to a possible feedback structure fs of this node's subtree. Since a is the number of attributes to judge in fs, it is computed as the sum of the a values in c. The suitability b of fs with respect to the selection of attributes that have to be judged is computed as the sum of its leaf attributes' b values (Formula 1), i.e. the sum of the b values in c. In a final step, we prune the computed list. This is done by keeping only a single entry [a, b] for each different value of a per node, where b = max{x | [a, x] is in the list}. Note that in doing so, we keep only those feedback structures that have the potential to be optimal and hence reduce the length of the node list to at most l, where l is the number of leaves of the subtree rooted at the considered node. Finally, we end up with a list for the request root comprising entries for all possible feedback structures for the request that have the potential to be optimal. Those structures are compared with respect to their suitability (Formula 3). The most suitable one is selected and presented to the user.

[Figure 2: Determining possible feedback structures. The request tree of Fig. 1, with each node annotated by an (arbitrarily chosen) s_attributes value and its computed entry list, e.g. {[1, 0.2], [2, 0.1]} at the price node and {[1, 0.2], [4, 0.2], [5, 0.2]} at the productType node; initial entries and request leaves are highlighted.]
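A sketch of this bottom-up computation over the hypothetical RequestNode trees introduced above. Each node receives a list of (a, b) entries built from the cross product of its children's lists and pruned to the best b per aspect count a; the per-node willingness values, arbitrary in Fig. 2, are supplied by a caller-provided function. Here b accumulates the sum of the judged leaves' willingness values, i.e. the numerator of Formula 1; the normalization by the full-request denominator does not change which entry is best per a.

    from itertools import product

    def candidate_entries(node, w_of_node):
        """Return the pruned list of (a, b) entries for the subtree rooted at
        node, where a = number of aspects to judge and b = summed willingness
        of the judged leaves. w_of_node(node) gives the willingness value of
        judging node as a single aggregated aspect."""
        # Initial entry: judge this node itself as one aggregated aspect.
        entries = [(1, w_of_node(node))]
        if node.attributes:
            # Cross product of the children's entry lists: one choice per child.
            child_lists = [candidate_entries(c, w_of_node)
                           for c in node.attributes.values()]
            for combo in product(*child_lists):
                a = sum(e[0] for e in combo)   # aspects to judge, summed
                b = sum(e[1] for e in combo)   # leaf willingness, summed
                entries.append((a, b))
        # Prune: keep only the best b for each distinct aspect count a.
        best = {}
        for a, b in entries:
            if a not in best or b > best[a]:
                best[a] = b
        return sorted(best.items())

    # Usage: every root entry is a potentially optimal feedback structure with
    # a judged aspects; combine with s_number via Formula 3 to pick the winner.
    print(candidate_entries(request, lambda n: 0.1))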
6.3 Request similarity
As mentioned earlier, a consumer's judgment preferences depend on the request context, i.e. the kind of service interaction that has to be judged. To allow for a comparison of the request contexts in which judgments have been made in the past with the current request, we require a measure for the semantic similarity of two requests, i.e. the similarity of the service requirements they encode. In this section, we will propose such a measure. It recursively computes the similarity sim(r, r') of two request trees r and r' by computing the similarity of their root nodes' ontological types (sim_type(root(r), root(r'))) and direct conditions (sim_dc(root(r), root(r'))) and the aggregated similarity of their root nodes' child trees (sim_attr(root(r), root(r'))). More specifically, we define sim(r, r') to be the mean of these three values.
In the remainder of this section, we will explain the rationale behind those three similarity values and particularize on how to determine them. Possible similarity values sim(r, r') are from the interval [0, 1], where a similarity value of 0.0 means "not similar at all" and a value of 1.0 means that the service requirements encoded by the two requests are identical.

[Figure 3: Determining the similarity of two requests r and r'. Request r has root type MobilePhone with attribute subtrees battery (Battery), style (MobilePhoneStyle in {bar, slider}), phoneType (MobilePhoneType) and color (Color in {silver, black}); request r' has root type Phone with attribute subtrees battery (Battery) and color (Color in {black}).]

Determining the type similarity.
The type similarity sim_type(n, n') ∈ [0, 1] of two nodes n and n' indicates how similar those nodes are with respect to their ontological type. It is defined similarly to Jaccard's index [5], which is often used to compare sample sets:

    sim_type(n, n') = |A_n ∩ A_n'| / |A_n ∪ A_n'|    (4)

where A_n is the set of attributes defined for the type of n and A_n' is the set of attributes defined for the type of n'. The type similarity sim_type(n, n') for the root nodes of the requests depicted in Fig. 3 is |{battery, phoneType, color}| / |{battery, phoneType, color, style}| = 0.75.

Determining the similarity of the direct conditions.
The similarity sim_dc(n, n') ∈ [0, 1] of two nodes n and n' indicates how similar those nodes are with respect to their direct conditions. As mentioned in Sect. 4, direct conditions restrict the acceptable values of a service attribute. For each kind of direct condition that might be specified for a certain attribute, we define a separate similarity measure. For example, for direct conditions of type in {...}, the similarity is determined as the number of common values divided by the larger number of values allowed for n or n'. For direct conditions of type <= x and >= x, the similarity is calculated as min{x, y} / max{x, y}, where x is the upper/lower bound for the values of n and y for those of n'. Accordingly, if only one of the nodes specifies a certain type of direct condition, the similarity is defined to be 0.0, and if both nodes do not specify any direct conditions, the similarity is defined to be 1.0.

As an example, consider again the requests depicted in Fig. 3. The Color nodes both specify a direct condition of type in {...}. The similarity sim_dc(n_color, n'_color) with respect to this direct condition is 1/2 = 0.5. The Battery nodes both do not specify any direct conditions, hence sim_dc(n_battery, n'_battery) = 1.0.
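A sketch of the two node-local measures, assuming the hypothetical RequestNode/DirectCondition classes from Sect. 4 and an ontology given as a mapping attrs_of_type from each ontological type to its set of defined attributes; for in-conditions we normalize by the larger of the two allowed-value sets, which reproduces the 0.5 of the Color example.

    def sim_type(n, n2, attrs_of_type):
        """Formula 4: Jaccard similarity of the attribute sets of the types."""
        a = attrs_of_type[n.ontological_type]
        b = attrs_of_type[n2.ontological_type]
        return len(a & b) / len(a | b)

    def sim_dc(n, n2):
        """Similarity of direct conditions, one measure per condition kind."""
        if not n.direct_conditions and not n2.direct_conditions:
            return 1.0          # neither node constrains its values
        if not n.direct_conditions or not n2.direct_conditions:
            return 0.0          # only one node specifies a condition
        c, c2 = n.direct_conditions[0], n2.direct_conditions[0]  # one each assumed
        if c.kind != c2.kind:
            return 0.0
        if c.kind == "in":
            common = set(c.allowed) & set(c2.allowed)
            return len(common) / max(len(c.allowed), len(c2.allowed))
        return min(c.bound, c2.bound) / max(c.bound, c2.bound)   # "leq"/"geq"

    # The Fig. 3 examples: 3 shared of 4 attributes -> 0.75, and {silver, black}
    # vs. {black} -> 0.5.
    attrs_of_type = {"MobilePhone": {"battery", "phoneType", "color", "style"},
                     "Phone": {"battery", "phoneType", "color"}}
    print(sim_type(RequestNode("MobilePhone"), RequestNode("Phone"), attrs_of_type))
    print(sim_dc(RequestNode("Color", [DirectCondition("in", allowed={"silver": 1.0, "black": 1.0})]),
                 RequestNode("Color", [DirectCondition("in", allowed={"black": 1.0})])))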
Determining and aggregating the similarity of the root nodes' child trees.
The similarity value sim_attr(n, n') ∈ [0, 1] indicates how similar two nodes n and n' are with respect to their child trees. Let A be the set of attributes defined for the type of n, for the type of n' or for both types, and let {sim(r_a, r'_a) | a ∈ A} be the similarity values for the corresponding attribute subtrees r_a and r'_a of n and n'. Again inspired by Jaccard's index, the aggregated similarity of two nodes' child trees is defined as the sum of the similarity values {sim(r_a, r'_a) | a ∈ A} divided by the sum of the maximal similarity values that can be achieved for each attribute, i.e. |A|:

    sim_attr(n, n') = (Σ_{a ∈ A} sim(r_a, r'_a)) / |A|    (5)

Since the attributes in A are not necessarily defined for both the type of n and the type of n', we set sim(r_a, r'_a) = 0.0 if the attribute a is not defined for one of the types. Attributes in A might also not be specified in one or both of the nodes. If an attribute a is not specified in either node, we set sim(r_a, r'_a) = 1.0; else, if a is specified in just one of the nodes, sim(r_a, r'_a) is defined to be sim(r_a, t') resp. sim(t, r'_a), where t is a tree comprising a single node having the most generic type defined for a in the ontology.

We illustrate the procedure for the root nodes of the two request fragments depicted in Fig. 3. The type of r's root node is MobilePhone and that of r''s root node is Phone. Assume that the ontology defines the attributes battery, phoneType and color for the type Phone and an additional attribute style for the type MobilePhone, which is a subtype of Phone. The similarity sim_attr(n, n') of the requests' root nodes n and n' is determined by the similarity of their corresponding child trees for the attributes A = {battery, phoneType, color, style}. The attributes battery and color are specified in both requests, hence the similarity values sim(r_battery, r'_battery) and sim(r_color, r'_color) can be computed by determining the request similarity for the request subtrees rooted at the Battery nodes and the subtrees rooted at the Color nodes. The attribute style is only defined for the type MobilePhone, hence sim(r_style, r'_style) = 0.0. The attribute phoneType is defined for both types, MobilePhone and Phone, but only specified in r. Hence, r'_phoneType has to be replaced by a node t' having the most generic type defined for the attribute phoneType. Let PhoneType be this type. This means that the type of the node that describes the attribute phoneType has to be PhoneType or one of its subtypes. Presume that MobilePhoneType is a subtype of PhoneType. The similarity sim(r_phoneType, r'_phoneType) is then determined by computing sim(r_phoneType, t'), where r_phoneType is the subtree of r rooted at the MobilePhoneType node.
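Completing the sketch, sim_attr and the overall measure sim below follow Formula 5 and the mean-of-three-values definition; most_generic_node(a) is an assumed ontology helper returning a single, unconstrained node of the most generic type defined for attribute a, and attrs_of_type is assumed to cover every type encountered during the recursion.

    def sim_attr(n, n2, attrs_of_type, most_generic_node):
        """Formula 5: mean subtree similarity over the union of defined attributes."""
        defined_n = attrs_of_type[n.ontological_type]
        defined_n2 = attrs_of_type[n2.ontological_type]
        attrs = defined_n | defined_n2
        total = 0.0
        for a in sorted(attrs):
            ra, ra2 = n.attributes.get(a), n2.attributes.get(a)
            if a not in defined_n or a not in defined_n2:
                total += 0.0      # attribute defined for only one of the two types
            elif ra is None and ra2 is None:
                total += 1.0      # defined for both types, specified in neither
            else:
                # If specified in only one request, compare against a single
                # node of the attribute's most generic type (t resp. t').
                t = most_generic_node(a)
                total += sim(ra or t, ra2 or t, attrs_of_type, most_generic_node)
        return total / len(attrs) if attrs else 1.0

    def sim(r, r2, attrs_of_type, most_generic_node):
        """Overall request similarity: mean of the three component measures."""
        return (sim_type(r, r2, attrs_of_type) + sim_dc(r, r2)
                + sim_attr(r, r2, attrs_of_type, most_generic_node)) / 3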
6.4 Dynamically adjusting α and β
As discussed earlier, the parameters α and β that weight the influence of the terms s_attributes(fs, r) and s_number(fs, r) might vary from user to user. In this section, we will demonstrate how those values can be learned from a consumer's past judgment behavior. Initially, i.e. without having any information about a user's previous judgment behavior, we do not know anything about those parameters' values, so α could be any value from the interval [0, 1] and β = 1 − α. Hence, for the purpose of computing the suitability s(fs, r) of possible feedback structures, we set α to the midpoint of this interval, i.e. α = 0.5 = β. Once having determined the most suitable feedback structure fs, we present it to the consumer, who has the opportunity to change it by expanding/collapsing nodes. Finally, the consumer provides judgments for the resulting structure's leaf nodes. Obviously, the resulting feedback structure fs' was more suitable to the user than the structure fs that was recommended. Hence, we conclude that s(fs', r) should be larger than s(fs, r). Using Formula 3, we get that

    (s(fs, r) − s_number(fs', r)) / (s_attributes(fs', r) − s_number(fs', r)) < α

for s_attributes(fs', r) > s_number(fs', r), and > α for s_attributes(fs', r) < s_number(fs', r). Using this information, we can adjust, i.e. shrink, the range of α correspondingly. For example, if we get α < 0.8, we adjust the interval to [0, 0.8). In case the consumer's judgment behavior is inconsistent with the current range, e.g. if we have α ∈ (0.5, 0.7) and obtain the constraint α > 0.8, we simply ignore this information. To ensure that the most recent information has the most influence, we process session data in order of increasing age.
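A minimal sketch of this interval-shrinking scheme. The threshold is the one derived from Formula 3 above, evaluated with the current working estimate of α; inconsistent constraints are ignored as described, and sessions are assumed to be processed from most recent to oldest. All names are illustrative.

    def update_alpha_interval(interval, sa_rec, sn_rec, sa_used, sn_used):
        """Shrink the feasible interval for alpha after one judgment session.
        (sa_rec, sn_rec): s_attributes/s_number of the recommended structure;
        (sa_used, sn_used): the same for the structure the user actually judged.
        Returns the new (low, high) interval, unchanged if uninformative."""
        low, high = interval
        alpha = (low + high) / 2                # current working estimate
        s_rec = alpha * sa_rec + (1 - alpha) * sn_rec
        if abs(sa_used - sn_used) < 1e-9:
            return interval                     # no information in this session
        threshold = (s_rec - sn_used) / (sa_used - sn_used)
        if sa_used > sn_used:                   # constraint: alpha > threshold
            new = (max(low, threshold), high)
        else:                                   # constraint: alpha < threshold
            new = (low, min(high, threshold))
        if new[0] >= new[1]:                    # inconsistent with earlier sessions
            return interval                     # ignore this constraint
        return new

    # Sessions are fed in from most recent to oldest, so that recent behaviour
    # constrains alpha first and older, inconsistent constraints are dropped.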
7. EVALUATION
In the evaluation of our approach, we wanted to find out how fast the recommendation algorithm proposed in Sect. 6 adjusts to different judgment preferences.

Test setting.
For that purpose, we created a set of DSD service requests covering typical requirements of consumers looking for computer items from different categories, such as desktop PCs, PDAs, servers, notebooks or organizers. For our tests, we created 48 service requests, 6 per category. Requests within each category varied in the selection of attributes that were specified and in the range of attribute values that were acceptable for the user. All request types shared common attribute types, e.g. for all kinds of requests an attribute color and an attribute price could be specified.

Using these requests, we performed several tests with a single test user. The basic procedure for each test was as follows. Starting with no information about previous judgment behavior, several judgment sessions were performed. During each session, one of the 48 requests was selected. After that, the system proposed a feedback structure using the algorithm proposed in Sect. 6 with knowledge about the user's judgment behavior in the previous judgment sessions. After being provided with the recommended feedback structure, the user had the opportunity to change this structure. For that purpose, the consumer was allowed to expand/collapse feedback structure nodes. By clicking on a particular node, all its direct children were expanded/collapsed. The quality of the proposed feedback structure was measured as the edit distance between the proposed feedback structure and the actual feedback structure that was used. More formally, we counted the number of expand/collapse operations the user had to perform to get to the structure whose leaves he finally judged. The rationale behind this measure is that the edit distance directly measures the user's effort to get to the desired structure and thus, in our opinion, is a good measure for the quality of the recommended structure. For each of the tests, we looked at whether and how fast the edit distance decreased with the number of judgment sessions.

Test runs and results.
We performed test runs with different judgment preferences and different sets of requests posed during a sequence of sessions. In a first series of tests, the requests within each sequence of sessions were different, but chosen from a single (computer) category, e.g. just notebook requests. This test setting served as a baseline and was chosen to evaluate the performance of our approach in the absence of any context effects. We performed three kinds of tests differing in the judgment preferences of the judging user. In test A1, the consumer always judged a certain number of aspects. However, the types of aspects that were judged differed. In test A2, the user judged a different number of attributes during each session, but required that the set of attributes to judge contained a certain set of attributes. For example, a user might require to always judge the price of a product, but also be willing to rate other service aspects. Finally, we performed a test A3, where the consumer had specific requirements on both the number and the kind of aspects to judge. The tests A1-A3 were performed with request sets from different categories. The plot depicted in Fig. 4 (A2) is representative for all test runs and all types of tests in this series. It shows the results for test A2 performed with requests from the category digital watches. As can be seen, the adaptation of the recommendation algorithm to the consumer's judgment preferences is very fast. The initial edit distance decreases to 0 after just one session. This is due to the fact that request similarity does not play a role in those tests and hence the values of α and β can be arbitrarily chosen. The depicted behavior was observed for all three kinds of tests (A1-A3).

[Figure 4: Results of the tests A2 (top) and B1 (bottom): edit distance (0 to 4) plotted over the session number (1 to 10).]

In a second series of tests, we evaluated how fast the proposed recommendation algorithm adjusts to a consumer's judgment preferences if those depend on the request context. For that purpose, we performed judgment sessions where the user posed requests from different categories and exhibited a different judgment behavior for each category. We ran three types of tests. In test B1, similarly to test A1, the user always judged a particular number of aspects. However, this number differed for each request category. For example, a user might always judge 3 aspects when asking for desktop PCs, but be willing to judge 5 service aspects when asking for notebooks. Analogously to test A2, the user in test B2 required the set of aspects to be judged to contain a particular set of aspects. However, this set varied for different request categories. Finally, we performed a test B3, where the consumer had specific requirements on both the number and the kind of aspects to judge. Those requirements were different for each request category. Fig. 4 (B1) exemplarily shows the results for tests of type B1.
In the depicted test, we alternated sessions based on a request for a desktop PC (continuous line), where the user judged 11 service aspects, with sessions based on a request for a PDA (dotted line), where the consumer judged only one aspect. As can be seen, the adjustment to the consumer's judgment preferences for PDAs takes 3 sessions. This is due to the fact that at the beginning both terms s_attributes and s_number are equally weighted. Since for desktop PCs many aspects are judged and since most of those aspects are also shared by PDA requests, the term s_attributes dominates the suitability value and thus favors improper feedback structures. This changes as α and β adjust over time. Fig. 5 (B2) exemplarily shows the results for tests of type B2. Again, we alternated desktop PC requests with those for a PDA. While for desktop PCs there was a set of two aspects that had to be judged in any case, it was only one specific aspect for PDAs. Again, it required 4 sessions to adjust α and β appropriately. Finally, Fig. 5 (B3) exemplarily shows the results for tests of type B3. In this test, we alternated three types of requests (desktop PC, PDA and digital watch requests). As can be seen, the algorithm proposes appropriate feedback structures after just 1 session of each type. This is due to the fact that, for the three request types, the consumer's judgment behavior differed strongly in terms of the number and types of aspects to be judged. Hence, even though α and β are not yet adjusted, the correct feedback structure can be identified.

[Figure 5: Results of the tests B2 (top) and B3 (bottom): edit distance (0 to 4) plotted over the session number (1 to 10).]

8. CONCLUSION
In this paper, we demonstrated how detailed consumer feedback that is meaningful and appropriate in the context of a service interaction can be elicited and how users can be supported in that process. Our main contribution is an algorithm that suggests service aspects that might be judged by a consumer. Our evaluation results show that the proposed procedure effectively adjusts to a user's ability and willingness to provide judgments.

9. REFERENCES
[1] G. Adomavicius, R. Sankaranarayanan, S. Sen, and A. Tuzhilin. Incorporating contextual information in recommender systems using a multidimensional approach. ACM Transactions on Information Systems, 23(1):103-145, 2005.
[2] R. M. Dawes. The robust beauty of improper linear models in decision making. American Psychologist, 34(7):571-582, 1979.
[3] D. Fensel, H. Lausen, A. Polleres, J. de Bruijn, M. Stollberg, D. Roman, and J. Domingue. Enabling Semantic Web Services: The Web Service Modeling Ontology. Springer, 2007.
[4] G. Adomavicius and Y. Kwon. New recommendation techniques for multi-criteria rating systems. IEEE Intelligent Systems, 22(3), 2007.
[5] P. Jaccard. Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles, 37:547-579, 1901.
[6] F. Klan and B. König-Ries. Enabling trust-aware semantic web service selection - a flexible and personalized approach. Jenaer Schriften zur Mathematik und Informatik, Math/Inf/02/10, Friedrich-Schiller-University Jena, August 2010.
[7] U. Küster, B. König-Ries, M. Klein, and M. Stern. DIANE - a matchmaking-centered framework for automated service discovery, composition, binding and invocation. In Proceedings of the 16th International World Wide Web Conference (WWW2007), Banff, Alberta, Canada, May 2007.
[8] U. S. Manikrao and T. V. Prabhakar. Dynamic selection of web services with recommendation system. In Intl. Conf. on Next Generation Web Services Practices, pages 117-121, Washington, DC, 2005. IEEE Computer Society.
[9] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen. Collaborative filtering recommender systems. In P. Brusilovsky, A. Kobsa, and W. Nejdl, editors, The Adaptive Web: Methods and Strategies of Web Personalization, volume 4321 of Lecture Notes in Computer Science, pages 291-324. Springer, Berlin, Heidelberg, 2007.
[10] P. Slovic. Limitations of the Mind of Man: Implications for decision making in the nuclear age. Los Alamos Scientific Laboratory, 1972.
[11] H. C. Wang, C. S. Lee, and T. H. Ho. Combining subjective and objective QoS factors for personalized web service selection. Expert Systems with Applications, 32(2):571-584, 2007.