WISM'06 1077 An Efficient Implementation of a Rule-based Adaptive Web Information System Davide Valeriano, Roberto De Virgilio, Riccardo Torlone, and Daniele Di Federico Dipartimento di Informatica e Automazione Università degli studi Roma Tre {valeriano,devirgilio,torlone,difederico}@dia.uniroma3.it Abstract. Mobile devices provide a variety of ways to access informa- tion resources available on the Web and a high level of adaptability to different aspects (e.g., device capabilities, network QoS, user preferences, location, an so on) is strongly required in this scenario. In this paper we propose a technique for the efficient adaptation of Web Information Sys- tems. The approach relies on special rules that allows us to specify, in a declarative way, how to generate, in an automatic way, the adaptation specifications that satisfy the requirements of a given context. Rules are classified in a set of clusters and a matching relation between clusters and context requirements is defined. The technique of rule evaluation and clustering guarantee that different contexts and orthogonal requirements of adaptation, possibly not fixed in advance, can be efficiently organized and taken into account in the adaptation process. 1 Introduction Modern Web Information Systems (WIS) are challenged to generate (automat- ically) a hypermedia presentation built from information that is gathered from different, possibly heterogeneous, sources that are distributed over the Web. Nowadays, given the spread of mobile devices, a novel and fundamental require- ment of these systems is the ability to adapt and personalize content delivery according to the context of the client. An important point that needs to be taken into account is that this scenario is very dynamic: some aspects of the context can dynamically change. For instance new users with new preferences can be enabled to access the information system, possibly unpredicted types of access devices can be used, alternative network connections can be made available, fur- ther aspects of the context (like the location or others environmental factors) need to be incorporated. Also, the technology used to implement the adaptation can be changed and this should have a limited impact on the overall application. It follows that the adaptation process should guarantee a high level of flexibility in terms of responsiveness to rapidly changing requirements of adaptation and suitable actions to be undertaken to meet these requirements. In the literature, some adaptation approaches rely on the definition of rules to generate suitable adaptation specifications [2, 3, 10]. In [5] we have presented 1078 Web Information Systems Modeling a general methodology for context adaptation, based on special rules that model the adaptation at an abstract level, inspired by the approaches cited above. The rules specify, in a declarative way, how to generate, in an automatic way, the adaptation specifications that satisfy the requirements of a given context. A relevant problem is to rapidly select and activate the rules that best fit the re- quirements of adaptation of the context (in particular when the context changes frequently). In this paper we present an efficient implementation to optimize the activation process of adaptation rules. To this aim, we introduce a clustering technique to classify the adaptation specifications. The rest of the paper is organized as follows. Section 2 introduces the rule- based approach. In Section 3, we illustrate how to define the set of clusters and the process to efficiently activate a rule, using the clusters. In Section 4 we present a practical implementation and experimental results and finally, in Section 5 we draw some conclusions and sketch future works. 2 A Rule-based approach to model adaptation requirements In this section we present special rules to specify, in a declarative way, how to generate, in an automatic way, adaptation specifications. The rules are based on two basic notions: the generic profile and the abstract configuration. 2.1 Generic profiles A (generic) profile is a description of an autonomous aspect of the context in which the Web site is accessed (and that should influence the presentation of its contents). Examples of profiles are the user, the device, the location, and so on. A dimension is property that characterizes a profile. Each dimension can be conveniently described by means of a set of attributes. Attributes can be simple or composite. A simple attribute has a value associated with it, whereas a complex attribute has a set of (simple or complex) attributes associated with it. In this model, a context is a collection of profiles. Figure 1 reports a graphical example of profiles. In our model, different profiles over the same dimensions can P1 P2 P3 Hardware Software Category Hardware Software Category Category cpu display memory os version type family model display type type OS width height width height profile dimension complex attribute simple attribute Fig. 1. An example of profiles WISM'06 1079 be compared making use of a subsumption relationship C. Intuitively, given two profiles P1 and P2 , if P1 C P2 then P2 is more detailed than P1 since it includes the attributes of P2 at the same or at coarser level of detail. More precisely, we first say that an attribute A covers another attribute B if either they are simple and A = B or they are composite and for each sub-attribute Ai of A there is a sub-attribute Bj of B such that Ai covers Bj . The subsumption relationship is then defined as follows. Definition 1 (Subsumption) Given two profiles P1 and P2 , we say that P1 is subsumed by P2 , P1 C P2 , if for each dimension d of P1 there is a dimension d’ of P2 such that for each attribute A of d there is an attribute A’ of d’ covered by A. As an example, given the profiles reported in Figure 1 we have that P2 C P1 and P3 C P2 . 2.2 Abstract configurations We introduce the notion of (abstract) configuration as a triple C = (q, h, s) where: – q is a query over the underlying database expressed in relational calculus, a declarative and abstract language for relational databases ([1]); – h is an abstract hypertext definition expressed in WebML ([4]), a conceptual model for Web application which allows us to describe the organization of Web pages in a tight and concise way, by abstracting their logical features; – s is presentation specification expressed in terms of an original notion of logical style sheet, which is a set of tuples over a predefined collection of Web Object Types (WOTs) like text, image, video, form and so on; each WOT τ is associated with a set of presentation attributes: they identify possible styles (e.g. font, color, spacing, position) that can be specified for τ . An example of configuration is reported below. q = {T : x1 , S : x2 , D : x3 , C : x4 , N : x5 | NewsItems(N : x0 , T : x1 , S : x2 , D : x3 , G : x6 , J : x7 ), Details(NK : x0 , P : x8 , C : x4 ), Newspapers(J : x7 , N : x5 , C : x9 ), x6 = Sport} h= IndexUnit NewsIndex ( source News Items; attributes Title, Date; orderby Date ) DataUnit ContentData ( selector Item = CurrentItem; attributes Title, Date, Content) link NewsToDetails ( from NewsIndex to ContentData parameters CurrentItem = NK) link ContentToRestOfContent ( from ContentData to ContentData parameters CurrentItem = NK) s= Text( Font: Arial, ..., Border: 0pt) Link( Note: False, ..., Color: Blue) Image( Format: jpeg, ..., Alignment: left) 1080 Web Information Systems Modeling It is important to note that a configuration is indeed a logical notion that can be represented and implemented in several ways and with different syntaxes. This property guarantees the generality of the approach with respect to actual languages and tools used to implement the adaptive application. For instance, we can implement a configuration using SQL at the content level, XHTML at the navigation level and a set of CSS files at the presentation level. 2.3 Adaptation Rule The matching relationship between configurations and profiles is represented by means of a novel notion of adaptation rule. An adaptation rule has the form Pr : Cd ⇒ Cf where: • Pr is a parametric profile, that is, a profile in which parameters can appear in place of values, • Cd is a condition, made out of a conjunction of atoms of the form AθB or Aθc, where A and B are parameters occurring in Pr , c is a constant value, and θ is a comparison operator, • Cf is a parametric configuration in which parameters occurring in Pr can appear in place of values. The intuitive semantics of an adaptation rule is the following: if the client has a profile Pr and the condition Cd is verified then generate the configuration Cf . An example of adaptation rule for a device profile with the hardware dimen- sion H is the following: H (ImgCapable : X , ScreenSize : Y , ColorCapable : Z ) : X = false, Y < 1500 ⇒ {q, h, s} where X, Y , and Z are the parameters of H and {q, h, s} is the following para- metric configuration: q = { T : x1 , S : x2 , D : x3 , C : x4 | NewsItems(N : x0 , T : x1 , S : x2 , D : x3 , G : x5 , J : x6 ), Details(NK : x0 , P : x7 , C : x4 )} h = Page NewsPage ( units NewsIndex ) Page ContentPage ( units ContentData ) IndexUnit NewsIndex ( source News Items; attributes Title, Date; orderby Date ) DataUnit ContentData ( selector Item = CurrentItem; attributes Title, Date, Content) link NewsToDetails ( from NewsIndex to ContentData parameters CurrentItem = NK) s = Text( Color: Black, Size: 8pt) Link( Style: Underline, Color: Black) Image( Size: Y × 0, 5, Color: Z) WISM'06 1081 Note the use of the parameters Y and Z of H to resize the images and to eliminate/maintain colors. A profile P activates a rule Pr : Cd → Cf if Pr C P . Hence, a profile P can activate a rule for a profile that is more general than P . Now, let R be a rule Pr : Cd → Cf activated by a profile P and let C be the configuration obtained from Cf by substituting the parameters of Pr with the actual values occurring in P : we say that C is generated by P using R. 3 An optimization strategy We assume to have at disposal an initial set of rules SR . In an adaptability scenario, context changes generate events that produce one or more profiles (instances). Each profile (instance) can activate a rule of SR . We should optimize the research to select and activate efficiently the most appropriate rules and generate the most suitable configuration. So we organize SR classifying the rules in a set of clusters. Each cluster represents a class of adaptation that matches the requirements of a class of context. 3.1 A distance between profiles We define a distance that allows to compare profiles. We take advantage of the tree structure of the generic profile. We get inspiration from the Tree Edit Dis- tance [9], where the distance between two trees T1 and T2 , that we indicate as dted (T1 , T2 ), depend on the sequence of operations (chosen in a limited prede- fined set) that transforms T1 into T2 . We assign a fixed cost to each operation: the distance between the considered trees is the sum of essential costs for the transformation. For our case, let’s consider the set of operations edit, remove and add, respectively to modify the content of a node, to delete a node and to create a new node. In our tree structures, we distinguish two principal types of node: dimension and attribute. We have operations to modify, remove or create dimensions and attributes and we indicate the cost of an operation with γ. Definition 2 (S-derivation) Given two trees T1 and T2 , S = s1 , ... , sk the sequence of operations to transform T1 in T2 , an S-derivation between T1 and T2 is a sequence U0 , ... , Uk of trees resulting by applying the single Pi=k operation, where U0 = T1 and Uk = T2 , and the cost γ(S − derivation) is i=0 γ(si ). So we can intuitively define the distance between two profiles P1 and P2 , the minimum cost for a S-derivation between them. Definition 3 (Distance between profiles) Given two profiles P1 and P2 , we define the distance between them as dted (P1 , P2 ) = min{γ(S − derivation) | S- derivation from P1 to P2 } 1082 Web Information Systems Modeling 3.2 Definition of Clusters Let’s represent a cluster as a tree where the nodes are profiles. The root repre- sents the minimum set of information that nodes of the tree share. Basically a client is identified by the profile sent to the system. Let’s remember that a profile is principally described by dimensions, that characterize the main information, articulated in attributes. So, the root of a cluster is a profile characterized only by dimensions. It will contain the minimum information that several profiles has to include to be in the same cluster. Given a profile P, pruned by all attributes, we can build several beginner clusters, rooted with profiles generated from P, on the number and type of dimensions. However the designer can decide the detail level for the roots. For instance a designer can decide to articulate some dimensions of a root with attributes, to extend the level of detail of the minimum information to share. We define a cluster as follows Definition 4 (Cluster Tree) A cluster CL is a couple {NCL , ECL } where NCL is the set of nodes (profiles) and ECL is the set of edges (ni , nj ) such that ni / nj We can relate a node Pr of a cluster with the rule Pr : Cd → Cf in an Hash table using Pr as access key. Given the set of roots Rt = {Rt1 , ... ,Rtn } and the set of rules SR , we initialize a set of clusters CL = {CL1 , ... ,CLn } as follows: • we set CLi = {{Rti },∅}, • ∀ (Pr : Cd → Cf ) ∈ SR , (i) we search a CLi such that Rti C Pr , and we navigate the cluster in depth until level k where doesn’t exist any node subsumed by Pr , (ii) we search at level k −1 the node N d such that N d C Pr and if there are more nodes N d such that N d C Pr we choose the node with lowest distance, (iii) we add Pr to NCLi and (N d, Pr ) to ECLi , (iv) if there are one or more nodes N di such that Pr C N di then we delete any edge of type (N dj ,N di ) in ECLi and add to it the edge (Pr , N di ), • for each node of a cluster we add the relative rule to the Hash table. 3.3 The activation process Built the set of clusters, now we can perform an optimization strategy to activate a rule in an efficient way. Given the set of rules SR , the set of clusters CL = {CL1 , CL2 , ..., CLn }, the Hash table Ht and a profile instance PI of a profile schema PS , we produce a set of configurations as follows: 1. we initialize P 0 with PS ; 2. we search the cluster CLi in CL, with a root Rti such that Rti C P 0 and that has the lowest distance from it; 3. (i) we navigate in depth the cluster until level k where any node, that is subsumed by P 0 , doesn’t exist; (ii) we search at level k − 1 the node N d such that N d C P 0 and if more nodes are subsumed by P 0 we choose the one with WISM'06 1083 lowest distance; (iii) we extract the rule Rl , related in the Ht to N d, and if the condition is satisfied, we activate Rl and instance the configuration with the value of PI ; (iv) we update P 0 as P 0 - N d; 4. we iterate from step (2.) until P 0 is not empty and a rule that can be activated by P 0 exists; 5. if no rule can be activated in this way, the system will return a default configuration. 3.4 An example of interaction between profiles and clusters Let’s assume a client sending a profile PI (see Figure. 2(a)), instance of a profile schema P to the system. Assume also that the system has already initialized its own set of clusters, composed by clusters C1 and C2 (see Figure 3). We easily see that the root of the cluster C2 is the one which subsumes P and has the lowest distance from it. So we navigate that cluster and note that the only child N of the root it is equal to P ; then we extract the rule Rl related in the Ht to N and if the condition is satisfied, we activate it and instance the configuration with the values of PI . Subsequently, assume that a new profile PI0 (see Figure 2(b)), instance of a profile P 0 , is sent by another client to the system. At the same way explained before, we see that the root of the cluster that subsumes P 0 and has the lowest distance from it, is cluster C1 root. Navigating that cluster, we need to confront P 0 with root’s two children. We easily see that no one of them is equal to or has a subsumption relation with P 0 : being their parent a root, is not possible to activate its rule; so, as we explained before, the system will return a default configuration. Device Device Hardware Hardware Software display display OS width height width height Win 800 640 800 640 (a) Profile PI (b) Profile PI0 Fig. 2. Profiles sent to the system. 1084 Web Information Systems Modeling Device Device Hardware Software Hardware Device Device Device - Hardware Software Hardware Software Hardware - Pr1:Cd1 -> Cf1 - display OS Version CPU OS Version display - - width height width height Device Device Hardware Software Hardware display CPU OS display Version CPU width height width height (a) Cluster C1 (b) Cluster C2 Fig. 3. Cluster Set 4 A practical implementation 4.1 Benefits of Clustering We can easily see that using this clustering algorithm we can reduce the number of accesses to SR . In our system, we have seen that a profile given by a client can activate a rule in the default set. So when a client sends a profile to the system it has to search for the most suitable rule in SR . We can assume that the worst case is when the system has the greatest possible number of default rules, that is when the system has in SR one rule for each possible profile that a client can send. Without using clusters we should execute a number of ordered accesses equal to the total number of possible profiles. It is easy to see that the number of possible profiles is the number of profiles obtained with every possible different combina- tion of dimensions and attributes. Assuming to have a complete schema with n dimensions and m attributes for each dimension, Pn ¡ ¢ we can see that the number of different combinations of dimensions is i=1 ni while the number of different Pm ¡ ¢ combinations of attributes for each single dimension is j=1 m j . Combining these two results and looking carefully at our structure, we see that the number Pn hPm ¡m¢i(i−1 n ) of different profiles is equal to i=1 j=1 j (n − i + 1). If we use the clustering algorithm, in this case we will obtain a number of clusters equal to the number of different combinations of dimensions, seen above. To access the right clusters, the system has to access to all of their roots and then find Pthe right way n £¡ ¢¤ on the selected tree. So it can ignore a number of trees equal to i=1 ni + 1. We can see that in this situation the worst case is when the root of the selected WISM'06 1085 cluster has the maximum number n of dimensions, because it would be the tree with the maximum number of nodes. The remaining clusters will contain a num- Pn−1 hPm ¡m¢i(i−1 n ) ber of profiles equal to i=1 j=1 j (n − i + 1). These profiles will be ignored by the system, which has only accessed the roots of their clusters, that are not profiles. So, counting also these accesses, we find that at least we obtain to Pn−1 hPm ¡m¢i(i−1 n ) Pn ¡ ¢ save a number of accesses equal to i=1 j=1 j (n − i + 1)− i=1 ni . This number doesn’t take count of the fact that the system won’t access all the nodes of the selected cluster, but only some of them, depending on its structure. Anyway it gives evidence that, using this clustering algorithm, it is possible to obtain a certain benefit in terms of number of accesses. 4.2 Experimental results We have designed and developed a practical implementation to test the our ap- proach. The system relies on an RDF database of contents and the selection of data is done by performing RDF queries expressed in SPARQL [8]. We use an RDF(S) representation for schema level and RDF for instance level. The im- plementation makes use of Jena [6], a Java framework for building RDF based applications. Each rule is implemented in terms of Inference Rule of Jena [7]. Using the Inference Engine of Jena, the system generates a configuration that infers the adaptation on RDF(S) documents. Finally the system returns several response as output, using XSLT transformations on RDF instances. We have tested our system to experiment the effectiveness and the efficiency of the ap- proach illustrated in this paper in several practical cases. On the server side, we used an IBM computer xSeries 225, equipped with a Xeon2 2.8Ghz processor, a 4 GB RAM, and a 120 GB HDD SCSI. The Web site have been accessed by three different types of devices: a mid-range desktop, several PDAs with differ- ent capabilities, and some cellular phones. Figure 4 reports the average response time (vertical axis) for each device type (horizontal axis). The Figure shows promising times to produce the adapted response. The Desktop client presents an average time of 72 ms, the PDA client 97 ms and the Wap client 121 ms. The diagram compares the medium response time in absence and in presence of clusters. In particular the system, using clusters, capitalizes the approach for mobile devices, almost halving the response time. For instance the Wap client benefits from 121 ms to 59 ms. 5 Conclusions and future works In this paper we have presented a novel rule-based approach supporting the automatic adaptation of content delivery in Web Information Systems. This problem arises in common scenarios where the information system is accessed, in different contexts, by a variety of mobile devices. The adaptation is achieved automatically by means of special rules that specify, in a declarative way, how a configuration can be generated to meet the requirements of adaptation of a 1086 Web Information Systems Modeling List of Rules Clusters of Rules 140 medium response time (ms) 120 100 80 60 40 20 0 desktop pda wap devices Fig. 4. Results of experiments given profile. Finally the organization of rules in a set of clusters guarantees the responsiveness of the system in a dynamic scenario. Different contexts and or- thogonal requirements of adaptation, possibly not fixed in advance, can be taken into account in this process. The results presented in this paper are subject of further conceptual and practical investigation. From a conceptual point of view, we are currently investigating the expressive power of rules and the generality of clusters. In particular we are improving the distance between profiles, using a comparison between common pathes. From a practical point of view we are improving the efficiency of the system and we are extending its functionality. References 1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. 2. C. Bettini, D. Maggiorini, and D. Riboni. Distributed context monitoring for con- tinuous mobile services. In IFIP TC8 Working Conference on Mobile Information Systems (MOBIS), 2005. 3. S. Ceri, F. Daniel, V. Demaldé, and F. M. Facca. An approach to user-behavior- aware web applications. In 5th International Conference on Web Engineering (ICWE), pages 417–428, 2005. 4. S. Ceri, P. Fraternali, A. Bongio, M. Brambilla, S. Comai, and M. Matera. Design- ing Data-Intensive Web Applications. Morgan Kaufmann Publishers Inc., 2002. 5. R. DeVirgilio, R. Torlone, and G.-J. Houben. A rule-based approach to content delivery adaptation in web information systems. In 7th International Conference on Mobile Data Management (MDM’06) - to appear -, 2006. 6. HP-Labs. Jena. http://jena.sourceforge.net/, 2004. 7. HP-Labs. Jena 2 inference support. http://jena.sourceforge.net/inference/, 2005. 8. Hp-Labs. Sparql, query language for rdf. www.w3.org/TR/rdf-sparql-query/, 2005. 9. C. Isert. The editing distance between trees. Technical report, Ferienakademie, for course 2: Bume: Algorithmik Und Kombinatorik, 1999. 10. H. Kiyomitsu, A. Takeuchi, and K. Tanaka. Activeweb: Xml-based active rules for web view derivations and access control. In International workshop on Information technology for virtual enterprises (ITVE01), pages 31–39, 2001.