An Agent-Oriented Personalized Web Searching System

Tarek Helmy, Satoshi Amamiya, Tsunenori Mine, Makoto Amamiya
Department of Intelligent Systems, Kyushu University
6-1 Kasugakoen, Kasuga-shi, Fukuoka 816-8580, Japan
Tel: 81-92-583-7615, Fax: 81-92-583-1338
[helmy,roger,mine,amamiya]@al.is.kyushu-u.ac.jp

Abstract. Web retrieval is now one of the most important issues in computer science, and we believe that applying multi-agent systems to this area is a promising approach. We introduce the Kodama¹ system, which is being developed and used at Kyushu University, as a multi-agent-based approach to building a distributed Information Retrieval (IR) system that lets users retrieve relevant distributed information from the Web. We report methods to agentify the Web and to cluster the agentified domain into communities. To investigate the performance of the system, we developed a smart query routing mechanism for routing the user's query and carried out several experiments in multiple Server Agent domains. The results suggest that Web page agentification, clustering, and routing are promising techniques for retrieving more relevant information.

1. Introduction

With the exponentially growing amount of information available on the Internet, the task of retrieving relevant information consistent with the user's information need has become increasingly difficult, and users are normally faced with very large hit lists of low precision. The information gathering and retrieving processes in traditional search engines are independent of the User's Preference (UP), so feedback from the latter process can hardly be used to improve the quality of the former. The Kodama project was started in response to the need for a new kind of agent-oriented IR system that differs fundamentally from the traditional search engines found on the Internet.
Researchers in the Artificial Intelligence (AI) and Information Retrieval (IR) fields have already succeeded in developing multi-agent-based techniques to automate the management of information flooding [1]. One way to partially address the scalability problems posed by the size and dynamic nature of the Web is to divide the Web into localized Server Agents (SAs), each of which agentifies a specific domain with a set of Web Page Agents (WPAs) developed in our project [2,3,4]. We start by describing the agentification of Web servers and discuss the methodologies for clustering the agents into communities. We then introduce the routing mechanism of Kodama and the evaluation of Kodama agents.

2. Web Server Agentification

Cooperating intelligent Kodama agents are employed to agentify Web servers, where the infrastructure already exists in the form of Web links. The Kodama system uses three types of agents in the agentification mechanism for searching the Web: an SA assigned to each Web server, a WPA assigned to each Web page, and a User Interface Agent (UIA) assigned to each user's machine [2,3,4]. The SA starts from the portal address of the Web server and creates the hyper structure of WPA communities based on the hyperlink structure of the Web server (see Fig. 1). The SA knows all the WPAs in the server and works as a gateway when the WPAs communicate with each other or with a WPA in another server. The SA clusters the WPAs into communities and automatically defines its attributes to be used in the routing mechanism.

¹ Kyushu University Open Distributed Autonomous Multi-Agent

The WPA analyzes and continually keeps track of the content of its Web page. Each WPA has its own parser, to which the WPA passes a URL, and an IP vector, in which the WPA keeps all the policy keywords found at its URL. The WPA takes essential properties and principles given by the SA to create the IP as an ontology that represents the context of the Web page.
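As a rough illustration of how a WPA's parser could populate its IP vector, the following Python sketch extracts the visible words of a page and stores them with weights. The names `build_ip_vector` and `STOPWORDS`, the tokenization rule, and the use of normalized term frequency as the initial weight are assumptions made for this example; the paper does not specify them.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumed stop-word list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def build_ip_vector(html: str) -> dict[str, float]:
    """Return a keyword -> weight mapping for one Web page.

    Weights are normalized term frequencies (an assumption of this sketch).
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.chunks).lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


# Hypothetical page content, used only to exercise the function.
page = (
    "<html><head><title>Agent search</title><style>p{}</style></head>"
    "<body><p>Agent systems search the Web</p></body></html>"
)
ip_vector = build_ip_vector(page)
```

In this sketch the vector holds only initial weights; adapting them to user feedback is a separate step that is not shown.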
The created WPAs register themselves with the SA and write all their words into an IP file. An SA of n Web pages creates one IP file in which the terms of the Web pages are represented as n vectors of keywords, with a weight value assigned to each keyword and modified according to the user's responses (ℜ). The WPA uses the IP to decide whether or not the user's query ($q_i$) belongs to the WPA. In the retrieval phase, when the WPAs receive a $q_i$ from an SA, they initiate a search by interpreting the $q_i$ and either asking their down-chain WPAs "Is this yours?" or announcing to them "This is yours." The selected WPAs and/or their down-chain WPAs on each Web server in turn interpret the $q_i$ according to their IPs and reply "This is mine" or "Not mine" with some confidence.

The UIA is designed to learn the UP either implicitly or explicitly from the user's browsing history. We have developed a Kodama browser and investigated several sensors, in correlation with the time of visiting the page, that let the UIA autonomously detect the user's actual implicit response. The UIA resides in the user's machine, communicates with the WPAs via an SA to retrieve information relevant to the $q_i$, and shows the results returned by the WPAs to the user after filtering and re-ranking them. The UIAs in the Kodama system look over the shoulders of the users, receive ℜ indicating interest or lack of interest in the results, and treat these responses as rewards for adapting the UP files. The UIA uses the UP to predict a user's action based on the similarity of the current query to the already learned UP.

Fig. 1. The hierarchical structure of an agentified domain

3. Creating the Agents Communities

While agentifying the Web server and creating a WPA for each Web page based on the pre-existing hyperlink structure, the SA adds to its known-SAs table the portal addresses of all external links that point to other portals. This means that the SA community is created automatically during the agentification process.
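The construction of the known-SAs table can be sketched as follows. This is a minimal Python illustration, not Kodama's implementation: the function names and the `.example` URLs are hypothetical, and a portal is assumed to be identified by the scheme and host of a link.

```python
from urllib.parse import urljoin, urlsplit


def portal_of(url: str) -> str:
    """Reduce a URL to its portal address (scheme and host only)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def external_portals(page_url: str, hrefs: list[str]) -> set[str]:
    """Return the portal addresses of links that leave the page's own server."""
    own_host = urlsplit(page_url).netloc.lower()
    found = set()
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlsplit(absolute)
        is_web_link = parts.scheme in ("http", "https") and parts.netloc
        if is_web_link and parts.netloc.lower() != own_host:
            found.add(portal_of(absolute))
    return found


# Hypothetical links found on one page of the server being agentified.
links = [
    "/about.html",
    "staff.html",
    "http://www.b.example/labs/x.html",
    "mailto:someone@a.example",
    "http://WWW.B.example/",
]
known_sas = external_portals("http://www.a.example/dept/index.html", links)
print(known_sas)  # {'http://www.b.example/'}
```

Each WPA would report such a set to its SA, which merges the sets into its table of known SAs.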
For instance, while agentifying the Web server of Kyushu University, http://www.kyushu-u.ac.jp/, if one of its Web pages contains an external link such as http://www.osaka-u.ac.jp/, then this portal address is added as a community member of the SA of Kyushu University. That is, the WPAs send "friend of mine" messages to the SA to register the portal addresses of the external links in their Web pages as relevant ones.

The SA clusters the WPAs into communities based on the incoming keywords of the $q_i$ over time. The server's administrator may define keywords as seeds for clustering the Web pages of the server. We introduce a definition of a WPA community that enables the SA to focus effectively on related subsets of WPAs and so increase the precision of search results. The name of a cluster is initially constructed from the $q_i$ and is treated as the main attribute of the cluster. The cluster's name is updated according to newly input queries related to the cluster and a set of keywords that are surrounded by specific HTML tags in the cluster's Web pages relevant to the queries. Over time, the communities of WPAs are thus refined, so that an agent may be assigned to or released from a specific community.

The definition used to create a cluster of WPAs is as follows. Let $Q$ be a set of cluster names, $Q = \{CN_{q_i} \mid 1 \le i \le n,\ CN_{q_i} = \{w_j \mid 1 \le j \le m\}\}$, where $w_j$ is a keyword. We call the number of elements in a set its size; thus $n$ is the size of $Q$ and $m$ is the size of $CN_{q_i}$. Let $q$ be a user's query such that $q = \{w_j \mid 1 \le j \le l\}$, where $l$ is the size of $q$, and let $\phi$ be the empty set.

The clustering procedure is as follows. Initially, $Q \leftarrow \phi$, and a query $q_i$ is entered. If $q_i \cap CN_{q_j} = \phi$ for every $CN_{q_j} \in Q$, then a new cluster $C_{q_i}$ is created that consists of the set of Web pages relevant to $q_i$. Then $q_i$ is assigned to $CN_{q_i}$, which is the name of $C_{q_i}$; i.e., $CN_{q_i} \leftarrow q_i$ and $Q \leftarrow Q \cup CN_{q_i}$. For each $CN_{q_j} \in Q$, if $q_i \cap CN_{q_j} \ne \phi$ and $q_i \not\subset CN_{q_j}$, then $Q \leftarrow Q \cup CN_{q_i}$, $CN_{q_i} \leftarrow CN_{q_i} \cup q_i$, and $CN_{q_i} \leftarrow CN_{q_i} \cup k_j$ for every $k_j \in Tag$, where $Tag$ is the set of keywords surrounded by specific tags in those Web pages that are in $C_{q_j}$ and relevant to $q_i$.

4. Routing Mechanism of Kodama

Although a single router is scalable enough to potentially handle thousands of SAs, in practice it is desirable to run a separate router for the relevant SAs of a common topic. For instance, the SAs of the AAAI, IEEE, and ACM portals belong to one router agent. The router delegates a given query to the most popular and relevant SA. For each community of SAs, there is a router agent that holds a set of attributes as the ontology of each SA (see Table 1), where $A_{11}$ to $A_{1m}$ denote the set of attributes automatically determined by clustering the WPAs and adapted by the system to reflect the ontology of $SA_1$, and $W$ is the weight value, which is assigned to each attribute and continually adapted based on the ℜ from the UIAs.

Routing refers to the process of selecting the relevant SA and forwarding queries to it in order to retrieve the Web pages consistent with the user's information need. Relevancy is used to determine the popularity of the SA for a particular type of query. The Kodama system maintains the similarity $S_j$ between a query $q$ and the attribute fields of $SA_j$ in the known-SAs table using the following formula:

$$S_j = \sum_i w_{j,i} \cdot g(k_i), \qquad g(k_i) = \begin{cases} 1 & \text{if } k_i \in q \cap SA_j \\ 0 & \text{otherwise} \end{cases}$$

Here $i$ ranges over the keywords $k_i$, and $w_{j,i}$ is the weight of $k_i$ in $SA_j$ (the $W$ entries of Table 1).

Table 1. Server Agent's Attributes

Relevant SA   Attribute Information of a SA
SA1           A11,W11   A12,W12   …   A1m,W1m
SA2           A21,W21   A22,W22   …   A2m,W2m
SA3           A31,W31   A32,W32   …   A3m,W3m
SAx           Ax1,Wx1   Ax2,Wx2   …   Axm,Wxm
SAn           An1,Wn1   An2,Wn2   …   Anm,Wnm

5. Distributing Web Search on Kodama

There should be a single entity that controls the list of router agents. When a router agent is registered, the registration goes through one of several dozen routers that work with Kodama to add names to the list.
Kodama, in turn, keeps a central database, known as the router's database, that contains information about the profile of each router. Each router has thousands of SAs and handles their requests. While registering with a router agent, an SA sends the names of its clusters, which are used for routing relevant queries to that SA. The routers are specialized agents that send a user's query, and those of every other UIA, to the relevant SAs along thousands of pathways of SAs.

When the router receives a query from a UIA, it proceeds as follows. It asks for a list of relevant SAs. If it finds relevant SAs, it assigns the request to a specific SA, because it already knows that this SA is relevant to the query; it then merges the results of the query and sends them back to the UIA. If the results do not satisfy the user, or if the router cannot find a relevant SA, it forwards the query to other routers. It may also return an error message when it cannot find any relevant SA.

6. Experimental Results

We performed several experiments to evaluate the performance of the Kodama system consistently. In the experiments, we agentified fifty Web servers by giving the portal addresses of the Web servers to the system; the number of Web pages within the agentified servers varied from 300 to 2500. The system creates the hyper structure of the WPA communities based on the hyperlink structure of each Web server and creates the SA's attributes to be used on the router side for routing. The UIAs then sent queries to the system, and we calculated the precision of the URLs retrieved for the users' queries. The results (see Fig. 2) suggest that Web page agentification and the routing mechanism are promising for delivering more relevant information to users, and they support the use of Kodama as a pinpoint IR system.
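The routing decision and the precision measure used in this evaluation can be sketched in Python as follows. The sketch assumes that a query is a set of keywords, that each SA's attributes are stored as a keyword-to-weight mapping as in Table 1, that an SA with zero similarity is treated as not relevant, and that precision is the standard IR ratio of relevant retrieved URLs to all retrieved URLs, expressed as a percentage. All names, weights, and URLs are hypothetical.

```python
from typing import Optional


def similarity(query: set[str], attributes: dict[str, float]) -> float:
    """S_j: sum of the weights of the SA's attribute keywords found in the query."""
    return sum(weight for keyword, weight in attributes.items() if keyword in query)


def route(query: set[str], table: dict[str, dict[str, float]]) -> Optional[str]:
    """Pick the SA with the highest similarity, or None if no SA matches."""
    scores = {sa: similarity(query, attrs) for sa, attrs in table.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] == 0:
        return None  # the router would forward the query to other routers
    return best


def precision(retrieved: list[str], relevant: set[str]) -> float:
    """Percentage of retrieved URLs that are relevant (0.0 for an empty list)."""
    if not retrieved:
        return 0.0
    hits = sum(1 for url in retrieved if url in relevant)
    return 100.0 * hits / len(retrieved)


# Hypothetical routing table in the shape of Table 1.
routing_table = {
    "SA1": {"agent": 0.8, "retrieval": 0.5},
    "SA2": {"database": 0.9, "agent": 0.2},
}

print(route({"agent", "retrieval"}, routing_table))            # SA1
print(route({"music"}, routing_table))                         # None
print(precision(["u1", "u2", "u3", "u4"], {"u1", "u3", "u9"}))  # 50.0
```

Treating a zero score as "no relevant SA" is one possible reading of the router's fallback behavior; a deployed router might instead apply a tuned threshold.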
[Figure 2: plot of Precision (vertical axis, 0 to 100) against Queries (horizontal axis, 1 to 30)]

Fig. 2. Precision of the queries submitted to the system

7. Conclusion and Future Work

We introduced methods to agentify the Web and to cluster the SAs and WPAs into communities. We also introduced the routing mechanism of the Kodama system, which selects the SA most relevant to a given query. We carried out experiments to investigate the performance of the Kodama system. These experiments suggest that Kodama's techniques promise to deliver more relevant information to users. Currently, the routing of Kodama is a simple query routing that binds to two hierarchical levels of router agents. We plan to scale it by increasing the number of SAs and by developing a more sophisticated routing mechanism that maintains a multi-level hierarchy of router agents.

8. References

[1] E. S. Yu, P. C. Koo, and E. D. Liddy, "Evolving Intelligent Text-based Agents", Proc. of the 4th International Conference of Autonomous Agents, pp. 388-395, June 3-7, 2000, Spain.
[2] T. Helmy, S. Amamiya, and M. Amamiya, "Collaborative Kodama Agents with Automated Learning and Adapting for Personalized Web Searching", Proc. of the 13th Inter. Conference on Innovative Applications of AI (IAAI/IJCAI-2001), pp. 65-72, August 7-9, 2001, USA.
[3] T. Helmy, S. Amamiya, and M. Amamiya, "Pinpoint Web Searching and User Modeling on the Collaborative Kodama Agents", LNCS Proc. of the 2nd Inter. Conference on Electronic Commerce and Web Technologies (EC-WEB 2001), pp. 305-314, Sept. 2001, Germany.
[4] T. Helmy, S. Amamiya, and M. Amamiya, "User's Ontology-Based Autonomous Interface Agents", Proc. of the Second Inter. Conference on Intelligent Agent Technology (IAT2001), book entitled "Intelligent Agent Technology: Research and Development", pp. 264-273, October 23-26, Japan.