An Instance-based Approach for Identifying Candidate Ontology Relations within a Multi-Agent System

Andrew Brent Williams1 and Costas Tsatsoulis2

1 Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA 52242, email: abwill@eng.uiowa.edu
2 Information and Telecommunication Center, Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, email: tsatsoul@ittc.ukans.edu

Abstract. Discovering related concepts in a multi-agent system among agents with diverse ontologies is difficult using existing knowledge representation languages and approaches. We describe an approach for identifying candidate relations between expressive, diverse ontologies using concept cluster integration. We evaluate the feasibility of this approach using lightweight ontologies. These lightweight ontologies are constructed from the Magellan search Web site and consist of Web page categories, or concepts, and their corresponding instances.

1 INTRODUCTION

In order to facilitate knowledge sharing between a group of interacting information agents (i.e. a multi-agent system), a common ontology should be shared. However, we recognize that agents often do not commit a priori to a common, pre-defined global ontology. Our ongoing research investigates approaches for agents with diverse ontologies to share knowledge by automated learning methods and agent communication strategies [15]. When agents have diverse ontologies there are many challenges for knowledge sharing and communication. One of these challenges is for agents to automatically learn representations for diverse ontologies from categorized Web pages and to identify the relationships between two agents' ontologies. In this paper, we demonstrate the feasibility of identifying a candidate 1:N relation between two different agents' ontologies. These ontologies represent natural human categorizations of Web page bookmarks into concepts, or sets of corresponding instances.

Definitions of an ontology range from simply what is known to exist in an agent's world, or the categories in a search engine index, to more rigorous definitions which lend themselves to the construction of formal ontologies using a description logic. For example, in the classic AI blocks world domain, the ontology consists only of a table surface and three blocks labeled A, B, and C. The Yahoo! search engine has an ontology which consists of a taxonomy of its Web page categories and is referred to as a lightweight ontology [13]. An example of a more extensive formalized ontology is the Cyc Ontology [7].

We recognize that research using formal knowledge representation languages to create formal ontologies for agent knowledge sharing has made significant strides. These approaches, however, must place some limits on the expressiveness of the vocabulary in order to facilitate the use of inference mechanisms for deducing the entailments of a set of sentences in the description logic. MacGregor [8], who worked on the LOOM knowledge representation language, noted the trade-off between the tractability of this type of language and its expressiveness. Also, the essential ontological promiscuity of artificial intelligence implies that any agent can create and accommodate an ontology based on its usefulness for the task at hand [4]. Therefore, a group of agents with individualized ontologies may wish to share knowledge but find it difficult to understand the relationships between their concepts.

This situation of agents with diverse ontologies may exist in the World Wide Web domain. Web users may construct simple ontologies while searching and "surfing" the Web. These users search for information using Web browsers and search engines. Once a person finds a Web page of interest, they will often bookmark it for later reference. These bookmarks can be grouped into categories, or concepts, of similar Web pages to form a taxonomy, or lightweight ontology.

We believe that information agents may benefit from using these ontologies to search for related concepts in a multi-agent system. These agents may understand different concepts that are not exactly the same but may be related. As an example, there may be an agent that understands the concept "NBA". Another agent may know the concept "College Hoops". Although these concepts are not exactly the same, they are both clearly related to the concept "Basketball". Agents that do not know the relationships of their concepts to each other need to be able to teach each other these relationships. If the agents are able to discover these concept relations, it will aid them as a group in sharing knowledge even though they have diverse ontologies. Information agents acting on behalf of a diverse group of users need a way of discovering relationships between the users' individualized ontologies. The agents can then use these discovered relationships to help their users find information related to their topic, or concept, of interest. This paper describes a possible approach for discovering these relationships while allowing for maximum expressiveness in the agents' vocabularies.

The rest of this paper first discusses related work in Section 2, describes our approach in Section 3, and then describes our implementation in Section 4. Section 5 describes how we evaluated our system and presents the results. Section 6 presents our conclusions and describes future work.

2 RELATED WORK

Manually constructing ontologies by combining different ontologies on the same or similar subject into one is called merging [11]. Differentiated ontologies, whose terms are formally defined as concepts and which have local concepts that are shared, have also been addressed [14]. That work uses description compatibility measures based on comparing ontology structures represented as graphs and on identifying similarities as mappings between elements of the graphs. The relations found between concepts rest on the assumption that local concepts inherit from concepts that are shared. Their system was evaluated by generating description logic ontologies in artificial worlds. In our approach, we do not assume that the ontologies share commonly labeled concepts, but rather a distributed collective memory of objects that can be selectively categorized into each agent's ontology. Our system also differs in that it uses Web page text as the instances that describe examples of an agent's concepts.

Machine learning algorithms have been used to learn how to extract information from Web pages [3]. That approach uses manually constructed ontologies with their classes, relations, and training data. The objective of the work is to construct a knowledge base from the World Wide Web, not to find relationships between concepts in a multi-agent system.

Several information agent systems attempt to deal with some issues in using ontologies to find information. IICA, or Intelligent Information Collector and Analyzer, is an ontology-based Internet navigation system [5]. IICA gathers, classifies, and reorganizes information from the Internet. It uses a common ontology to allow IICA to make inexact matches between users' requests and the candidate locations. The authors define their ontologies as weakly structured; they are built from existing thesauruses and technical books and consist of about 4,500 terms. This system is based on using a common ontology rather than diverse ontologies. For text categorization or classification it uses the information retrieval vector space model. Information agents that can update models of available information sources using inductive concept learning, applied to static, relational databases using a formal description logic, have also been demonstrated [6, 1]. That system embedded the concept semantics in the initial ontology and in its query reformulation operators. Since it uses a description logic, the expressiveness of its vocabulary is limited and would be hampered by the high degree of language expressiveness in the World Wide Web domain. The InfoSleuth Project [2] uses multiple representations of ontologies to help in semantic brokering. Its agents advertise their capabilities in terms of more than one ontology in order to increase the chances of finding a semantic match of concepts in the distributed information system. The InfoSleuth system, however, does not attempt to discover relationships between concepts in the different ontologies.

3 APPROACH

We discuss how our agents represent, learn, share, and interpret concepts using ontologies constructed from Web page bookmark hierarchies. In particular, we show how we use DOGGIE agents to discover candidate relations between different ontologies. The relations are assumed to be general is-a relations.

3.1 Concept Representation and Learning

A semantic concept comprises a group of semantic objects, i.e. Web pages, that describe that concept. The semantic object representations we use define each token, i.e. each word and HTML tag from the Web page, as a boolean feature. The entire collection of Web pages, or semantic objects, that were categorized by a user's bookmark hierarchy is tokenized to find a vocabulary of unique tokens. This vocabulary is used to represent a Web page by a vector of ones and zeroes corresponding to the presence or absence of each token in the Web page. This combination of a unique vocabulary and a vector of corresponding ones and zeroes makes up a concept vector. A concept vector represents a specific Web page, and the actual semantic concept is represented by a group of concept vectors judged to be similar by the user.

Our agents use supervised inductive learning to learn their individual ontologies. The output of this ontology learning is a set of semantic concept descriptions (SCD) in the form of interpretation rules. For example, the following is the SCD for the concept at the ontology location Arts/Book/Talk/Reviews, using a CLIPS rule representation:

1. (defrule Rule35 (danny 1)
     =>
     (assert (CONCEPT Arts Book Talk Reviews)))
2. (defrule Rule34 (ink 1)
     =>
     (assert (CONCEPT Arts Book Talk Reviews)))

Each Web page bookmark folder label represents a semantic concept name. A Web page bookmark folder can contain bookmarks, or URLs, pointing to a semantic concept object, or Web page. A bookmark folder can also contain additional folders. Each set of bookmarks in a folder is used as training instances for the semantic concept learner. The semantic concept learner learns a set of interpretation rules for all of the agent's known semantic concept objects (Figure 1).

Figure 1. Supervised inductive learning produces ontology rules

For each of these semantic concept description rules, an associated certainty value is determined during the learning process. This certainty value is used later during the interpretation process. It is equal to the percentage of the training set these rules interpreted successfully, minus an error prediction rate calculated for our particular semantic concept learner [12].

3.2 Concept-based Queries

DOGGIE agents use concept-based queries (CBQ) to communicate their requests for concepts related to the query.
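To make the query structure concrete, the following sketch shows one way a CBQ's contents (a concept name, addresses of example instances, and service flags) might be represented. The class and field names are our own illustration, not DOGGIE's actual KQML message format.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptBasedQuery:
    """Hypothetical sketch of a concept-based query (CBQ).

    Per the paper, a CBQ carries a concept name, the addresses (URLs) of
    example instances of the concept, and flags indicating the requested
    service. The field names here are assumptions for illustration only.
    """
    concept_name: str                                   # e.g. a bookmark-folder label
    example_urls: list = field(default_factory=list)    # addresses of concept instances
    service_flags: dict = field(default_factory=dict)   # requested service type

# A querying agent might construct and send a CBQ like this one:
query = ConceptBasedQuery(
    concept_name="Sports/Basketball/NCAA",
    example_urls=["http://example.org/page1", "http://example.org/page2"],
    service_flags={"find_related": True},
)
print(query.concept_name)  # Sports/Basketball/NCAA
```

In DOGGIE itself such a request would be wrapped in a KQML message and sent over CORBA; this sketch only captures the payload described in the text.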
A CBQ occurs when one agent sends example concepts to other neighboring agents, determines from the agents' responses who knows related concepts, and learns new knowledge. This new knowledge can take the form of new semantic concepts or of knowledge regarding another agent's ontology. The actual CBQ consists of the concept name, the addresses of examples of the concept (i.e. URLs), and flags indicating what type of service the user requests. For this concept-based query scenario, an acquaintance agent is any other agent that the querying agent knows how to locate and communicate with.

3.3 Concept Interpretation and Verification

The querying (Q) agent sends out a CBQ to its acquaintances. The responding (R) agents use their semantic concept interpreters to determine whether they think they know related concepts, and send their responses back to the Q agent. A semantic concept interpreter is a knowledge-based component that can classify concept instances according to an agent's local ontology. Each Q and R agent has its own local ontology which represents how it has individually conceptualized its view of the world. An R agent's response may be a positive (K), neutral (M), or negative (D) interpretation, along with the concept name and type. A positive response corresponds to an interpretation value above the positive interpretation threshold, and a negative response to a value below the negative interpretation threshold. A neutral response corresponds to a value falling between these two thresholds. The positive interpretation threshold is equal to the percentage accuracy value calculated for a particular concept during the ontology learning process, minus an error prediction rate value for the particular concept interpreter. The negative interpretation threshold is the lower boundary interpretation value that indicates the concept is not known. The concept name corresponds to the bookmark folder the Web pages belong to. The concept type indicates whether the answer to the query is a similar or a related concept. If an R agent has a positive response to the CBQ, it will request permission to send examples of its similar or related concept back to the Q agent. The Q agent can then verify whether the R agent actually knows a similar or related concept by using its own concept interpreter on the examples R sends to it.

3.4 Concept Cluster Integration

In the case of multiple M regions in a concept query response, DOGGIE can apply the concept cluster integration (CCI) algorithm to look for candidate relations between ontologies. As previously described, the Q agent sends out a CBQ that is received by R agents. The R agents send back the results of their interpretation process. Included in this response are the name of the concept(s) the R agent has interpreted the original concept to be, its interpretation region (K, M, or D), the stored interpretation threshold, the resulting interpretation value, and some examples of the R agent's concept. Since the agents in our multi-agent system are willing to perform only minimal work for each other, the actual concept cluster integration algorithm is performed by the original Q agent instead of an R agent. After the interpretation results have been sent back to the Q agent from the R agent, the Q agent must do several things to complete concept cluster integration. First, it must gather all of the returned examples from each of the returned M region concepts. Then it must combine these into a new directory named after a combination of these concept names. For example, if the M region concepts returned were Sports and NBA, the new concept cluster directory would be called Sports + NBA. A new ontology category is created with this label, and the returned examples are combined into this category as the instances that make up the concept. Then the agent must relearn the ontology rules, or semantic concept descriptions (SCD), using its semantic concept learner. Next, the original CBQ sent out by the Q agent is interpreted according to these new semantic concept descriptions to see if the agent now knows the CBQ concept as the combination of the returned M region concepts. This CCI process integrates the R agent's M region concepts into the Q agent's own ontology, since the examples are input into the ontology under a new concept name. This new concept name represents a compound concept.

If there is a match between the original CBQ and the new compound cluster, then new group knowledge can be learned which describes a relationship between a Q agent's concept and more than one of an R agent's concepts. This group knowledge, or CCI rule, is stored in the form of a concept relation, or compound cluster translation rule. The Q agent takes the following steps to perform CCI:

1. From the R agent response, determine the names of the concepts to cluster.
2. Create a new compound concept using the above names.
3. Create a new ontology category by combining the instances associated with the compound concept.
4. Re-learn the ontology rules.
5. Re-interpret the CBQ using the new ontology rules, including the new concept cluster description rules.
6. If the concept is verified, store the new concept relation rule.

4 IMPLEMENTATION

In this section, we describe how we implemented our multi-agent system, the Distributed Ontology Gathering Group Integration Environment (DOGGIE). DOGGIE was used for our investigation into knowledge sharing and learning among agents with diverse ontologies. DOGGIE agents are multithreaded Java applications with a Swing GUI (Figure 2).

Figure 2. Example DOGGIE Agent GUI with KQML Messages

The underlying multi-agent communication architecture for the DOGGIE system is designed around the Common Object Request Broker Architecture (CORBA). CORBA is used as the underlying communication mechanism between the DOGGIE agents, which can be located anywhere on the Internet. The messages between agents are formatted and sent using the Knowledge Query and Manipulation Language (KQML). Each agent in DOGGIE is actually composed of both a CORBA client and a CORBA server process running simultaneously, so that it can both send and receive queries. A single agent sends its concept-based queries (CBQ) using the CORBA client and receives concept-based queries through its CORBA server component. The CORBA server and CORBA client are the main communication components for the Agent Engine and Agent Control. Each single DOGGIE agent is made up of five major components: Agent Control, Agent Engine, Agent UI, Semantic Concept Learner, and Semantic Concept Interpreter (Figure 3).

Figure 3. Architecture of a single DOGGIE Agent

5 EVALUATION

We show that it may be feasible for agents to discover relationships between diverse ontologies by testing the concept cluster integration algorithm using ontologies constructed from the Magellan [10, 9] search engine ontology.

5.1 Data Set

A manually constructed subject ontology from the Magellan site was used [16]. This ontology was grabbed from the Magellan site by a spider. The spider started from the Magellan homepage and recursively followed the links to grab both topics and the Web pages listed under each topic. This approach assumed that the Web pages listed under a topic were semantically related to the topic. We used an existing open Magellan ontology to objectively measure which Web page instances belonged to particular ontology concepts. The data used consisted of 50 random concept categories taken from the Magellan search Web site. The Magellan ontology consisted of 4,385 nodes, or concept categories. Each of the concept categories used had 20 Web pages in it. Each DOGGIE agent was assigned 5 to 12 concepts from the Magellan ontology; some agent ontologies were deliberately built as "narrow" ontologies that included only closely related concepts.

Table 1. Some example Magellan concepts used for ontology

Number  ID    Concept
1       3     Arts/Architecture Firms
15      170   Business/Companies/Agriculture and Fisheries
17      1012  Computing/Hardware/LAN Hardware
31      2090  Health and Medicine/Mental Health/Resources
33      2120  Hobbies/Arts and Crafts/Knitting and Stitching
37      3504  Regional/Travel/Travel Agencies/P through Z
39      3535  Science/Astronomy and Space/Resources
46      4030  Shopping/Prized Possessions/Collectibles
49      4115  Sports/Basketball/NCAA
50      4127  Sports/College/School Home Pages

5.2 Experiments

In this section, we describe how we evaluated the feasibility of finding candidate ontology relations using the concept cluster integration algorithm in the context of a multi-agent system.

5.2.1 Hypothesis

It is feasible for agents with diverse ontologies to discover concept relations using concept cluster integration.

5.2.2 Method

We selected a sample of ten queries which produced two M region responses and set up the DOGGIE agents to communicate between the respective Q and R agents. We selected the concept cluster integration option on the DOGGIE agent user interface, then sent and processed the queries one at a time.

5.2.3 Prediction

We expected that the CCI algorithm would produce at least some verified concept cluster relations.

5.2.4 Results

The results of our experiments are located in Table 2 below. Only 20% of our queries produced verified concept cluster relations. Table 2 shows the experiment number, the original queried concept, and the concept responses. It also shows the region type for the responses; the delta, or the difference between the stored interpretation threshold and the actual interpretation value; the name of the newly created concept cluster; and the result of the concept cluster verification.
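The region and delta columns reported in Table 2 can be illustrated with a small sketch. The threshold values and the exact sign convention for delta below are our reading of Sections 3.3 and 5.2.4 (delta as the signed difference between an R agent's interpretation value and its stored positive threshold), not code from DOGGIE itself.

```python
def classify(value, pos_threshold, neg_threshold):
    """Map an interpretation value to a region per Section 3.3:
    'K' (known) above the positive threshold, 'D' (not known) below
    the negative threshold, 'M' (neutral/maybe) in between."""
    if value > pos_threshold:
        return "K"
    if value < neg_threshold:
        return "D"
    return "M"

def delta(value, pos_threshold):
    """Signed difference between the interpretation value and the stored
    threshold -- our assumed reading of Table 2's delta column."""
    return value - pos_threshold

# Consistent with Table 2's pattern: K responses carry positive deltas,
# M responses carry negative ones (illustrative numbers only).
print(classify(0.84, pos_threshold=0.80, neg_threshold=0.30))  # K
print(classify(0.74, pos_threshold=0.80, neg_threshold=0.30))  # M
print(round(delta(0.84, 0.80), 2))                             # 0.04
```

Under this reading, row 1a of Table 2 (region K, delta 0.04) would correspond to an interpretation value 0.04 above the concept's stored threshold.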
The concept cluster integration experiments were run in single-agent-to-single-agent configurations. The concepts used were each assigned a unique identification (ID) number; some examples are listed in Table 1. These were the concepts from which we built the individual agent ontologies: for some agent ontologies we chose the concepts randomly, while in others we hand-selected them.

5.2.5 Discussion

Our DOGGIE agents discovered two concept relations out of the ten attempts. The resulting concept relation rules are below:

K(A1, C2090, K(A4, C5+42))
K(A1, C3504, K(A4, C2024+3504))

Table 2. Concept Cluster Integration Experiments Summary

#    Query  Reply       Region  Delta   Cluster     Verified
1a   2090   5+42        K        0.04   5+42        Y
1b   2090   2090        M       -0.06   5+42        N
2    3504   2024+3504   K        0.07   2024+3504   Y
3    3562   5+3561      M       -0.26   5+3561      N
4    4030   4030        K        0.24   4004+1014   N
5a   135    3505+1014   M       -0.36   3505+1014   N
5b   135    135         M       -0.06   3505+1014   N
6    59     59          M       -0.08   3504+1014   N
7    170    170         M       -0.50   57+2409     N
8    4002   57+3505     D       -0.46   57+3505     N
9    4002   4002        K        0.15   57+3505     N
10   3561   3561        -       -       57+1027     N

The first rule can be read as "Agent 1's concept 2090 is related to Agent 4's concept cluster 5+42". From Table 1 we note that concept 2090 is located in the Magellan ontology as Health and Medicine/Mental Health/Resources. Concepts 5 and 42 are Arts/Architecture/Resources and Professional Organizations and Arts/Books/Genres/Non-Fiction, respectively. Intuitively, it would be difficult to determine such a relationship between these concepts. However, if we are using DOGGIE for AI-assisted Web browsing, this is a relationship that the user may wish to explore.

Similarly, the second rule can be read as "Agent 1's concept 3504 is related to Agent 4's concept cluster 2024+3504". This concept relation rule shows an interesting situation in which one agent's concept 3504 is related to another agent's concept 3504 combined with concept 2024. From Table 1 we see that concept 3504 is Regional/Travel/Travel Agencies/P through Z. Concept 2024 is Health and Medicine/Medicine/Clinics/University Medical Centers. Again, this newly created concept cluster could be worth exploring by the user.

6 CONCLUSION AND FUTURE WORK

Our results have demonstrated that our instance-based approach for discovering candidate relations between ontologies using concept cluster integration is feasible. We believe that further research is required. Our approach does not attempt to identify the specific type of relationship (e.g. part-of) in the ontologies, but assumes they consist of general is-a relations. For future experiments, the number of instances included in each concept should be increased to ensure that the machine learning algorithm has sufficient training examples. Also, a different experiment design should be used to verify that we can take an existing concept, divide it into two concepts, and determine whether DOGGIE can discover the relations between them. We hope that eventually this multi-agent system approach to finding relations can be used in conjunction with formal ontologies constructed using a description logic.

ACKNOWLEDGEMENTS

We would like to thank the referees for their comments, which helped improve this paper.

REFERENCES

[1] J. Ambite and C. Knoblock, 'Reconciling distributed information systems', AAAI 1995 Spring Symposium on Information Gathering from Distributed, Heterogeneous Environments, (1995).
[2] R. Bayardo, W. Bohrer, R. Brice, A. Cichocki, J. Fowler, A. Helal, V. Kashyap, T. Ksiezyk, G. Martin, M. Nodine, M. Rashid, M. Rusinkiewicz, R. Shea, C. Unnikrishnan, A. Unruh, and D. Woelk, InfoSleuth: Agent-Based Semantic Integration of Information in Open and Dynamic Environments, 205-216, Morgan Kaufmann, San Francisco, 1998.
[3] M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, and S. Slattery, 'Learning to extract symbolic knowledge from the world wide web', in Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98), (1998).
[4] M. Genesereth and N. Nilsson, Logical Foundations of Artificial Intelligence, Morgan Kaufmann, Palo Alto, CA, 1987.
[5] M. Iwasume, K. Shirakami, H. Takeda, and T. Nishida, 'IICA: An ontology-based internet navigation system', AAAI-96 Workshop on Internet-based Information Systems, (1996).
[6] C. Knoblock, Y. Arens, and C. Hsu, 'Cooperating agents for information retrieval', in Proceedings of the 2nd International Conference on Cooperative Information Systems, Toronto, Canada, University of Toronto Press, (1994).
[7] D. Lenat and R.V. Guha, Building Large Knowledge-Based Systems, Addison-Wesley, Reading, Mass., 1990.
[8] R. MacGregor, 'The evolving technology of classification-based knowledge representation systems', in Principles of Semantic Networks: Explorations in the Representation of Knowledge, Morgan Kaufmann, 1991.
[9] Magellan. http://www.lib.ua.edu/maghelp.htm, 1998.
[10] Magellan. http://magellan.mckinley.com, 1999.
[11] H.S. Pinto and J.P. Martins, 'Reusing ontologies', AAAI 2000 Spring Symposium on Bringing Knowledge to Business Processes, 77-84, (2000).
[12] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.
[13] S. Staab, J. Angele, S. Decker, M. Erdmann, A. Hotho, A. Maedche, H. Schnurr, R. Studer, and Y. Sure, 'Semantic Community Web Portals', 1-13, Computer Networks (Special Issue: WWW9 - Proceedings of the 9th International World Wide Web Conference), Elsevier, Amsterdam, 2000.
[14] P. Weinstein and W. Birmingham, 'Agent communication with differentiated ontologies: eight new measures of description compatibility', Technical report, Department of Electrical Engineering and Computer Science, University of Michigan, (1999).
[15] A.B. Williams and C. Tsatsoulis, 'Diverse web ontologies: What intelligent agents must teach to each other', AAAI 1999 Spring Symposium on Intelligent Agents in Cyberspace, 115-120, (1999).
[16] X. Zhu, Incorporating Quality Metrics in Agent-Based Centralized/Decentralized Information Retrieval on the World Wide Web, Ph.D. dissertation, University of Kansas, 1999.