OntoVote: a scalable distributed votes collecting mechanism for ontology drift on P2P platform Yanfeng Ge, Yong Yu, Xing Zhu, Shen Huang and Min Xu Department of Computer Science and Engineering Shanghai Jiao Tong University, Shanghai, 200030, P.R.China {gyf, yyu, zhuxing, s_huang, xm}@sjtu.edu.cn Abstract gies in P2P environments (in fact, users will not often know what is in the ontologies on the machine, let alone Ontologies provide potential support for knowl- that they perform maintenance on them) and as a result, we edge and content management on P2P platform. must design mechanisms that allow the ontologies to up- Although we can design ontology beforehand for date themselves, in order to cope with ontological drift. an application, it is argued that in P2P environ- [Fensel et al., 2002] has proposed several informal ments, static or predefined ontology cannot sat- mechanisms that use metaphors from social science isfy the ever-changing requirements of all users. (opinion-forming, rumor-spreading, etc). So we propose every user should make proposals In this paper, we propose a more formal mechanism of for what kind of ontology is the most up to his ontology drift that is based on every user’s participating in need. Collecting all these proposals (or votes) proposing the modification of ontology according to his helps to the drift of ontology. This paper presents demands of the application. To relieve the burden of users, OntoVote, a scalable distributed votes collecting proposals can be obtained by mining user activities (so mechanism based on application-level broadcast called emergent semantics [Maedche and Staab, 2001], trees and describes how OntoVote can be applied e.g., by mapping the modification of a directory name to to ontology drift on P2P platform by discussing the modification of a concept in ontology) or by providing several problems involved in the voting process. users with a basic ontology together with visualization tools with which the users can make modifications easily. 1 Introduction The modifications cause every user to hold a local on- With the rapid development of ontology technology and tology. These ontologies are characterized by: P2P computing in the past few years, it has been suggested • They are partly overlapped, but the same concepts may that knowledge and content management on P2P platform be expressed in different words. make use of ontologies to provide enhanced services to • There are a lot of noisy semantics, owing to the wrong users [Fensel et al., 2002]. To attain this object, some activities of users (e.g., a rookie of a domain may mod- limited but beneficial attempts have been made and even ify the domain ontology wrongly). more research plans are on the agenda. For example, the open source project Edutella [Nejdl et al., 2002] aims to • Most of them cannot represent all aspects of the re- provide an RDF-based metadata infrastructure for P2P quirements of users. applications building on the JXTA framework [JXTA]. In order to align concepts, to filter out noisy semantics, [Sato et al., 2002; Nodine et al., 2000; Arumugam et al., and to indicate the principal direction of the development 2002; etc] try to gather and share information and of user requirements, we propose these local ontologies be knowledge with the help of ontologies in P2P or other combined together to construct a common ontology. With distributed environments. a common ontology, we can also improve the efficiency of These attempts presume that ontologies have been semantic search by avoiding too many mappings between constructed beforehand and what they are concerned about ontologies. is how to use ontologies to exchange knowledge and to One possible way to combine the ontologies from all enable efficient and accurate semantic search in distrib- users is votes collecting: we collect the proposals of all uted environments. In many application scenarios, such users to make some analyses; only the semantics hold by a predefined ontologies cannot catch up with the majority of the users (or we can set a threshold for the ever-changing requirements of users. Instead, ontology proportion of users) is adopted by the common ontology. should drift with the appearance of new application re- The various minor semantics collected can also be treated quirements. But just as [Fensel et al., 2002] has stated, one in different ways according to its value in use, which we cannot expect any maintenance to happen on the ontolo- will describe in details afterwards. Practically, a voting organizer (such as a chairman or a by extending the existing application-level broadcast tally clerk) is needed to accomplish the voting task. This mechanisms. organizer can be considered as a server and serves for the common interests of a community by publishing messages 2.1 Basic Implementation of OntoVote to and receiving messages from all other voters. But in P2P In this section we introduce the basic implementation of environments, it may be hard to find any volunteer to serve OntoVote, mainly discussing two important procedures: the community for no evident good. Moreover, using a getCredential() and deliver(msg). getCredential validates server to collect votes will bring about scalability and a voting. deliver handles the messages on broadcast trees. single node failure problems as discussed in many P2P researches. To get rid of such problems, we use OntoVote, Voting Validity a scalable distributed votes collecting mechanism based on Before a new round of voting in a group is initiated, the application-level broadcast trees, to collect votes on P2P root node of the broadcast tree of the group calls getCre- platform. dential to get a voting credential for the voting. A voting The rest of the paper is organized as follows. Section 2 credential is granted only when a majority of group describes the design of OntoVote. Section 3 applies On- members are online so that they are capable to take par- toVote to the process of ontology drift. Section 4 makes a ticipant in the voting, otherwise, the voting results plotted conclusion of our work and proposes our future work. by a minority of group members will be invalid and mis- leading. Current version of OntoVote simply assumes that 2 OntoVote all votes are available, so a voting credential is always granted. This assumption is correct if all peers publish In practice, a voting process can be divided into three their votes to rendezvoux, which are online most of the successive phases. The first is a preparing phase, notify- time, just as in JXTA [JXTA]. ing all participants to get ready their votes. The second is a collecting phase, collecting votes from all participants. Voting Process And the third phase is devoted to publishing the voting The three phases of a voting process is realized in the results to all participants. procedure deliver(msg), which is called whenever a node receives a message whose destination is the node itself. The parameter msg contains the received message. The pseudo code for this procedure, simplified for clarity, in shown in Figure 2. The following variables are used in the pseudo code: msg.type is the message type, which may be PREPARE, SUBMIT or PUBLISH, corresponding to the three phases of a voting. msg.groupID indicates the group the message (a) (b) belongs to. groups is a set of groups that the node has Figure 1. The phases of votes collecting. (a) Phase One: pre- joined in, groups[].children and groups[].parent are paring phase and Phase Three: results publishing phase. (b) bi-directed links of the broadcast tree of the group. To Phase Two: collecting phase. avoid conflicts among different voting tasks, we treat each voting process as a transaction and use transID to distin- guish them from each other globally and uniquely. trans is OntoVote realizes the voting process in a fully decen- a set of transactions that the node is involved in, tralized manner. It uses broadcasts and a reverse operation trans[].votePool is the vote pool which keeps an entry for on an application-level broadcast tree to fulfill the three the votes from every child (i.e., keep the votes from every phases of a voting. As in Figure 1, at the first and the third child separately). phase in (a), notifying messages and voting results are After the root node has got a voting credential, it sends published from root to leaves. At the second phase in (b), a PREPARE messages to its children, thus starting the new node on the tree first collects votes from all of its children, round of voting. On receiving a PREPARE message, a then it sums up the votes from the children together with node sets up a new transaction environment by clearing the its own votes, and submits the total votes to its parent. vote pool for the transaction (line 2). If the node is a leaf To make use of broadcast trees, OntoVote supposes node, it sends a SUBMIT message to itself to start the there are a large number of groups on P2P platform. Every collecting phase (lines 3, 4). Otherwise, it recursively peer can choose several groups to join in. Every group passes on the message to its children (lines 5, 6). forms a reliable application-level broadcast tree with When a node receives a SUBMIT message from a child, mechanisms introduced in some researches [Castro et al., it adds the votes from the child to its vote pool in line 7 (If 2002; Zhuang et al., 2001; etc], which is out of the scope the child has submitted votes before, the old submitted of this paper (one can simply view a broadcast tree as a votes in the vote pool will be replaced with new submitted tree). Votes collecting happens inside a group. OntoVote votes). After all children of the node have submitted votes, provides best-quality votes collecting service (i.e., col- the node adds the votes in the pool and its own votes to- lecting exactly one copy of votes from every participant) gether, then submits the total votes to its parent (lines 10 to deliver(msg) 1 switch msg.type is 2 PREPARE: empty trans[msg.tranID].votePool 3 if(isLeafNode(msg.groupID)) 4 send SUBMIT message to self 5 else ∀ node in groups[msg.groupID].Children 6 send(msg, node) 7 SUBMIT: addToPool(msg.votes, msg.source) 8 if(isRootNode(msg.groupID) AND allChildrenSubmittedVotes()) 9 analyze voting results and send PUBLISH message to self 10 else if (allChildrenSubmittedVotes()) 11 msg.votes=countVotes(trans[msg.tranID].votePool)+local votes 12 send(msg, groups[msg.groupID].parent) 13PUBLISH: updateKB(msg) 14 ∀ node in groups[msg.groupID].Children 15 send(msg, node) Figure 2. The implementation of deliver 12). These lines may be called for several times allowing ensure that every node on the tree would receive the pub- for node failures, which will be discussed in details in the lished message: the new parent sends all published mes- next section. sages it has received or will receive to the new child, and After getting votes from all of its children, the root node the child either relays the messages to it children or dis- extracts some useful knowledge from the votes and re- cards the messages, according to whether or not it has publishes the knowledge to its children (lines 8, 9), thus received the identical messages before. any node in the group can update its local knowledge base However, if node failures occur during collecting phase, (lines 13 to 15). things become a bit more complicated: if a node fails at the As is seen, the basic implementation of OntoVote is very time that some children have submitted votes to it and very simple. But if we want to collect votes in a distributed some not, how about these submitted votes? If the failing fashion reliably and efficiently on network, there are still node has disconnected from the network, the submitted more things to be considered. votes will be lost. To avoid losing votes, the children of the failing node can resubmit votes to a new parent. But if the 2.2 Reliability failing node has not disconnected from the network or if On account of the unreliable nature of Internet, a broadcast the failing node has submitted votes to parent before its tree may break at any time, including during voting failure, straightforward resubmitting will result in redun- process. If we collect votes from rendezvoux, the rate of dant votes on the broadcast tree. In the rest of this section, failure will decrease sharply, but we still cannot avoid we first put forward the repairing protocol of OntoVote for node failures completely. Therefore, OntoVote proposes collecting phase, then we show that this protocol satisfy repairing the broadcast tree for the best-quality collecting our best-quality collecting purpose by leaving out as few purpose. votes as possible and by avoiding counting in the same Periodically, each node in the tree sends a heart beat copy of votes repeatedly. message to its children, if any, and the children respond Repairing Protocol for Collecting Phase with answering messages. A child suspects its parent is faulty when it fails to receive heartbeat messages and so As in Figure 3, let FN be a failing node from the aspect of does the parent, i.e. when two nodes lose in touch with its child CN. CN reconnects to a new parent NPN. We use each other, every one of them will suspect the other has P[X] to denote the parent of a node X. ∆T is a config- failed. But in fact, any one of them may be really faulty, or urable time limit. After CN reconnects to NPN, the re- none of them are faulty, but the link between them is pairing protocol goes as follows: broken. (1) If CN has not submitted votes to FN yet, then CN will Upon detection of the failure of its parent, a child tries to submit votes to NPN after it collects votes from all of connect to a new parent. To maintain the performance of its children. fully distributed votes collecting, the tree’s balance should Else if CN has submitted votes to FN, but the interval be approximately retained, so the new parent is chosen from the submission of CN to the failure of FN is still from the tree nodes that are nearly at the same level with within the time limit ∆T , then CN will submit votes the old parent in a uniformly-random way. to NPN immediately. If node failures occur during publishing phases (i.e., the first or the third phase in Figure 1), it is a trivial task to Else CN submits a null vote as a placeholder to NPN To illustrate the robustness of this protocol, assume in immediately. Figure 3, CN loses in touch with FN and reconnects to (2) After NPN adds the votes from CN to vote pool: NPN. If CN has not submitted votes to FN yet, it is all right for CN to collect votes from all children and to submit the If NPN has never submitted votes before (that is, votes to NPN. Otherwise, CN resubmits votes to NPN and NPN is still waiting for some other children), or if the previously submitted votes to FN should be eliminated NPN finds that the votes from CN are null, then stop. thoroughly. If FN is not disconnected from the network, Else NPN recounts the total votes (including that of P[FN] may find FN is still alive, so FN and its ancestors CN) and resubmits the total votes to P[NPN]. The can delete the copy of votes of CN from their vote pools. recounting and resubmitting process will be iterated Assume before this deleting process is performed till some up the tree until some ancestor of NPN that has not ancestor CN’, CN’ finds its parent FN’ is also faulty and submitted votes yet. reconnects to NPN’, then the deleting process will be (3) If CN has submitted votes to FN and the interval from continued on the new path. Meanwhile, FN’ will start a the submission of CN to the failure of FN is still new deleting process for the votes of CN’, which contains within the time limit ∆T , then FN will delete the the votes of CN. If FN is disconnected from the network, entry of CN in its vote pool. If FN has also submitted then P[FN] will delete votes of FN, which also contains votes to P[FN], it will recount the total votes it has the votes of CN. Such recursive call of the deleting process collected (excluding that of CN) and resubmit the ensures the old copy of the votes of CN is deleted com- total votes to P[FN]. The recounting and resubmit- pletely. ting process will be iterated up the tree until some Obviously, the resubmitting process and the deleting ancestor of FN that has not submitted votes yet. process for the votes of CN are executed along two dif- ferent paths to root, so if the processes can reach the root, they are not likely to always arrive at the same time. To avoid losing votes or introducing redundant votes, after all children have submitted votes, the root node should wait for a period of time that is long enough for the two proc- NPN' FN' esses to be finished. But the problem is that while the root is waiting for the repairing processes for one node, another node may happen to fail. The passive repairing processes will be called again, despite that the votes submitted by the CN' node may have reached the root. As a result, the root will be trapped in an ever-waiting deadlock. We adjust the passive repairing protocol by introducing FN NPN some active ingredient to avoid ever-waiting: If a node has submitted votes to its parent long before it finds the parent is dead (i.e., beyond a time limit ∆T which is much longer than the collecting time of the node), then it simply as- CN sumes that the parent has also submitted votes and there Figure 3. Repairing broadcast trees for best-quality collecting are several ancestors that have received the votes. Because the parent (if it is not disconnected from the network) and every one of the ancestors that have kept the votes longer Explanation of the Repairing Protocol than ∆T will try their best (just as the node itself does) to relay the votes to the root, it is not necessary for the node To explain the repairing protocol, we first neglect the time limit ∆T let ∆T =0). The main idea of the protocol is that to resubmit votes. With this compromised protocol, the loss of votes will when a node finds its parent is faulty, it doesn’t know still occur if several nearest ancestors of a node disconnect whether the submitted votes are lost or not, so it resubmits from the network concurrently, but the probability of such votes to a new parent, and at the same time, the original loss is greatly less than before (the above mentioned ancestors of the node try to eliminate the votes from the scenario). node by recounting and resubmitting the total votes in a bottom up manner. The repairing protocol is passive in 2.3 Efficiency that although the old submitted votes may have been re- layed to many ancestors, they are rolled back to start Recall that before a node submits votes to its parent, it will afresh. sum up the votes in the vote pool together with its own How well does this repairing protocol perform? In par- votes. OntoVote does not export the summation method ticular, can it really guarantee we collect exact one piece but leaves it to application level. The implementation of of votes from every node online even when several nodes the method is highly related to the efficiency of OntoVote: of the tree fail concurrently or subsequently? when more and more votes are collected, message packets that encapsulate votes will become larger and larger. Without additional disposal, the packets will overwhelm ontology. The common ontology is used to provide more the network and the scalability of OntoVote will be no powerful semantic search service. better than that of a client-server model. So we should follow several principles in the design of this method. 3.1 Process of Ontology Drift To understand our principles, one can view a message We use the following process to cope with the drift of a packet that contains votes as a sheet. Every vote has its common ontology: entry on the sheet. The number of entries a sheet can hold (1) When a new group is created, the creator provides a is limited. By combining the entries on several sheets basic common ontology. together to form a new sheet and by relaying the new sheet to parent node, a voting participant fulfils his collecting (2) When a new user joins the group, he is provided with task. Here are the design principles described with the the currently available common ontology. If the user above metaphor: is dissatisfied with the common ontology, he can • Merge identical entries. To save on the space of a sheet, modify it with ontology visualization tools to create a local ontology. The visualization tools can map on- votes devoted to the same candidate should be merged together, that is, each entry should maintain a counter of tology elements to application elements (e.g., direc- the votes that fall in this entry. Actually, this is the tories, bookmarks, etc) to hide ontologies from ele- mentary users. They can also provide powerful sup- original meaning of votes counting. port for expert users to modify ontologies directly • Filter minor entries. Because the number of the entries a and easily. In any case, the modifications of the local sheet can hold is limited, one should filter out the least ontology are translated into the user’s votes on the counting entries when the sheet overflows. The contents common ontology. By tracking user modifications, of the least counting entries are likely to be noisy or system can also partly do the mapping between the unimportant to most users, so it is acceptable to filter local ontology and the common ontology. them out. (3) At a proper time, all votes are collected and the • Choose a proper-sized sheet. Each application should common ontology is modified with the voting results. choose a proper-sized sheet by simulation or by prob- (4) After the new common ontology is published to all ability analysis to avoid a phenomenon that we call “Entry Jolts”: although some vote may be very large peers, the mapping between the local ontology and the new common ontology is adjusted again. altogether, it happens to be filtered out every time be- fore it can accumulate. “Entry Jolts” is determined by (5) Iterate the above process. such factors as data makeup, data distribution on the To put it simply, the whole is an iterated process. Let LO network, filtering algorithms and sheet sizes, etc. To (Local Ontology) denote the local ontology of a user and reduce the probability of “Entry Jolts”, a large-sized let CO (Common Ontology) denote the common ontology sheet is preferred, but we’d better get an upper bound of of a community. LC is the mapping between LO and CO. the size of the sheet, size larger than which brings about The iterated process can be described as follows: no evidently-good performance but more consumption (1) LO=CO, LC=Identity Mapping of the network bandwidth. (2) User modifies LO. System records user modifications In addition to the implementation of the summation and automatically adjusts LC. method, other factors such as the balance of a broadcast tree, which has been addressed above, also affect the ef- (3) At a proper time, system translates modifications into ficiency of OntoVote. But they are beyond our considera- votes, collects all votes and modifies CO background. tions. (4) Publish CO to every peer and adjust LC again. (5) Goto (2). 3 Application of OntoVote to Ontology There are some issues that need to be further addressed in Drift the management of the ontologies involved in this process, In section 1, we give our thought that colleting votes from e.g., how to track user modifications? Why do we keep the all users can drive the drift of a common ontology and in mapping between the common ontology and the local section 2, we show that it is possible to collect votes in P2P ontology and how to do this mapping (Obviously, it is not environments with OntoVote. In this section, we will try to enough to do the mapping between two ontologies just by apply OntoVote to the process of ontology drift. Our im- tracking the changes of either ontology)? How to derive a plementation of ontology drift is part of the knowledge user’s modification proposals (or votes) on the common acquisition module developed under our undergoing PICQ ontology from his modifications of the local ontology? project. PICQ is dedicated to paper sharing on P2P plat- What the system should do if there is a conflict of opinions? form with semantic web technologies. In PICQ, users are In the following sections, we will discuss these problems grouped according to research interest. Every group has a one by one. common ontology. New fields or new application re- quirements will introduce new concepts or attributes in the 3.2 Tracking User Modifications pattern for the deleting of a sub-tree is defined and the root node of the sub-tree is used as a parameter of the pattern. The problem for tracking changes within ontologies or within a knowledge base has been addressed in [Kiryakov Our approach for tracking modifications is illustrated in and Ognyanov, 2002]. [Kiryakov and Ognyanov, 2002] Figure 5. The left structure of the figure denotes the initial common ontology that the user imported from the com- proposes using RDF statements (i.e. triples) instead of resources or literals as the smallest trackable pieces of munity; the right structure is the local ontology. A history knowledge. The two basic types of updates in a repository of modifications is recorded in a way something like that of [Kiryakov and Ognyanov, 2002]: are addition and removal of a statement. To track series of updates that are bundled together according to the logic of the application, it uses batch update that works with the History: repository in a transactional fashion. 1. remove sub-tree(C): remove , remove 2. add DS DS 3.3 Maintenance of Ontology Mapping is_a is_a In P2P applications, every user should be allowed to hold a local ontology to keep his personality. He uses this local P2P P2P ontology to raise queries. The queries are translated into the common ontology before being sent to other peers to is_a is_a improve search efficiency and the recall of search results. So it is of great importance to maintain the mapping be- Pure Hybrid tween the local ontology and the common ontology. By tracking the modifications of the local ontology and Figure 4. Two Different Local Ontologies the common ontology in step (2) and step (3) in the process of ontology drift, we can partly do the mapping between them. For instance, if a resource in either ontology is renamed, the mapping can be adjusted. But how about adding a new resource? How can we know whether or not A the new resource in one ontology can be mapped to some A old resource in another ontology? This problem may be is_a r1 solved with emergent semantics [Maedche and Staab, History 2001; Fensel et al., 2002]: after the user adds a new re- C is_a r1 source, he queries with this new resource for a period of B E time. During this period, emergent semantics helps to find is_a B out the mappings between the new resource and some other resources in the common ontology or other local D ontologies (e.g. same file categorized to different concepts Figure 5. Tracking Modifications from Initial Common Ontol- indicates alignment). The derived mappings are expressed ogy to Current Local Ontology with probabilities, based on the number of the instances that indicating alignment. Different peers may find dif- ferent mappings, or same mappings with different prob- In regard to reflecting the modification proposals of users, abilities. At a proper time, all new resources and the we think atomic update (additions or removals of state- mappings among them are collected and coordinated with ments) plus batch update is almost appropriate and only a a new round of voting. Among the resources that can be small change is made: because we treat modification mapped to each other, the one that wins the vote is adopted proposals as votes and OntoVote requires identical votes by the common ontology, the noisy semantics (votes that to be merged, but two batch updates that represent the are below a threshold) is discarded and the rest are mapped same proposals may not match exactly (e.g., in Figure 4, to the one accepted. This process is something like the both of two users propose the sub-tree rooted at the con- revision of a dictionary in social life: after a new word cept “P2P” be deleted, but the two sets of removed triples emerges, people use this word in their communication and can’t match exactly. One set contains two more triples every one gets a scrap of the meanings of the word (e.g., than the other.), so we propose the application should find synonyms of the word). When a dictionary is revised, induce some patterns from batch updates, which reflect the all meanings of the word is collected and validated, and if intention of a user. Two batch updates are matched if and necessary, the word is lexicalized. only if both their patterns and the parameters of the pat- In practice, the mapping results that are automatically terns are matched. For instance, the deleting of the two obtained and maintained may not always be sound, so a sub-trees rooted at “P2P” in Figure 4 can be matched if a semi-automatic ontology mapping mechanism is preferred, i.e., advanced users are allowed to manually correct the mapping results before or after any round of votes col- his local ontology with the common ontology, if he is lecting. dissatisfied with his local ontology or he does not want his opinions diverge too much from those of the masses, thus 3.4 Generation of Votes the old modification records are cleared and the new Before votes collecting, the modifications of the local modifications of the local ontology can be more easily ontology should be translated into the modification pro- translated into those of the common ontology.\ posals (votes) on the common ontology. Or else, just as It should be noted that our system tries to translate every section 3.2 has stated, although the proposals of two users record in the modification history into a vote. If the are identical, they cannot be merged. modifications do not interact, this approach may work well. Currently, our system generates the votes in a rather However, sometimes an appropriate combination of addi- simple way, which is mainly based on the mapping be- tions and removals may trigger complex "non monotonic" tween the common ontology and the local ontology. Below updates of the ontology, e.g., a user first adds “B” as a we discuss how our approach works in various conditions. sub-concept of “A” and adds “C” as a sub-concept of “B”, Firstly, if the mapping between the common ontology and later, he finds that “C” is unnecessary, so he deletes it and the local ontology is well maintained, and the pattern and adds “B” as a sub-concept of “A” directly. This se- of the modification has been defined by the application, quence of modifications will be translated into a list of then the translation is straightforward. For instance, in votes. If most members of the community believe that “B” Figure 6, assume concept “DC” is mapped to concept is a sub-concept of “A”, it is likely that the last proper “Distributed Computing” and concept “P2P” is mapped to modification of the user will be accepted. However, if concept “Peer-to-Peer”. If a user adds two new concepts many users make the similar wrong modifications before, “Pure” and “Hybrid” under concept “P2P” in his local the wrong modifications may also be accepted. To filter ontology, then we think the user proposes adding these out the wrong modifications, we’d better find out the last concepts under concept “Peer-to-Peer” in the common determination of the user (or the real intention of the user) ontology; If the user deletes the tree rooted at “DC” before we construct a vote from a sequence of related completely, then we think the user proposes deleting the modifications. Unfortunately, our system has not realized tree rooted at “Distributed Computing” completely. this object yet, and we will leave it to the future. 3.5 Resolving Conflicts Distributed It is obvious that there are conflicts among all collected DC Computing proposals, e.g., some users suggest deleting concept “Peer-to-Peer”, while others suggest adding “Pure” as a is_a is_a is_a sub-concept of “Peer-to-Peer”. One straightforward way to resolve a conflict is to Client-Server Peer-to-Peer P2P choose among the conflicting proposals the one with the largest proportion. However, such simple con- flict-resolving mechanism will bring about the instability is_a is_a is_a is_a of the common ontology, especially when the proportion of the adopted proposal is not overwhelming. Flat Hierarchical Pure Hybrid To understand this problem, assume there are totally 100 users in the community. At the beginning, 60 percent Figure 6. Common Ontology and Local Ontology of them suggest adding the concept “Peer-to-Peer” (or concepts that are equivalences of “Peer-to-Peer”) to the common ontology, thus “Peer-to-Peer” is accepted in the Secondly, if no appropriate mapping information is common ontology. In the second round of voting, assume obtained, then we either ignore the modification or 30 users suggest deleting the concept “Peer-to-Peer” transform the modification into a list of sub-modifications, (these 30 users may come from the previously 60 percent according to the specification of the modification pattern. of users, or advanced users who modify common ontology For example, in Figure 6, if no corresponding concept of directly, or users who reimport and modify their local “DC” is found in the common ontology, then we either ontologies, etc.), while 10 users add sub-concepts “Pure” discard the vote or split the vote to generate a new one to under “Peer-to-Peer” and the rest make no modifications. delete the sub-tree rooted at the concept “peer-to-peer”. After the voting, if the system chooses to delete Thirdly, for some modifications (e.g., additions of “Peer-to-Peer”, then the local ontologies that still record statements), the mapping information is not necessary at the addition of “Peer-to-Peer” will propose adding all, i.e., additions of statements that have no relations with “Peer-to-Peer” to the common ontology again in the next other resources in the common ontology are allowed in our round of voting. Because the proportion of this proposal is system. still large enough, it may be adopted again by the common The last but not the least, advanced users are allowed to ontology, which causes the instability of the common modify the local copy of the common ontology directly, if ontology. they are pleased to do that. We also allow a user to replace Intuitively, We can get rid of the instability problem by International Journal of Cooperative Information Systems tracking the history of voting. But till now, this idea has 9:1/2, 2000, pp. 3-28. not been tested yet. Instead, we take a rather simpler measure to ease the instability problem, i.e., we set dif- [Arumugam et al., 2002] Madhan Arumugam, Amit Sheth, ferent thresholds for modifications with different patterns I. Budak Arpinar. Towards Peer-to-Peer Semantic Web: A to be accepted in the common ontology. Because the Distributed Environment for Sharing Semantic common ontology is mainly used to improve search effi- Knowledge on the Web. WWW2002 Workshop on Real ciency and the recall of search results, it is better to contain world RDF and Semantic web applications (2002). more resources than not. So we set low thresholds for additions of statements and high thresholds for deleting [Nejdl et al., 2002] Wolfgang Nejdl, Boris Wolf, Changtao operations. In this way, when a conflict occurs, it is more Qu, Stefan Decker, Michael Sintek et al.. EDUTELLA: A likely that a proposal with an overwhelming vote propor- P2P Networking Infrastructure Based on RDF. In proc. of tion exists, and that the opposite proposals may be too WWW11, May 2002,Hawaii. trivial to bring about instability. [JXTA] Project JXTA: An open, innovative collaboration. 4 Conclusions and Future Work White paper, available at www.jxta.org. Ontology drift is important in many requirement-sensitive [Castro et al., 2002] Miguel Castro, Peter Druschel, P2P applications. We proposed collecting the modification Anne-Marie Kermarrec and Antony Rowstron. Scribe: A proposals (votes) from all users to drive ontology drift. To Large-scale and Decentralized Application-level collect votes on P2P platform, we presented OntoVote, a Broadcast Infrastructure. IEEE Journal on Selected Areas scalable distributed votes collecting mechanism based on in Communication (JSAC), Vol. 20, No, 8, October 2002. application-level broadcast trees. OntoVote is reliable in that it leaves out as few votes as possible and avoids [Zhuang et al., 2001] Shelley Q. Zhuang, Ben Y. Zhao, counting in the same copy of votes repeatedly. We also Anthony D. Joseph, Randy H. Katz and John Kubiatowicz. summarized several design principles of vote counting for Bayeux: An Architecture for Scalable and Fault-tolerant OntoVote to work efficiently. And finally, we tried to Wide-area Data Dissemination. In proc. of the Eleventh apply OntoVote to the process of ontology drift with the International Workshop on Network and Operating discussion of several problems encountered in the appli- System Support for Digital Audio and Video(N OSSDAV cation. 2001), Port Jefferson, NY, June 2001. In future, we will research ontology drift further by obtaining more general modification patterns of ontology. [Maedche and Staab, 2001] Alexander Maedche, Steffen we will also try to make the drifting process more stable. Staab. Ontology Learning for the Semantic Web. IEEE Besides, we will research some further issues of voting, Intelligent Systems, 16(2):72-79, March/April 2001. such as voting security and the using of the opinions of authoritative members. There is also a problem that [Kiryakov and Ognyanov, 2002] Atanas Kiryakov and common ontology based on voting will neglect the views Damyan Ognyanov. Tracking Changes in RDF(S) of an individual (or a small group of people) that brings Repositories. In proc. of 13th International Conference on real innovation and original perspectives on community’s Knowledge Engineering and Knowledge Management point of view. We will find out whether intercommunica- EKAW02, Siguenza, Spain, 1-4 Oct. 2002. tion between peers helps to solve this problem. References [Fensel et al., 2002] Dieter Fensel, Steffen Staab, Rudi Studer and Frank van Harmelen. Peer-2-Peer Enabled Semantic Web for Knowledge Management. Ontol- ogy-based Knowledge Management: Exploiting the Se- mantic Web, Wiley, London, UK, 2002. [Sato et al., 2002] Hiroyuki Sato, Yutaka Abe and Atsushi Kanai. Hyperclip: a Tool for Gathering and Sharing Metadata on Users’ Activities by Using Peer-to-Peer Technology. WWW2002 Workshop on Real world RDF and Semantic web applications (2002). [Nodine et al., 2000] Marian Nodine, Jerry Fowler, Tomasz Ksiezyk, Brad Perry, Malcolm Taylor, Amy Unruh. Active Information Gathering in Infosleuth. In