OntoVote: a scalable distributed votes collecting mechanism for
                            ontology drift on P2P platform

                        Yanfeng Ge, Yong Yu, Xing Zhu, Shen Huang and Min Xu
                             Department of Computer Science and Engineering
                        Shanghai Jiao Tong University, Shanghai, 200030, P.R.China
                              {gyf, yyu, zhuxing, s_huang, xm}@sjtu.edu.cn


                        Abstract                                gies in P2P environments (in fact, users will not often
                                                                know what is in the ontologies on the machine, let alone
    Ontologies provide potential support for knowl-             that they perform maintenance on them) and as a result, we
    edge and content management on P2P platform.                must design mechanisms that allow the ontologies to up-
    Although we can design ontology beforehand for
                                                                date themselves, in order to cope with ontological drift.
    an application, it is argued that in P2P environ-           [Fensel et al., 2002] has proposed several informal
    ments, static or predefined ontology cannot sat-            mechanisms that use metaphors from social science
    isfy the ever-changing requirements of all users.           (opinion-forming, rumor-spreading, etc).
    So we propose every user should make proposals                 In this paper, we propose a more formal mechanism of
    for what kind of ontology is the most up to his             ontology drift that is based on every user’s participating in
    need. Collecting all these proposals (or votes)             proposing the modification of ontology according to his
    helps to the drift of ontology. This paper presents         demands of the application. To relieve the burden of users,
    OntoVote, a scalable distributed votes collecting           proposals can be obtained by mining user activities (so
    mechanism based on application-level broadcast              called emergent semantics [Maedche and Staab, 2001],
    trees and describes how OntoVote can be applied             e.g., by mapping the modification of a directory name to
    to ontology drift on P2P platform by discussing             the modification of a concept in ontology) or by providing
    several problems involved in the voting process.
                                                                users with a basic ontology together with visualization
                                                                tools with which the users can make modifications easily.
1 Introduction                                                  The modifications cause every user to hold a local on-
With the rapid development of ontology technology and           tology. These ontologies are characterized by:
P2P computing in the past few years, it has been suggested         • They are partly overlapped, but the same concepts may
that knowledge and content management on P2P platform                 be expressed in different words.
make use of ontologies to provide enhanced services to             • There are a lot of noisy semantics, owing to the wrong
users [Fensel et al., 2002]. To attain this object, some              activities of users (e.g., a rookie of a domain may mod-
limited but beneficial attempts have been made and even               ify the domain ontology wrongly).
more research plans are on the agenda. For example, the
open source project Edutella [Nejdl et al., 2002] aims to          • Most of them cannot represent all aspects of the re-
provide an RDF-based metadata infrastructure for P2P                  quirements of users.
applications building on the JXTA framework [JXTA].                In order to align concepts, to filter out noisy semantics,
[Sato et al., 2002; Nodine et al., 2000; Arumugam et al.,       and to indicate the principal direction of the development
2002; etc] try to gather and share information and              of user requirements, we propose these local ontologies be
knowledge with the help of ontologies in P2P or other           combined together to construct a common ontology. With
distributed environments.                                       a common ontology, we can also improve the efficiency of
   These attempts presume that ontologies have been             semantic search by avoiding too many mappings between
constructed beforehand and what they are concerned about        ontologies.
is how to use ontologies to exchange knowledge and to              One possible way to combine the ontologies from all
enable efficient and accurate semantic search in distrib-       users is votes collecting: we collect the proposals of all
uted environments. In many application scenarios, such          users to make some analyses; only the semantics hold by a
predefined ontologies cannot catch up with the                  majority of the users (or we can set a threshold for the
ever-changing requirements of users. Instead, ontology          proportion of users) is adopted by the common ontology.
should drift with the appearance of new application re-         The various minor semantics collected can also be treated
quirements. But just as [Fensel et al., 2002] has stated, one   in different ways according to its value in use, which we
cannot expect any maintenance to happen on the ontolo-          will describe in details afterwards.
   Practically, a voting organizer (such as a chairman or a       by extending the existing application-level broadcast
tally clerk) is needed to accomplish the voting task. This        mechanisms.
organizer can be considered as a server and serves for the
common interests of a community by publishing messages            2.1    Basic Implementation of OntoVote
to and receiving messages from all other voters. But in P2P       In this section we introduce the basic implementation of
environments, it may be hard to find any volunteer to serve       OntoVote, mainly discussing two important procedures:
the community for no evident good. Moreover, using a              getCredential() and deliver(msg). getCredential validates
server to collect votes will bring about scalability and          a voting. deliver handles the messages on broadcast trees.
single node failure problems as discussed in many P2P
researches. To get rid of such problems, we use OntoVote,         Voting Validity
a scalable distributed votes collecting mechanism based on        Before a new round of voting in a group is initiated, the
application-level broadcast trees, to collect votes on P2P        root node of the broadcast tree of the group calls getCre-
platform.                                                         dential to get a voting credential for the voting. A voting
   The rest of the paper is organized as follows. Section 2       credential is granted only when a majority of group
describes the design of OntoVote. Section 3 applies On-           members are online so that they are capable to take par-
toVote to the process of ontology drift. Section 4 makes a        ticipant in the voting, otherwise, the voting results plotted
conclusion of our work and proposes our future work.              by a minority of group members will be invalid and mis-
                                                                  leading. Current version of OntoVote simply assumes that
2    OntoVote                                                     all votes are available, so a voting credential is always
                                                                  granted. This assumption is correct if all peers publish
In practice, a voting process can be divided into three           their votes to rendezvoux, which are online most of the
successive phases. The first is a preparing phase, notify-        time, just as in JXTA [JXTA].
ing all participants to get ready their votes. The second is a
collecting phase, collecting votes from all participants.         Voting Process
And the third phase is devoted to publishing the voting           The three phases of a voting process is realized in the
results to all participants.                                      procedure deliver(msg), which is called whenever a node
                                                                  receives a message whose destination is the node itself.
                                                                  The parameter msg contains the received message. The
                                                                  pseudo code for this procedure, simplified for clarity, in
                                                                  shown in Figure 2.
                                                                     The following variables are used in the pseudo code:
                                                                  msg.type is the message type, which may be PREPARE,
                                                                  SUBMIT or PUBLISH, corresponding to the three phases
                                                                  of a voting. msg.groupID indicates the group the message
                (a)                         (b)                   belongs to. groups is a set of groups that the node has
Figure 1. The phases of votes collecting. (a) Phase One: pre-     joined in, groups[].children and groups[].parent are
paring phase and Phase Three: results publishing phase. (b)       bi-directed links of the broadcast tree of the group. To
Phase Two: collecting phase.                                      avoid conflicts among different voting tasks, we treat each
                                                                  voting process as a transaction and use transID to distin-
                                                                  guish them from each other globally and uniquely. trans is
OntoVote realizes the voting process in a fully decen-            a set of transactions that the node is involved in,
tralized manner. It uses broadcasts and a reverse operation       trans[].votePool is the vote pool which keeps an entry for
on an application-level broadcast tree to fulfill the three       the votes from every child (i.e., keep the votes from every
phases of a voting. As in Figure 1, at the first and the third    child separately).
phase in (a), notifying messages and voting results are              After the root node has got a voting credential, it sends
published from root to leaves. At the second phase in (b), a      PREPARE messages to its children, thus starting the new
node on the tree first collects votes from all of its children,   round of voting. On receiving a PREPARE message, a
then it sums up the votes from the children together with         node sets up a new transaction environment by clearing the
its own votes, and submits the total votes to its parent.         vote pool for the transaction (line 2). If the node is a leaf
   To make use of broadcast trees, OntoVote supposes              node, it sends a SUBMIT message to itself to start the
there are a large number of groups on P2P platform. Every         collecting phase (lines 3, 4). Otherwise, it recursively
peer can choose several groups to join in. Every group            passes on the message to its children (lines 5, 6).
forms a reliable application-level broadcast tree with               When a node receives a SUBMIT message from a child,
mechanisms introduced in some researches [Castro et al.,          it adds the votes from the child to its vote pool in line 7 (If
2002; Zhuang et al., 2001; etc], which is out of the scope        the child has submitted votes before, the old submitted
of this paper (one can simply view a broadcast tree as a          votes in the vote pool will be replaced with new submitted
tree). Votes collecting happens inside a group. OntoVote          votes). After all children of the node have submitted votes,
provides best-quality votes collecting service (i.e., col-        the node adds the votes in the pool and its own votes to-
lecting exactly one copy of votes from every participant)         gether, then submits the total votes to its parent (lines 10 to
                        deliver(msg)
                        1 switch msg.type is
                        2 PREPARE: empty trans[msg.tranID].votePool
                        3             if(isLeafNode(msg.groupID))
                        4                send SUBMIT message to self
                        5             else ∀ node in groups[msg.groupID].Children
                        6                send(msg, node)
                        7 SUBMIT: addToPool(msg.votes, msg.source)
                        8          if(isRootNode(msg.groupID) AND allChildrenSubmittedVotes())
                        9             analyze voting results and send PUBLISH message to self
                        10         else if (allChildrenSubmittedVotes())
                        11            msg.votes=countVotes(trans[msg.tranID].votePool)+local votes
                        12            send(msg, groups[msg.groupID].parent)
                        13PUBLISH: updateKB(msg)
                        14             ∀ node in groups[msg.groupID].Children
                        15               send(msg, node)
                                                    Figure 2. The implementation of deliver


12). These lines may be called for several times allowing           ensure that every node on the tree would receive the pub-
for node failures, which will be discussed in details in the        lished message: the new parent sends all published mes-
next section.                                                       sages it has received or will receive to the new child, and
   After getting votes from all of its children, the root node      the child either relays the messages to it children or dis-
extracts some useful knowledge from the votes and re-               cards the messages, according to whether or not it has
publishes the knowledge to its children (lines 8, 9), thus          received the identical messages before.
any node in the group can update its local knowledge base              However, if node failures occur during collecting phase,
(lines 13 to 15).                                                   things become a bit more complicated: if a node fails at the
   As is seen, the basic implementation of OntoVote is              very time that some children have submitted votes to it and
very simple. But if we want to collect votes in a distributed       some not, how about these submitted votes? If the failing
fashion reliably and efficiently on network, there are still        node has disconnected from the network, the submitted
more things to be considered.                                       votes will be lost. To avoid losing votes, the children of the
                                                                    failing node can resubmit votes to a new parent. But if the
2.2 Reliability                                                     failing node has not disconnected from the network or if
On account of the unreliable nature of Internet, a broadcast        the failing node has submitted votes to parent before its
tree may break at any time, including during voting                 failure, straightforward resubmitting will result in redun-
process. If we collect votes from rendezvoux, the rate of           dant votes on the broadcast tree. In the rest of this section,
failure will decrease sharply, but we still cannot avoid            we first put forward the repairing protocol of OntoVote for
node failures completely. Therefore, OntoVote proposes              collecting phase, then we show that this protocol satisfy
repairing the broadcast tree for the best-quality collecting        our best-quality collecting purpose by leaving out as few
purpose.                                                            votes as possible and by avoiding counting in the same
   Periodically, each node in the tree sends a heart beat           copy of votes repeatedly.
message to its children, if any, and the children respond           Repairing Protocol for Collecting Phase
with answering messages. A child suspects its parent is
faulty when it fails to receive heartbeat messages and so           As in Figure 3, let FN be a failing node from the aspect of
does the parent, i.e. when two nodes lose in touch with             its child CN. CN reconnects to a new parent NPN. We use
each other, every one of them will suspect the other has            P[X] to denote the parent of a node X. ∆T is a config-
failed. But in fact, any one of them may be really faulty, or       urable time limit. After CN reconnects to NPN, the re-
none of them are faulty, but the link between them is               pairing protocol goes as follows:
broken.                                                             (1) If CN has not submitted votes to FN yet, then CN will
   Upon detection of the failure of its parent, a child tries to          submit votes to NPN after it collects votes from all of
connect to a new parent. To maintain the performance of                   its children.
fully distributed votes collecting, the tree’s balance should             Else if CN has submitted votes to FN, but the interval
be approximately retained, so the new parent is chosen                    from the submission of CN to the failure of FN is still
from the tree nodes that are nearly at the same level with                within the time limit ∆T , then CN will submit votes
the old parent in a uniformly-random way.                                 to NPN immediately.
   If node failures occur during publishing phases (i.e., the
first or the third phase in Figure 1), it is a trivial task to
     Else CN submits a null vote as a placeholder to NPN              To illustrate the robustness of this protocol, assume in
     immediately.                                                  Figure 3, CN loses in touch with FN and reconnects to
(2) After NPN adds the votes from CN to vote pool:                 NPN. If CN has not submitted votes to FN yet, it is all right
                                                                   for CN to collect votes from all children and to submit the
    If NPN has never submitted votes before (that is,              votes to NPN. Otherwise, CN resubmits votes to NPN and
    NPN is still waiting for some other children), or if           the previously submitted votes to FN should be eliminated
    NPN finds that the votes from CN are null, then stop.          thoroughly. If FN is not disconnected from the network,
     Else NPN recounts the total votes (including that of          P[FN] may find FN is still alive, so FN and its ancestors
     CN) and resubmits the total votes to P[NPN]. The              can delete the copy of votes of CN from their vote pools.
     recounting and resubmitting process will be iterated          Assume before this deleting process is performed till some
     up the tree until some ancestor of NPN that has not           ancestor CN’, CN’ finds its parent FN’ is also faulty and
     submitted votes yet.                                          reconnects to NPN’, then the deleting process will be
(3) If CN has submitted votes to FN and the interval from          continued on the new path. Meanwhile, FN’ will start a
    the submission of CN to the failure of FN is still             new deleting process for the votes of CN’, which contains
    within the time limit ∆T , then FN will delete the             the votes of CN. If FN is disconnected from the network,
    entry of CN in its vote pool. If FN has also submitted         then P[FN] will delete votes of FN, which also contains
    votes to P[FN], it will recount the total votes it has         the votes of CN. Such recursive call of the deleting process
    collected (excluding that of CN) and resubmit the              ensures the old copy of the votes of CN is deleted com-
    total votes to P[FN]. The recounting and resubmit-             pletely.
    ting process will be iterated up the tree until some              Obviously, the resubmitting process and the deleting
    ancestor of FN that has not submitted votes yet.               process for the votes of CN are executed along two dif-
                                                                   ferent paths to root, so if the processes can reach the root,
                                                                   they are not likely to always arrive at the same time. To
                                                                   avoid losing votes or introducing redundant votes, after all
                                                                   children have submitted votes, the root node should wait
                                                                   for a period of time that is long enough for the two proc-
                  NPN'                   FN'                       esses to be finished. But the problem is that while the root
                                                                   is waiting for the repairing processes for one node, another
                                                                   node may happen to fail. The passive repairing processes
                                                                   will be called again, despite that the votes submitted by the
                                   CN'                             node may have reached the root. As a result, the root will
                                                                   be trapped in an ever-waiting deadlock.
                                                                      We adjust the passive repairing protocol by introducing
                              FN          NPN                      some active ingredient to avoid ever-waiting: If a node has
                                                                   submitted votes to its parent long before it finds the parent
                                                                   is dead (i.e., beyond a time limit ∆T which is much longer
                                                                   than the collecting time of the node), then it simply as-
                                   CN                              sumes that the parent has also submitted votes and there
 Figure 3. Repairing broadcast trees for best-quality collecting   are several ancestors that have received the votes. Because
                                                                   the parent (if it is not disconnected from the network) and
                                                                   every one of the ancestors that have kept the votes longer
Explanation of the Repairing Protocol                              than ∆T will try their best (just as the node itself does) to
                                                                   relay the votes to the root, it is not necessary for the node
To explain the repairing protocol, we first neglect the time
limit ∆T let ∆T =0). The main idea of the protocol is that
                                                                   to resubmit votes.
                                                                      With this compromised protocol, the loss of votes will
when a node finds its parent is faulty, it doesn’t know
                                                                   still occur if several nearest ancestors of a node disconnect
whether the submitted votes are lost or not, so it resubmits
                                                                   from the network concurrently, but the probability of such
votes to a new parent, and at the same time, the original
                                                                   loss is greatly less than before (the above mentioned
ancestors of the node try to eliminate the votes from the
                                                                   scenario).
node by recounting and resubmitting the total votes in a
bottom up manner. The repairing protocol is passive in             2.3 Efficiency
that although the old submitted votes may have been re-
layed to many ancestors, they are rolled back to start             Recall that before a node submits votes to its parent, it will
afresh.                                                            sum up the votes in the vote pool together with its own
   How well does this repairing protocol perform? In par-          votes. OntoVote does not export the summation method
ticular, can it really guarantee we collect exact one piece        but leaves it to application level. The implementation of
of votes from every node online even when several nodes            the method is highly related to the efficiency of OntoVote:
of the tree fail concurrently or subsequently?                     when more and more votes are collected, message packets
                                                                   that encapsulate votes will become larger and larger.
Without additional disposal, the packets will overwhelm            ontology. The common ontology is used to provide more
the network and the scalability of OntoVote will be no             powerful semantic search service.
better than that of a client-server model. So we should
follow several principles in the design of this method.            3.1    Process of Ontology Drift
   To understand our principles, one can view a message            We use the following process to cope with the drift of a
packet that contains votes as a sheet. Every vote has its          common ontology:
entry on the sheet. The number of entries a sheet can hold         (1) When a new group is created, the creator provides a
is limited. By combining the entries on several sheets                 basic common ontology.
together to form a new sheet and by relaying the new sheet
to parent node, a voting participant fulfils his collecting        (2) When a new user joins the group, he is provided with
task. Here are the design principles described with the                the currently available common ontology. If the user
above metaphor:                                                        is dissatisfied with the common ontology, he can
   • Merge identical entries. To save on the space of a sheet,         modify it with ontology visualization tools to create a
                                                                       local ontology. The visualization tools can map on-
      votes devoted to the same candidate should be merged
      together, that is, each entry should maintain a counter of       tology elements to application elements (e.g., direc-
      the votes that fall in this entry. Actually, this is the         tories, bookmarks, etc) to hide ontologies from ele-
                                                                       mentary users. They can also provide powerful sup-
      original meaning of votes counting.
                                                                       port for expert users to modify ontologies directly
   • Filter minor entries. Because the number of the entries a         and easily. In any case, the modifications of the local
      sheet can hold is limited, one should filter out the least       ontology are translated into the user’s votes on the
      counting entries when the sheet overflows. The contents          common ontology. By tracking user modifications,
      of the least counting entries are likely to be noisy or          system can also partly do the mapping between the
      unimportant to most users, so it is acceptable to filter         local ontology and the common ontology.
      them out.
                                                                   (3) At a proper time, all votes are collected and the
   • Choose a proper-sized sheet. Each application should              common ontology is modified with the voting results.
      choose a proper-sized sheet by simulation or by prob-
                                                                   (4) After the new common ontology is published to all
      ability analysis to avoid a phenomenon that we call
      “Entry Jolts”: although some vote may be very large              peers, the mapping between the local ontology and
                                                                       the new common ontology is adjusted again.
      altogether, it happens to be filtered out every time be-
      fore it can accumulate. “Entry Jolts” is determined by       (5) Iterate the above process.
      such factors as data makeup, data distribution on the        To put it simply, the whole is an iterated process. Let LO
      network, filtering algorithms and sheet sizes, etc. To       (Local Ontology) denote the local ontology of a user and
      reduce the probability of “Entry Jolts”, a large-sized       let CO (Common Ontology) denote the common ontology
      sheet is preferred, but we’d better get an upper bound of    of a community. LC is the mapping between LO and CO.
      the size of the sheet, size larger than which brings about   The iterated process can be described as follows:
      no evidently-good performance but more consumption           (1) LO=CO, LC=Identity Mapping
      of the network bandwidth.
                                                                   (2) User modifies LO. System records user modifications
In addition to the implementation of the summation                       and automatically adjusts LC.
method, other factors such as the balance of a broadcast
tree, which has been addressed above, also affect the ef-          (3) At a proper time, system translates modifications into
ficiency of OntoVote. But they are beyond our considera-                 votes, collects all votes and modifies CO background.
tions.                                                             (4) Publish CO to every peer and adjust LC again.
                                                                   (5) Goto (2).
3    Application of OntoVote to Ontology                           There are some issues that need to be further addressed in
     Drift                                                         the management of the ontologies involved in this process,
In section 1, we give our thought that colleting votes from        e.g., how to track user modifications? Why do we keep the
all users can drive the drift of a common ontology and in          mapping between the common ontology and the local
section 2, we show that it is possible to collect votes in P2P     ontology and how to do this mapping (Obviously, it is not
environments with OntoVote. In this section, we will try to        enough to do the mapping between two ontologies just by
apply OntoVote to the process of ontology drift. Our im-           tracking the changes of either ontology)? How to derive a
plementation of ontology drift is part of the knowledge            user’s modification proposals (or votes) on the common
acquisition module developed under our undergoing PICQ             ontology from his modifications of the local ontology?
project. PICQ is dedicated to paper sharing on P2P plat-           What the system should do if there is a conflict of opinions?
form with semantic web technologies. In PICQ, users are            In the following sections, we will discuss these problems
grouped according to research interest. Every group has a          one by one.
common ontology. New fields or new application re-
quirements will introduce new concepts or attributes in the
3.2    Tracking User Modifications                             pattern for the deleting of a sub-tree is defined and the root
                                                               node of the sub-tree is used as a parameter of the pattern.
The problem for tracking changes within ontologies or
within a knowledge base has been addressed in [Kiryakov           Our approach for tracking modifications is illustrated in
and Ognyanov, 2002]. [Kiryakov and Ognyanov, 2002]             Figure 5. The left structure of the figure denotes the initial
                                                               common ontology that the user imported from the com-
proposes using RDF statements (i.e. triples) instead of
resources or literals as the smallest trackable pieces of      munity; the right structure is the local ontology. A history
knowledge. The two basic types of updates in a repository      of modifications is recorded in a way something like that
                                                               of [Kiryakov and Ognyanov, 2002]:
are addition and removal of a statement. To track series of
updates that are bundled together according to the logic of
the application, it uses batch update that works with the      History:
repository in a transactional fashion.                         1. remove sub-tree(C): remove <C is_a A>, remove<D
                                                                    is_a C>
                                                               2. add <E is_a A>
                 DS                  DS
                                                               3.3    Maintenance of Ontology Mapping
                is_a              is_a                         In P2P applications, every user should be allowed to hold a
                                                               local ontology to keep his personality. He uses this local
                P2P                  P2P                       ontology to raise queries. The queries are translated into
                                                               the common ontology before being sent to other peers to
                              is_a         is_a                improve search efficiency and the recall of search results.
                                                               So it is of great importance to maintain the mapping be-
                           Pure             Hybrid             tween the local ontology and the common ontology.
                                                                  By tracking the modifications of the local ontology and
          Figure 4. Two Different Local Ontologies             the common ontology in step (2) and step (3) in the process
                                                               of ontology drift, we can partly do the mapping between
                                                               them. For instance, if a resource in either ontology is
                                                               renamed, the mapping can be adjusted. But how about
                                                               adding a new resource? How can we know whether or not
                 A                                             the new resource in one ontology can be mapped to some
                                                  A            old resource in another ontology? This problem may be
           is_a r1                                             solved with emergent semantics [Maedche and Staab,
                            History                            2001; Fensel et al., 2002]: after the user adds a new re-
           C                                is_a      r1       source, he queries with this new resource for a period of
                       B                    E                  time. During this period, emergent semantics helps to find
         is_a                                          B       out the mappings between the new resource and some
                                                               other resources in the common ontology or other local
           D                                                   ontologies (e.g. same file categorized to different concepts
Figure 5. Tracking Modifications from Initial Common Ontol-    indicates alignment). The derived mappings are expressed
               ogy to Current Local Ontology                   with probabilities, based on the number of the instances
                                                               that indicating alignment. Different peers may find dif-
                                                               ferent mappings, or same mappings with different prob-
In regard to reflecting the modification proposals of users,   abilities. At a proper time, all new resources and the
we think atomic update (additions or removals of state-        mappings among them are collected and coordinated with
ments) plus batch update is almost appropriate and only a      a new round of voting. Among the resources that can be
small change is made: because we treat modification            mapped to each other, the one that wins the vote is adopted
proposals as votes and OntoVote requires identical votes       by the common ontology, the noisy semantics (votes that
to be merged, but two batch updates that represent the         are below a threshold) is discarded and the rest are mapped
same proposals may not match exactly (e.g., in Figure 4,       to the one accepted. This process is something like the
both of two users propose the sub-tree rooted at the con-      revision of a dictionary in social life: after a new word
cept “P2P” be deleted, but the two sets of removed triples     emerges, people use this word in their communication and
can’t match exactly. One set contains two more triples         every one gets a scrap of the meanings of the word (e.g.,
than the other.), so we propose the application should         find synonyms of the word). When a dictionary is revised,
induce some patterns from batch updates, which reflect the     all meanings of the word is collected and validated, and if
intention of a user. Two batch updates are matched if and      necessary, the word is lexicalized.
only if both their patterns and the parameters of the pat-        In practice, the mapping results that are automatically
terns are matched. For instance, the deleting of the two       obtained and maintained may not always be sound, so a
sub-trees rooted at “P2P” in Figure 4 can be matched if a      semi-automatic ontology mapping mechanism is preferred,
                                                               i.e., advanced users are allowed to manually correct the
mapping results before or after any round of votes col-                  his local ontology with the common ontology, if he is
lecting.                                                                 dissatisfied with his local ontology or he does not want his
                                                                         opinions diverge too much from those of the masses, thus
3.4      Generation of Votes                                             the old modification records are cleared and the new
Before votes collecting, the modifications of the local                  modifications of the local ontology can be more easily
ontology should be translated into the modification pro-                 translated into those of the common ontology.\
posals (votes) on the common ontology. Or else, just as                     It should be noted that our system tries to translate every
section 3.2 has stated, although the proposals of two users              record in the modification history into a vote. If the
are identical, they cannot be merged.                                    modifications do not interact, this approach may work well.
   Currently, our system generates the votes in a rather                 However, sometimes an appropriate combination of addi-
simple way, which is mainly based on the mapping be-                     tions and removals may trigger complex "non monotonic"
tween the common ontology and the local ontology. Below                  updates of the ontology, e.g., a user first adds “B” as a
we discuss how our approach works in various conditions.                 sub-concept of “A” and adds “C” as a sub-concept of “B”,
   Firstly, if the mapping between the common ontology                   and later, he finds that “C” is unnecessary, so he deletes it
and the local ontology is well maintained, and the pattern               and adds “B” as a sub-concept of “A” directly. This se-
of the modification has been defined by the application,                 quence of modifications will be translated into a list of
then the translation is straightforward. For instance, in                votes. If most members of the community believe that “B”
Figure 6, assume concept “DC” is mapped to concept                       is a sub-concept of “A”, it is likely that the last proper
“Distributed Computing” and concept “P2P” is mapped to                   modification of the user will be accepted. However, if
concept “Peer-to-Peer”. If a user adds two new concepts                  many users make the similar wrong modifications before,
“Pure” and “Hybrid” under concept “P2P” in his local                     the wrong modifications may also be accepted. To filter
ontology, then we think the user proposes adding these                   out the wrong modifications, we’d better find out the last
concepts under concept “Peer-to-Peer” in the common                      determination of the user (or the real intention of the user)
ontology; If the user deletes the tree rooted at “DC”                    before we construct a vote from a sequence of related
completely, then we think the user proposes deleting the                 modifications. Unfortunately, our system has not realized
tree rooted at “Distributed Computing” completely.                       this object yet, and we will leave it to the future.

                                                                         3.5    Resolving Conflicts
                   Distributed                                           It is obvious that there are conflicts among all collected
                                                       DC
                    Computing                                            proposals, e.g., some users suggest deleting concept
                                                                         “Peer-to-Peer”, while others suggest adding “Pure” as a
                 is_a         is_a                  is_a                 sub-concept of “Peer-to-Peer”.
                                                                            One straightforward way to resolve a conflict is to
        Client-Server         Peer-to-Peer             P2P               choose among the conflicting proposals the one with the
                                                                         largest proportion. However, such simple con-
                                                                         flict-resolving mechanism will bring about the instability
        is_a      is_a                          is_a         is_a
                                                                         of the common ontology, especially when the proportion
                                                                         of the adopted proposal is not overwhelming.
 Flat          Hierarchical                  Pure               Hybrid      To understand this problem, assume there are totally
                                                                         100 users in the community. At the beginning, 60 percent
         Figure 6. Common Ontology and Local Ontology                    of them suggest adding the concept “Peer-to-Peer” (or
                                                                         concepts that are equivalences of “Peer-to-Peer”) to the
                                                                         common ontology, thus “Peer-to-Peer” is accepted in the
   Secondly, if no appropriate mapping information is
                                                                         common ontology. In the second round of voting, assume
obtained, then we either ignore the modification or
                                                                         30 users suggest deleting the concept “Peer-to-Peer”
transform the modification into a list of sub-modifications,
                                                                         (these 30 users may come from the previously 60 percent
according to the specification of the modification pattern.
                                                                         of users, or advanced users who modify common ontology
For example, in Figure 6, if no corresponding concept of
                                                                         directly, or users who reimport and modify their local
“DC” is found in the common ontology, then we either
                                                                         ontologies, etc.), while 10 users add sub-concepts “Pure”
discard the vote or split the vote to generate a new one to
                                                                         under “Peer-to-Peer” and the rest make no modifications.
delete the sub-tree rooted at the concept “peer-to-peer”.
                                                                         After the voting, if the system chooses to delete
   Thirdly, for some modifications (e.g., additions of
                                                                         “Peer-to-Peer”, then the local ontologies that still record
statements), the mapping information is not necessary at
                                                                         the addition of “Peer-to-Peer” will propose adding
all, i.e., additions of statements that have no relations with
                                                                         “Peer-to-Peer” to the common ontology again in the next
other resources in the common ontology are allowed in our
                                                                         round of voting. Because the proportion of this proposal is
system.
                                                                         still large enough, it may be adopted again by the common
   The last but not the least, advanced users are allowed to
                                                                         ontology, which causes the instability of the common
modify the local copy of the common ontology directly, if
                                                                         ontology.
they are pleased to do that. We also allow a user to replace
   Intuitively, We can get rid of the instability problem by       International Journal of Cooperative Information Systems
tracking the history of voting. But till now, this idea has        9:1/2, 2000, pp. 3-28.
not been tested yet. Instead, we take a rather simpler
measure to ease the instability problem, i.e., we set dif-         [Arumugam et al., 2002] Madhan Arumugam, Amit Sheth,
ferent thresholds for modifications with different patterns        I. Budak Arpinar. Towards Peer-to-Peer Semantic Web: A
to be accepted in the common ontology. Because the                 Distributed Environment for Sharing Semantic
common ontology is mainly used to improve search effi-             Knowledge on the Web. WWW2002 Workshop on Real
ciency and the recall of search results, it is better to contain   world RDF and Semantic web applications (2002).
more resources than not. So we set low thresholds for
additions of statements and high thresholds for deleting           [Nejdl et al., 2002] Wolfgang Nejdl, Boris Wolf, Changtao
operations. In this way, when a conflict occurs, it is more        Qu, Stefan Decker, Michael Sintek et al.. EDUTELLA: A
likely that a proposal with an overwhelming vote propor-           P2P Networking Infrastructure Based on RDF. In proc. of
tion exists, and that the opposite proposals may be too            WWW11, May 2002,Hawaii.
trivial to bring about instability.
                                                                   [JXTA] Project JXTA: An open, innovative collaboration.
4    Conclusions and Future Work                                   White paper, available at www.jxta.org.
Ontology drift is important in many requirement-sensitive          [Castro et al., 2002] Miguel Castro, Peter Druschel,
P2P applications. We proposed collecting the modification          Anne-Marie Kermarrec and Antony Rowstron. Scribe: A
proposals (votes) from all users to drive ontology drift. To       Large-scale    and     Decentralized   Application-level
collect votes on P2P platform, we presented OntoVote, a            Broadcast Infrastructure. IEEE Journal on Selected Areas
scalable distributed votes collecting mechanism based on           in Communication (JSAC), Vol. 20, No, 8, October 2002.
application-level broadcast trees. OntoVote is reliable in
that it leaves out as few votes as possible and avoids             [Zhuang et al., 2001] Shelley Q. Zhuang, Ben Y. Zhao,
counting in the same copy of votes repeatedly. We also             Anthony D. Joseph, Randy H. Katz and John Kubiatowicz.
summarized several design principles of vote counting for          Bayeux: An Architecture for Scalable and Fault-tolerant
OntoVote to work efficiently. And finally, we tried to             Wide-area Data Dissemination. In proc. of the Eleventh
apply OntoVote to the process of ontology drift with the           International Workshop on Network and Operating
discussion of several problems encountered in the appli-           System Support for Digital Audio and Video(N OSSDAV
cation.                                                            2001), Port Jefferson, NY, June 2001.
   In future, we will research ontology drift further by
obtaining more general modification patterns of ontology.          [Maedche and Staab, 2001] Alexander Maedche, Steffen
we will also try to make the drifting process more stable.         Staab. Ontology Learning for the Semantic Web. IEEE
Besides, we will research some further issues of voting,           Intelligent Systems, 16(2):72-79, March/April 2001.
such as voting security and the using of the opinions of
authoritative members. There is also a problem that                [Kiryakov and Ognyanov, 2002] Atanas Kiryakov and
common ontology based on voting will neglect the views             Damyan Ognyanov. Tracking Changes in RDF(S)
of an individual (or a small group of people) that brings          Repositories. In proc. of 13th International Conference on
real innovation and original perspectives on community’s           Knowledge Engineering and Knowledge Management
point of view. We will find out whether intercommunica-            EKAW02, Siguenza, Spain, 1-4 Oct. 2002.
tion between peers helps to solve this problem.

References
[Fensel et al., 2002] Dieter Fensel, Steffen Staab, Rudi
Studer and Frank van Harmelen. Peer-2-Peer Enabled
Semantic Web for Knowledge Management. Ontol-
ogy-based Knowledge Management: Exploiting the Se-
mantic Web, Wiley, London, UK, 2002.

[Sato et al., 2002] Hiroyuki Sato, Yutaka Abe and Atsushi
Kanai. Hyperclip: a Tool for Gathering and Sharing
Metadata on Users’ Activities by Using Peer-to-Peer
Technology. WWW2002 Workshop on Real world RDF
and Semantic web applications (2002).

[Nodine et al., 2000] Marian Nodine, Jerry Fowler,
Tomasz Ksiezyk, Brad Perry, Malcolm Taylor, Amy
Unruh. Active Information Gathering in Infosleuth. In