<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semantic Query Routing Experiences in a PDMS</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Federica Mandreoli¤</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Martoglia¤</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wilma Penzoy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simona Sassatelli¤ ¤DII - University of Modena</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Reggio Emilia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Italy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>fmandreoli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>rmartoglia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>sassatellig@unimo.it</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>yDEIS - University of Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>8</lpage>
      <abstract>
        <p>- Querying a PDMS means either flooding the network with messages to all peers or taking advantage of a routing mechanism to reformulate a query only on the best peers selected according to some given criteria. As reformulations may lead to semantic approximations, we deem that such approximations can be exploited for locating the semantically best directions to forward a query to. In this paper, we present our experiences in devising and testing a mechanism for effective query routing in a PDMS. In particular, we describe a distributed index mechanism where each peer is provided with a Semantic Routing Index (SRI) for routing queries effectively. We illustrate SRIs' structure, their use and the framework we devised for their incremental update, then we provide an extensive evaluation of their effectiveness through a set of query routing experiments on a variety of scenarios. This work is partially supported by the PRIN WISDOM and FIRB NeP4B national projects.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>title
genre
plot
players</p>
    </sec>
    <sec id="sec-2">
      <title>I. INTRODUCTION</title>
      <p>
        Peer Data Management Systems (PDMSs) represent a recent
evolution of P2P systems, tuning database world semantic
expressiveness and P2P network flexibility. In such kind of
systems, each peer is enriched with a schema that represents
the peer’s domain of interests, and semantic mappings are
locally established between peers’ schemas [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Let us
consider Figure 1 as a sample scenario of a PDMS concerning
data about operas. Each peer is provided with a schema,
for instance XML-based, whereas bold lines denote semantic
mappings between pairs of peers. In order to query a peer in
a PDMS, its own schema is used for query formulation and
mappings are used to reformulate the query over its immediate
neighbors, then over their immediate neighbors, and so on.
Thus, query answers can come from any peer in the network
that is connected through a semantic path of mappings [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
More precisely, any relevant peer may add new answers to a
given query and different paths to the same peer may yield
different answers [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In this context, query processing can
either flood the network with messages to all peers or take
advantage of a routing mechanism to reformulate a query only
on the best peers selected according to some given criteria. In
particular, in a semantic web perspective, a query posed over
a given peer should be forwarded to the most relevant peers
that offer semantically related results among its immediate
neighbors first, then among their immediate neighbors, and
so on. As an example, let us consider the following query,
posed on the schema of Peer1: “Retrieve the main singers
of the opera entitled Aida”. The Peer1’s neighbors Peer2 and
Peer3 might be considered mirroring peers as to the portion
music
lyrics
opera CD
title
      </p>
      <p>Peer5
pop</p>
      <p>concert
overture date singer
main vocalist</p>
      <p>booklet
opera title
name</p>
      <p>Peer6
authorsynopsis
biography</p>
      <p>Fig. 1. Reference example
of the schemas involved in the query above; as to the second
step of query reformulation, Peer5 is more relevant than Peer4
and Peer6, since it deals with lyric music data, instead of
written operas. For these reasons, the answers obtained from
path Peer3-Peer5 fit better the query conditions than those
from paths Peer2-Peer4-Peer6 and Peer2-Peer6.</p>
      <p>
        In this paper, we describe our experiences in devising
and testing a mechanism for semantic query routing in a
PDMS [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Since, when a query is forwarded through
a semantic path it undergoes a multi-step reformulation which
may involve a chain of semantic approximations, our proposal
is to exploit such approximations for selecting the direction
which is more likely to provide the best results to a given query.
Our perspective is knowledge-based, in that the routing of a
query is guided by the semantic mappings between the peers.
In particular, we leverage on scores which extend the semantic
mappings in order to measure the semantic compatibility
between neighboring peers. However, the information provided
by the semantic mappings stored in a peer is not enough for
deciding which is the best path. In fact, being Peer2 and Peer3
mirrors, the semantic approximation of the query would be
measured as identical for both directions. Therefore, broadly
speaking, some kind of information about the relevance of
the whole semantic paths should be available in the network,
maintained up-to-date, and easily accessible for query routing
purposes. The solution we propose relies on a distributed-index
mechanism called Semantic Routing Index (SRI) which
summarizes, for each concept of its peer’s schema, the semantic
approximation “skills” of each subnetwork reachable from its
immediate neighbors, and thus gives a hint of the relevance
of the data which can be reached in each path. For instance,
the Peer1’s SRI will contain two entries, one for the upward
subnetwork and one for the downward one. The semantic
knowledge stored in a SRI is summarized on the available
directions in order to maintain the size of the semantic index
proportional to the number of neighbors, thus scaling well in
a PDMS scenario.
      </p>
      <p>In the following sections: We first introduce SRIs (Section
II) by describing their structure, the framework we devised for
their incremental update and their use as distributed indices
for query routing, then we provide an extensive evaluation
(Section III) of their effectiveness through a set of query
routing experiments on a variety of scenarios. Finally, in
Section IV we relate to other approaches in the literature and
we discuss future extensions to our work.</p>
    </sec>
    <sec id="sec-3">
      <title>II. SEMANTIC ROUTING INDICES</title>
      <p>
        In our reference scenario each peer pi in a set of peers
P stores local data, modelled upon a local schema Si that
describes its semantic contents, and it is connected through
semantic mapping to some other neighboring peers. A semantic
mapping M (Si; Sj ) can be established from a source schema
Sj to a target schema Si, and it defines how to represent Si
in terms of Sj ’s vocabulary. In particular, it associates each
concept in Si to a corresponding concept in Sj according to a
score, denoting the degree of semantic similarity between the
two concepts. A formal definition of semantic mappings can
be found in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where the fuzzy theoretical framework of our
work is presented.
      </p>
      <p>In order to efficiently answer a query in such a distributed
context, query routing appears as a key issue. In particular, it
is fundamental to be able to forward the query towards the
zones of the network which can potentially provide results
semantically nearest to it.</p>
      <p>Our idea is that each peer maintains cumulative information
summarizing the semantic approximation capabilities w.r.t.
its schema of the whole subnetworks routed by each of its
neighbors. In particular, each peer keeps such information in
a local data structure which we call Semantic Routing Index
(SRI). Thus, a peer p having n neighbors and m concepts in its
schema stores an SRI structured as a matrix with m columns
and n + 1 rows, where the first row refers to the knowledge
on the local schema of peer p. Each entry SRI[i][j] of this
matrix contains a score expressing how the j-th concept is
semantically approximated by the subnetwork routed by i-th
neighbor, i.e. by each path in the pj ’s subnetwork.</p>
      <p>A sample fragment of Peer1’s SRI from the reference
example is shown in Fig. 2. Notice that the scores of the first
row represent the scores of the concepts of Peer1 in the self
mapping, which we assume to be the identity relation. Further,
the space required for storing a SRI at a peer is proportional to
the number of the peer’s neighbors, and thus quite modest w.r.t.
opera
1.0
0.6
0.7
main singer
1.0
0.4
0.6
…
…
…
…
the number of peers which usually join a PDMS. This makes
our distributed-index mechanism scalable in a P2P context.</p>
      <sec id="sec-3-1">
        <title>A. SRI Evolution Framework</title>
        <p>Since SRIs summarize the semantic information offered by
the network, they change whenever the network itself changes.
This may occur in response to either the joining/leaving of
peers, or to changes in peers’ schemas. We first focus our
attention on the evolution of the PDMS’s topology.</p>
        <p>SRIs evolution is managed in an incremental fashion as
follows. As a base case, the SRI of an isolated peer p
having schema S is made of the single row [1; : : : ; 1], i.e.,
it contains the membership grades of the concepts in S in the
self mapping. This row expresses the semantic approximation
offered by the subnetwork rooted in p, yet made of the only
peer p.</p>
        <p>A simplification of the process of Peer1’s SRI update
when Peer1 connects to Peer2 is shown in Fig. 3 (P1 and
P2 in the figure). When a peer connects to another peer,
each one aggregates its own SRI SRI by rows, according
to an appropriate aggregation function g. The result of this
aggregation operation and the schema S are then sent to the
other peer. After a peer, say pi, receives such knowledge
from the other peer, say pj , a semantic mapping M (Si; Sj )
is established between Si and Sj . Then, pi extends its SRI
SRIi with a new row for pj . The membership grades of this
newly created row are obtained in two steps: 1) M (Si; Sj ) is
composed, according to an appropriate composition function,
with the aggregated SRI provided by pj to obtain the extension
of the semantic paths originating from pj (represented by the
aggregated SRI) with the connection between pi and pj ; 2)
the so obtained result is then aggregated with M (Si; Sj ) to
include the semantic path connecting pi with pj .</p>
        <p>
          Aggregation and composition are operations for which a
fuzzy interpretation is given in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Possible choices for
composition are the algebraic product or the minimum. Several
options are also possible for aggregation, for instance
functions such as the maximum, the algebraic sum or the travel
function, which is inspired to a function commonly used in
travel demand applications when modelling the aggregation of
several alternatives [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. To verify the effectiveness of different
alternatives, we performed several exploratory tests, whose
results are given in Section III.
        </p>
        <p>Afterwards, both peers pi and pj need to inform their own
other neighbors that a change occurred in the network and thus
they have to update their SRIs accordingly. To this end, each
peer, say pi, sends to each other neighbor pik an aggregate
of its SRI, excluding the pik ’s row. When pik receives such
opera writer …
1.0 1.0 …
0.5 0.2 …
0.8 0.3 …
P2</p>
        <p>P6
SRIP2 drama title …
P2 1.0 1.0 …
P4 0.3 0.7 …
P6 0.4 0.4 …</p>
        <p>Updates for P2
aggregated information, it updates the i-th row of its SRI by
recomputing the membership values as discussed above.</p>
        <p>Disconnections are treated in a similar way as connections.
When a node disconnects from the network, each of its
neighbors must delete the row of the disconnected peer from
its own SRI and then inform the remaining neighbors that a
change on its own subnetwork has occurred by sending new
aggregates of its SRI to them. A similar procedure applies in
case of modifications of the semantic knowledge maintained
at each peer, for instance when a new concept is added to the
peer’s schema. When many changes occur in the PDMS, a
careful policy of updates propagation may be adopted. For
instance, when changes has a little impact on its SRI, a
peer may also decide not to notify the network. This would
reduce the amount of exchanged messages as well as the
computational costs due to SRI manipulation.</p>
      </sec>
      <sec id="sec-3-2">
        <title>B. Exploiting SRIs for Query Routing</title>
        <p>
          When a peer p needs to forward a query q, it accesses
its own SRI for determining the neighboring peers which
are most semantically related to the concepts in q. For the
sake of clarity, we start simple, and we assume the query q
refers to a single concept C. The choice of the semantically
best neighboring peers is done by evaluating the column
corresponding to C in the local SRI. In particular, the highest
values in this column make the corresponding neighbors to
be the selected peers. For instance, given the query main
singer on Peer1 whose SRI is shown in Fig. 2, Peer3 will
be preferred to Peer2. In the general case of a complex query
involving more concepts, the choice of the best neighbors is
given by applying scoring rules which, for each neighboring
peer pi, combine the corresponding grades in the SRI for all
the corresponding concepts in q. As to how these values can be
effectively combined, see [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Once the best neighboring peer
pi is found, the corresponding semantic mapping M (Si; Sj )
is exploited to unfold the query q in q0. q0 is then routed
towards the subnetwork, where, starting from pi which in
turn evaluates q0 and returns its local results to p, the process
possibly reiterates.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>III. SRI IN ACTION In this section we discuss a selection of experiments we performed to test the effectiveness of SRIs for query routing.</title>
      <sec id="sec-4-1">
        <title>A. Experimental Setting</title>
        <p>
          For our experiments we used a simulation framework able
to reproduce the main conditions characterizing a PDMS
environment. In particular, we employed SimJava 2.0, a discrete,
event-based, general purpose simulator, which allowed us to
evaluate the impact of exploiting SRIs for query routing.
Through this framework we modelled scenarios corresponding
to networks of semantic peers, each with its own schema
describing a particular reality. We chose peers belonging to
different semantic categories, where the schemas of the peers
in the same category describe the same topic from different
points of view. As in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], the schemas are derived from real
world-data sets, collected from many different available web
sites, such as the DBLP Computer Society Bibliography and
the ACM SIGMOD Record and enlarged with new schemas
created by introducing structural and terminological variations
on the original ones. Then, we distributed these schemas in the
network in a clustered way, i.e. the schemas belonging to the
same semantic category are more likely to belong to peers
connected through semantic mappings. This reflects realistic
scenarios where nodes with semantically similar content are
often clustered together. As to the topology of the semantic
network of mappings connecting the peers, we tested our
techniques on different alternatives: We started with tree
networks and then explored more realistic and uncontrolled ones
generated with the BRITE topology generator tool [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The
mean size of our networks was in the order of some hundreds
of nodes. As we will see, this is sufficient for our effectiveness
testing purposes. In particular, in these last cases, we also
exploited the Barabasi option, which exploits mathematical
distributions in order to generate topologies similar to actual
networks. In such complex networks the problem of cycles can
not be ignored. Indeed, in order to avoid the presence of cyclic
paths in the SRI updates propagation, when a peer connects
to the network a cycle detection mechanism based on global
unique identifiers, as in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], is adopted.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>B. SRI Evaluation</title>
        <p>In order to evaluate the effectiveness of our approach, we
started with an initial testing phase executed on a simple tree
network. In Figure 4-a a portion of this network is depicted,
where peers belonging to the same category are identified by
the same color. In particular, peers in the figure belong to
three different categories: sport (peers 2 and 4), music (5, 7,
8 and 9) and publications (1, 3, 6 and 10). We considered two
alternative scenarios: the original one as shown in Figure 4-a,
and the one obtained by swapping the peers included in the
dotted regions. In Figure 4-b the first two tables show how the
scores in peer 5 routing index change when its subnetwork,
originally including three peers of the same peer 5 category,
is replaced by a subnetwork of peers belonging to a different
category. In these tables, for each concept, six different scores
2
1</p>
        <p>Peer5 &gt;7 Max-Prod Max-Min Sum-Prod Sum-Min Tr-Prod Tr-Min
tracklist 0.0251 0.2689 0.0420 0.2689 0.0093 0.1685
track 0.0080 0.1325 0.0122 0.1325 0.0025 0.1326
singer 0.0647 0.3750 0.1042 0.3750 0.0253 0.3188
albumTitle 0.0686 0.3848 0.1080 0.3848 0.0269 0.2333
Peer5 &gt;1 Max-Prod Max-Min Sum-Prod Sum-Min Tr-Prod Tr-Min
tracklist 0.0079 0.1907 0.0141 0.1982 0.0026 0.0945
track 0.0000 0.0530 0.0000 0.0530 0.0002 0.0444
singer 0.0409 0.2153 0.0642 0.2153 0.0161 0.2153
albumTitle 0.0023 0.0127 0.0027 0.0127 0.0002 0.0127
Gr Ratio Max-Prod Max-Min Sum-Prod Sum-Min Tr-Prod Tr-Min
tracklist 3.18 1.41 2.98 1.36 3.51 1.78
track n/a 2.50 n/a 2.50 14.77 2.99
singer 1.58 1.74 1.62 1.74 1.58 1.48
albumTitle 29.83 30.30 40.00 30.30 174.44 18.31
(a) Experimental scenario
(b) Effectiveness of different functions on Peer5’s SRI
are reported, corresponding to the results obtained applying
different mathematical functions implementing aggregation
and composition operations. In particular, the possible tested
alternatives for aggregation and composition are: a) maximum
and product; b) maximum and minimum; c) algebraic sum and
product; d) algebraic sum and minimum; e) travel function
and product; f) travel function and minimum. In this type
of tests, the key parameter for effectiveness evaluation is the
growth ratio, i.e. the measure of how bigger are the scores of
the original scenario w.r.t. the alternative one; we show these
values in the last table of Figure 4. As expected, the scores
of the original scenario are significantly higher (growth ratio
greater than 1), reflecting that peer 5 concepts are semantically
approximated in a better way by the subnetwork of peers
belonging to the same peer 5 category. As to the use of the
composition function, all the combinations involving product
show a higher growth ratio (for example for “albumTitle”
we have 40 with algebraic sum and almost 175 with travel
function). As to the use of the aggregation function, all the
possibilities show a satisfying behavior, but we observed that
the travel function more clearly discriminates the “good”
subnetworks. Thus, in the following tests we will refer to the
product and travel functions.</p>
      </sec>
      <sec id="sec-4-3">
        <title>C. Query routing evaluation</title>
        <p>In the second phase of our testing, we considered larger
and more complex network topologies and we simulated the
querying process by instantiating different queries on
randomly selected peers and concepts and propagating them until
a stop condition is reached. We considered two alternatives:
stopping the querying process when a given number of peers
(messages) has been queried and measuring the quality of
the results (satisfaction) or, in a dual way, stopping when
a given satisfaction is obtained and measuring the required
number of messages. Satisfaction is a specifically introduced
quantity that grows proportionally to the goodness of the
results returned by each queried peer. Each contribution is
computed by composing the semantic mappings scores of the
traversed peers (the same composition function exploited for
SRI construction is applied). The search strategy employed
is the depth-first search (DFS): Each peer receiving a query
produces an answer on its local schema, then selects its most
promising neighbor among the unvisited ones and forwards the
query to it. In this section, we compare our neighbor selection
mechanism based on SRIs (SRI) with a mechanism where
the selection of the next neighbor to be visited is based on
the semantic mapping values (SemMap) and with a baseline
corresponding to a random strategy (Random). Notice that all
the results we present are computed as a mean on several
hundreds query executions.</p>
        <p>We started by studying the behaviour of the different routing
strategies when we gradually vary the stop conditions in tree
networks. Figure 5-a shows the trend of the number of required
messages for a given satisfaction goal, while, from a dual
perspective, Figure 5-b shows the trend of the obtained
satisfaction at a given message limit. From both points of view, the
Random strategy is outdistanced by the other two. Further, the
difference between the SemMap and SRI performance appears
closer at the initial part of the graphs but becomes increasingly
more significant at growing stop conditions. This means that
SRIs are indeed able to discriminate better subnetworks to
explore and consequently increase the satisfaction at each step
in a more substantial way. Nevertheless, tree topologies may
not be considered completely realistic for a PDMS setting and
may facilitate the SRI routing process, because in such kind
of topologies different subnetworks are not overlapped and it
is consequently probably easier to identify the better ones.</p>
        <p>For these reasons, we deepened our tests by introducing
realistic network topologies, also involving redundant and
cyclic paths. The results referring to these situations are shown
in Figure 6, which presents the graphs of Figure 5 for the
new environments. Even though the distance between SRIs
and SemMap is slightly reduced due to the complications
introduced, we can see that, despite the new unfavourable
scenarios, the trend of these graphs are roughly similar to the
previous ones. Specifically, even in this case, we can observe
that the SRI curves are clearly separated from the others,
reflecting that the SRIs’ ability to identify the best subnetworks
s
e
g
a
s
s
e
m
f
o
r
e
b
m
u
N
s
e
g
a
s
s
e
m
f
o
r
e
b
m
u
N</p>
        <sec id="sec-4-3-1">
          <title>Random</title>
        </sec>
        <sec id="sec-4-3-2">
          <title>SemMap</title>
          <p>SRI
2.5
11.76
5.51
4.09</p>
          <p>3
13.06
6.02
4.81
1.6
1.55
1.5
to explore facilitates a faster retrieval of the highest ranked
results.</p>
          <p>The previous tests show that by exploiting SRI query
routing performance is improved. However, what is the optimal
trade-off between such improvements in routing effectiveness
and the cost of SRIs’ updates? In order to give an answer
we performed another typology of experiments (Figure 7),
aiming to verify how the performances of the SRI routing
mechanism are affected when we vary the update horizon,
i.e. how far (number of hops) an update on a peer schema
propagates in the network, limiting the portion of the nodes
the SRI scores summarize. The curves represent the messages
trend for growing values of the horizon, starting from the
baseline 1 which corresponds to the SimMap case, and for two
different satisfaction goals. Observing the graph, it is clear that
increasing the horizon allows us to perform a better routing
mechanism, because it relies on more precise information. In
particular, from our tests, horizon extensions up to 8 clearly
provide significant benefits, then the results appear to stabilize
(see Figure 7). Notice that the use of an horizon, limiting the
updates propagation for the SRIs scores, clearly introduces a
kind of approximation on the information stored, but it is also
useful in limiting the SRI maintenance costs. Therefore, due to
the above considerations, since higher horizons lead to larger
propagation costs, we consequently estimate that an horizon
value of 8 represents a good trade-off (this is also the value
at which all tests in this section have been performed).</p>
          <p>Finally, the last type of test we present explores a possible
enhancement on the routing process, involving a mechanism of
pruning on the selection of the paths to follow: We introduce a
threshold for the selection of neighbors to which propagate the
queries, preventing the exploration of those paths characterized
by small mapping and/or SRI scores and consequently leading
to uninteresting results. The graph shows the results for both
SemMap and SRI routing strategy, expressing the number of
messages necessary to reach a given satisfaction goal when
we apply three different pruning thresholds. Notice that when
we use a zero threshold, and consequently apply no pruning
mechanism, we obtain the same results presented earlier in
the section. As can be seen, increasing the threshold leads
to significant savings in the number of messages for both
strategies. However, performing pruning on the basis of SRIs
appears to perform better in every situation, showing a small
but consistent number of saved messages w.r.t SemMap.</p>
          <p>IV. RELATED WORK AND CONCLUDING REMARKS
As envisioned by the Semantic Web, the need of
complementing the Web with more semantics has spurred much
efforts towards a rich representation of data. To this end,
knowledge representation languages (e.g., XML, RDF, and
sat 1.5
sat 1.6</p>
        </sec>
        <sec id="sec-4-3-3">
          <title>SemMap SRI</title>
          <p>SRI horizon
SemMap SRI (h=2) SRI (h=4) SRI (h=6) SRI (h=8) SRI (h=10)
24.89 24.42 23.69 22.94 22.84 22.83
26.13 25.67 24.26 23.55 23.29 23.27</p>
        </sec>
        <sec id="sec-4-3-4">
          <title>Pruning off 24.89 22.84</title>
        </sec>
        <sec id="sec-4-3-5">
          <title>Pruning med 7.73 7.2315</title>
        </sec>
        <sec id="sec-4-3-6">
          <title>Pruning high 3.45 3.09</title>
          <p>
            (a) ...SRI horizon
(b) ...pruning
OWL) has flourished in recent years. In this view, peer data
management systems (PDMSs) have been introduced as a
solution to the problem of large-scale sharing of semantically
rich data [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]. Indeed, a key challenge when querying a large set
of peers is query routing, i.e., the capability of selecting a small
subset of relevant peers to forward a query to. Much research
work in the P2P area has focused on this issue [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ], [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ],
[
            <xref ref-type="bibr" rid="ref11">11</xref>
            ], [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ], [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ], [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]. Some of these works discuss
id/keyword-based search of documents [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ], [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], some
assume a common vocabulary/ontology is shared by peers in
the network [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ], some address scalability of query routing
by means of a properly tailored super-peer topology for the
network [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], or by adapting their own semantic topology
according to the observation of query answering [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ].
          </p>
          <p>
            Most of these proposals are based on IR-style and
machinelearning techniques [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ], [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ], [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]. Basically,
they utilize measures that rely on keyword statistics, on the
probability of keywords to appear into documents, on the
number of documents that can be found along a path of
peers, on caching/learning-from the number of results returned
for a query. Then, all of them (but [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]) provide routing
techniques which either assume distributed indices which are
indeed conceptually global [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], or support completely
decentralized search algorithms which, nevertheless, exploit
information about neighboring peers only. More precisely, the
only work [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] proposes a routing mechanism which does not
limit the peer’s capability of selecting peers to the information
available at a 1-hop horizon, rather it extends this view
by using summaries of subnetworks’ content to provide a
direction to send a query to.
          </p>
          <p>
            Nevertheless, querying a PDMS is different than querying
a P2P system, primarily because of the presence of
heterogeneous schemas at the peers. On the other hand, the
novelty in a PDMS lies in its ability to exploit the transitive
relationships among such schemas for query answering [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ],
[
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]. In this scenario, our work aims to support query routing
in a PDMS, and it appears to be the first having this purpose.
The main differences between our proposal and the P2P
techniques discussed above are: 1) We do not assume any
global characterization of documents in the network; 2) We
move in a PDMS scenario, then assuming the presence of
schemas describing the content of peers’ data, as well as
pairwise semantic relationships between the peers’ schemas;
3) We make a schema-based rather than a key(word)-based
search; 4) inspired to [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], we rely on fully distributed semantic
indices, called SRIs, which summarize the semantics (rather
than the number of documents as in [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]) that can be retrieved
following a given direction in the network; 5) in order to
cope with schema heterogeneity, we rank (subnetwork of)
peers according to the semantic similarity occurring between
concepts in the peers’ schemas. The experiments we conducted
on a simulation environment led to very encouraging results.
          </p>
          <p>
            As a final consideration, it should be noted that the focus
of this paper was not on schema matching effectiveness,
even if the approach for query routing we propose is heavily
dependent on the used matches. Effective approaches such as
[
            <xref ref-type="bibr" rid="ref17">17</xref>
            ], which exploits the semantics and the structure of the
available schemas, or [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ], which also exploit the information
given by surrounding nodes, could be adopted for schema
matching.
          </p>
          <p>
            In the future, SRIs could be integrated in a more general
framework together with other approaches such as [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]
which are orthogonal to ours, and which cover complementary
aspects such as knowledge on quantitative information, as well
as on novelty of results, so as to blend different dimensions
a peer can be queried on. Then, as also stated in [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ], the
best peer has been understood as a peer that has the most
knowledge. Other aspects one might include in the evaluation
of peers are properties like latency, costs, etc.
          </p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ives</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Madhavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Suciu</surname>
          </string-name>
          ,
          <string-name>
            <surname>and I. Tatarinov</surname>
          </string-name>
          , “
          <article-title>The Piazza Peer Data Management System,”</article-title>
          <source>IEEE TKDE</source>
          , vol.
          <volume>16</volume>
          , no.
          <issue>7</issue>
          , pp.
          <fpage>787</fpage>
          -
          <lpage>798</lpage>
          ,
          <year>July 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Tane</surname>
          </string-name>
          , “EDUTELLA:
          <article-title>Searching and Annotating Resources within an RDF-based P2P Network</article-title>
          .”
          <source>in Proc. of WWW Intl. Workshop on the Semantic Web</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kantere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kementsietsidis</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Kiringa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and J.</given-names>
            <surname>Mylopoulos</surname>
          </string-name>
          , “
          <article-title>The hyperion project: from data integration to data coordination,” SIGMOD Record</article-title>
          , vol.
          <volume>32</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>53</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Tatarinov</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          , “
          <article-title>Efficient Query Reformulation in Peer Data Management Systems,”</article-title>
          <source>in Proc. of SIGMOD</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mandreoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Martoglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tiberio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sassatelli</surname>
          </string-name>
          , and W. Penzo, “
          <article-title>Using Semantic Mappings for Query Routing in a PDMS Environment,”</article-title>
          <source>in Proc. of SEBD</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mandreoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Martoglia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sassatelli</surname>
          </string-name>
          , and W. Penzo, “SRI:
          <article-title>Exploiting Semantic Information for Effective Query Routing in a PDMS,”</article-title>
          <source>in Proc. of WIDM</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>“</surname>
            <given-names>BRITE</given-names>
          </string-name>
          ,” http://www.cs.bu.edu/brite/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Crespo</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          , “
          <article-title>Routing Indices for Peer-to-</article-title>
          <string-name>
            <surname>Peer</surname>
            <given-names>Systems</given-names>
          </string-name>
          ,”
          <source>in Proc. of ICDCS</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>I. Stoica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Morris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kaashoek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Balakrishnan</surname>
          </string-name>
          , “
          <article-title>Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications,”</article-title>
          <source>in Proc. of SIGCOMM</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>Aberer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cudre</surname>
          </string-name>
          ´-Mauroux, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hauswirth</surname>
          </string-name>
          , “The Chatty Web: Emergent Semantics Through Gossiping,”
          <source>in Proc. of WWW</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Cooper</surname>
          </string-name>
          , “
          <article-title>Using Information Retrieval Techniques to Route Queries in an InfoBeacons Network,”</article-title>
          <source>in Proc. of DBISP2P</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Siebes</surname>
          </string-name>
          , , and
          <string-name>
            <surname>F. van Harmelen</surname>
          </string-name>
          , “
          <article-title>Peer Selection in Peer-toPeer Networks with Semantic Topologies,”</article-title>
          <source>in Proc. of ICNSW</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Koloniari</surname>
          </string-name>
          and E. Pitoura, “
          <article-title>Content-Based Routing of Path Queries in Peer-to-</article-title>
          <string-name>
            <surname>Peer</surname>
            <given-names>Systems</given-names>
          </string-name>
          ,”
          <source>in Proc. of EDBT</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Triantafillou</surname>
          </string-name>
          , and G. Weikum, “IQN Routing:
          <article-title>Integrating Quality and Novelty in P2P Querying and Ranking,”</article-title>
          <source>in Proc. of EDBT</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          et al., “
          <article-title>Super-Peer-Based Routing and Clustering Strategies for RDF-Based Peer-to-Peer Networks,”</article-title>
          <source>in Proc. of WWW</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C.</given-names>
            <surname>Tempich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Wranik</surname>
          </string-name>
          , “REMINDIN':
          <article-title>Semantic Query Routing in Peer-to-Peer Networks Based on Social Metaphors,”</article-title>
          <source>in Proc. of WWW</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mandreoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Martoglia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Tiberio</surname>
          </string-name>
          , “
          <article-title>Approximate Query Answering for a Heterogeneous XML Document Base,”</article-title>
          <source>in Proc. of WISE</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Madhavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Halevy</surname>
          </string-name>
          , “
          <article-title>Corpus-based schema matching</article-title>
          ,”
          <source>in Proc. of ICDE</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>