1


                    Supporting Evacuation Missions with
                    Ontology-Based SPARQL Federation
                                Audun Stolpe, Jonas Halvorsen and Bjørn Jervell Hansen
                                     Norwegian Defence Research Establishment (FFI)
                                                        P O Box 25
                                                   2027 Kjeller, Norway
                            Email: {audun.stolpe | jonas.halvorsen | bjorn-jervell.hansen}@ffi.no


Abstract—We study ontology-based SPARQL federation in sup-          The main contribution of this paper consists in defining a
port of coordinated action by deployed units in military oper-      novel federation strategy specifically designed for domains
ations. It is presumed that bandwidth is limited and unstable.      that share the general characteristics of this case. The most
Thus, we need an approach that generates few HTTP requests.         important characteristics are firstly that bandwidth is limited
Existing techniques employ join-order heuristics that may cause     so the total communication costs induced by the number of
requests to multiply as a factor of the number of joins in a
query. This can easily lead to an amount of traffic that exceeds
                                                                    HTTP request is a non-negligible factor, and secondly that the
network capacity. We propose an approach that builds an in-         network topology is dynamic, i.e. sources may come and go.
memory excerpt of the remote sources, sending one request to        Our aim is thus to define a federation strategy that is sound and
each source. A query is answered against this excerpt, which is a   complete with respect to query answering, issues a minimal
provably sound and complete representation of the sources wrt.      number of HTTP requests, and is compatible with run-time
query answering. The paper ends with a case study involving         detection of sources.
three military sources used for planning evacuation missions.
                                                                    The paper is organized as follow: In Section II we identify a
                                                                    list of tentative desiderata that our federation strategy should
                                                                    satisfy. The desiderata points to using an ontology-based data
                      I.   I NTRODUCTION                            access paradigm, which is explained in Section III. Section
                                                                    IV outlines our solution, which is based on querying against
The planning of evacuation missions is a complex and impor-         an excerpt, or cropping, of the remote sources relative to an
tant process in military operations. One of the most challenging    incoming query. The case study is presented in Section VI, and
aspects, is making all necessary information available to the       the main experiences drawn from the case study is presented
decision makers. These information fragments will typically         in Section VII.
be distributed across different systems.
                                                                    The paper assumes familiarity with W3C’s Semantic Web
This is particularly the case when the military force con-          technology stack, in particular RDF, OWL, and SPARQL.
ducts its operations according to network-based concepts, like      Readers not familiar with these technologies are referred to
NATO’s Network Enabled Capability, henceforth NNEC [1].             [2], [3], and [4] for an introduction.
The primary objective when conducting operations according
to this concept, is to support the creation of a high degree of
shared situational awareness among decision makers in order                   II.   C HARACTERISTICS OF THE DOMAIN
to obtain increased mission effectiveness. A prerequisite for
achieving this, is extensive information sharing and a robust       The NNEC concept presupposes a network-based environment
scheme for information integration, enabling decision makers        in which information about own and enemy units is typically
to retrieve and utilize all relevant information when needed.       distributed across several autonomous data sources contributed
So far, the emphasis of the technical work on NNEC has              by coalition members. In order to support evacuation planning,
been on how to make information available throughout the            these information fragments need to be integrated in order for
environment. However, in order for NNEC to be of use                the decision makers to obtain the highest possible degree of sit-
to decision makers, the challenge of establishing a robust          uational awareness. This involves tackling some idiosyncratic
scheme for information integration ultimately also needs to         challenges:
be addressed. This is the focus of the research reported on in
this paper.                                                           • The information systems are in general semantically het-
                                                                        erogeneous, especially in coalition operations, and cannot
We present an information integration approach that combines            be accessed in a coherent and unified way,
query rewriting with data federation, and we study it in relation     • the underlying communication network often relies on IP
to an example from military evacuation planning based on live           radios, and is hampered by limited bandwith, latency, and
reporting of incidents over IP radio networks.                          limited range,


                                               STIDS 2013 Proceedings Page 141
                                                                                                                                       2


  • the network topology is highly dynamic, meaning that           To that end, sources are selected at query time based on the
    information systems can appear and disappear at any time,      outcome of the reasoning process, by inspecting DNS records
    and                                                            that are multicasted in the network (cf. section VI.). The
  • the shared information is mission-critical, which makes        entire federation process can thus be seen as comprised of
    it crucial that the integration scheme yields correct and      two distinct steps. First, the query is rewritten into a query
    exhaustive data.                                               expressed directly in terms of the data according to the domain
                                                                   model expressed by the ontology. Next, the rewritten query
These characteristics means that we want to define an infor-       is decomposed into sub-queries that yield a set of mutually
mation integration approach that:                                  exhaustive partial answers extracted from each of the selected
                                                                   sources.
 A allows a user to access available sources in a unified way,
 B utilizes the available bandwith efficiently, particularly by    It is not automatically the case, however, that the above men-
   restricting the number of HTTP requests to the remote           tioned steps are separable. That is, depending on the expressive
   sources,                                                        power of the ontology language, reasoning may require data-
 C allows the relevant sources to be discovered at run-time,       access. This is not the case for the class of ontology languages
   and                                                             that are first-order rewritable (cf. [5], [6]). This notion was first
 D guarantees the soundness and completeness of query              introduced by Calvanese et al. ([5]) in the context of the class
   answering.                                                      of ontology languages called description logics. A description
                                                                   logic L—more generally an ontology language—is first-order
            III.   O NTOLOGY- BASED DATA ACCESS                    rewritable if, for every ontology ⌃ expressed in L and a query
                                                                   Q , Q can be compiled into a first-order query Q⌃ that A)
Based on the desiderata from section II, we decided to use an      compiles away all concepts from the ontology ⌃, and B) is
ontology to mitigate heterogeneities and to provide uniform        such that given a data repository R, Q⌃ evaluated over R
access to the data. That is, we based our approach on the          yields exactly the same result as Q evaluated against R and
paradigm usually called ontology-based data access in which        ⌃.
a conceptual model—the ontology—is used to express the             First-order rewritable ontology languages is a crucial presuppo-
relationship between the content of the respective sources, and    sition behind our approach. It is important precisely because it
to act as a single query interface towards them.                   ensures that the reasoning process can be decoupled from data
According to the W3C Web Ontology Working Group1 , an              access. This has two hugely beneficial consequences: First,
ontology defines a set of concepts or term used to describe        the complexity of reasoning remains unaffected as data size
and represent some domain of information in an abstract way        increases. In other words the computation time allocated to
that gives a formal semantics to the data in question. More        reasoning will not vary with changes in the network topology
specifically, an ontology gives the semantics of the data in the   and/or availability of sources—recall that we do not assume
form of a set of logical axioms that explicate the relationship    these to stay fixed. Secondly, since reasoning can be decoupled
between classes of data items, and it enables computers to         from data access, it does not affect the federation process per
reason over the data as part of the process of answering a         se. That is, source selection can be performed independently of
query. One particular form that this process can take, is that     the reasoning process, which means, among other things, that
in which a query formulated in terms of the concepts of the        a query which is run repeatedly will only have to be rewritten
ontology is successively refined until the query can be executed   once.
directly against the data. This is usually referred to as query    As always there is a price to pay, though. The property of
rewriting and forms the basis for our approach, as explained       first-order rewritability imposes a serious constraint on the
in the next section.                                               expressivity of an ontology language, and, as explained in
Ontology-based data access is useful in all scenarios in which     section VII, must be very carefully selected in order to be able
accessing data in a unified and coherent way is difficult. This    capture the salient aspects of our case. In particular, it turns
may happen for several reasons. The data sources may have          out that none of the standard fragments of the W3C-endorsed
been developed for different purposes by different agencies        ontology language OWL will do.
or institutions, may not have a coherent design, and may not       As regards HTTP-minimality (by which we mean keeping the
record similar types of information in the same manner. A          number of required HTTP requests as low as possible), we
well-designed ontology gives a unified view of the domain in       found existing approaches to federated query processing not
terms of the concepts that are of interest to the user.            to be well suited. One of the main reasons is that they all rely
                                                                   on forms of join-order heuristics that tends to multiply the
                   IV.   O UTLINE OF APPROACH                      number of HTTP requests as a factor of the depth and number
                                                                   of joins: a standard distributed join algorithm will evaluate a
Our federation engine is designed to be suitable for a dynamic     query iteratively one triple at a time, while propagating values
network topology, in accordance with our listed desiderata.        in a nested loop join fashion. This multiplies HTTP request in
  1 http://www.w3.org/2001/sw/WebOnt/
                                                                   proportion to the result sets returned by evaluating each join


                                               STIDS 2013 Proceedings Page 142
                                                                                                                                                 3


                              Q              ⌃                                   the federator component which performs service discovery at
                                                                                 run-time (cf. Section VI) to identify live and relevant sources.
                                  Rewrite                                        Relevance here means signature overlap, where a signature
                                                                                 is understood as a set of RDF properties. The extent of the
                                                                                 overlap between the signature of the query and the signature
                                    Q⌃
                                                                                 of a given endpoint determines a SPARQL CONSTRUCT query
                                                                                 which will be routed to that endpoint.
                              Distribute
                                                                                 The CONSTRUCT queries are designed to adhere to a logical
                                                                                 form which is sufficiently structured to enable us to guarantee
                                    ...                                          the soundness and completeness of the query answering pro-
                R1      R2                       Rn                              cess wrt. the set of sources R, as explained in more detail in
                                                                                 the next sections. Taken together with the obvious minimality
                                                                                 of our approach wrt. the number of HTTP requests—only a
                                   Crop                                          single request is sent to each source—as well as the relevance-
                                                                                 based per-query discovery of sources, we conclude that our
                                                                                 approach meets all our tentative desiderata A) to D).
                                  Evaluate


                                   Answer
                                                                                              V.   P ROPERTIES OF THE CROPPING

Figure 1.   System overview
                                                                                 In this section we formally define the notion of the cropping
                                                                                 of a distributed set of sources R relative to a query Q, and we
                                                                                 state its essential properties. We shall assume familiarity with
argument.2 Admittedly, there are several improved versions of                    SPARQL syntax and semantics (cf. [12]).
this algorithm on offer. The bound join technique implemented                    Notation. We use Ri , where i is in some index set I, to denote
in FedX [9], for instance, groups several instances of a join                    RDF graphs—variably referred to as sources, repositories or
argument in a single subquery using the SPARQL UNION                             endpoints. A SPARQL SELECT query is a pair hP, ~xi, where
construct. This reduces the number of request with a factor                      P is a SPARQL graph pattern and ~x a vector of elements of
equivalent to the the number of instances in the grouped query                   variables. Similarly, a CONSTRUCT query is a tuple hT, P i,
(ibid.).3 Yet, experimental evaluation shows that the number                     where T is a basic graph pattern and P is a union of such.
of HTTP request can still grow quite fast in the number                          T will be identified with the CONSTRUCT block of the query,
of joins.45 . What is common to all these approaches is that                     aka. the template, and P with the WHERE block, aka. the query
the number of HTTP request varies in the number of results                       pattern. We shall allow ourselves the convenience of blurring
returned by the sub-queries. It is a design goal of our approach,                the distinction between SPARQL queries on the one hand and
in contrast, to make a factor of the size of the query only.                     sets and families of triple patterns on the other. Where Q :=
To that end, we designed our federation engine to evaluate                       hP, ~xi is a SELECT query and G an RDF graph we denote
the query, not against the sources directly, but against an                      the result of evaluating Q against G as Q(G), and similarly
excerpt, or cropping as we call it, that is pulled from the                      for CONSTRUCT queries. The proofs of the claims that follow
sources by sending a single HTTP-request to each. Unlike                         can be found in technical report [13].
traditional warehousing strategies, however, our local copy is                   As mentioned in the previous section, our approach to federa-
not persisted, but exists only in-memory for the duration of                     tion is signature-based in the sense that the RDF properties that
the query execution process. It is essentially a snapshot of that                are found in a query are used for routing different sub-queries
part of the remote sources which is relevant for answering                       to different endpoints. This is a common strategy (cf. [8], [9])
the query in question. In realistic cases, the cropping is much                  for which we claim no originality. Now, given a query pattern
smaller than the total amount of data that it is extracted from                  P the relevant subset of P in relation to a source Ri is defined
(see Section V-A).                                                               as the maximal subset of P whose signature is contained in
An overview of the resulting system, is shown in Figure 1: the                   the signature of Ri . We shall denote this set as ⇢(P, i).
system takes as input a SPARQL query Q, and a collection                         Recapitulating briefly, our federation engine is designed to
of aligned ontologies ⌃, which are used by the rewriter to                       be HTTP-minimal, as well as sound and complete wrt. to
produce the query Q⌃ . This rewritten query is next handed to                    query answering over the selected sources. A strategy that
  2 DARQ [7] and SPLENDID [8] both implement a version of this algorithm.
                                                                                 supports all three is to execute the query against an in-memory
  3 Similar techniques are considered in [8] and [10]
                                                                                 representation of the remote sources rather than against the
  4 See e.g. the results for FedX on Life Sciences 3 query from the FedBench     sources themselves. More specifically our federation engine
suite.                                                                           routes a single CONSTRUCT query to each of the selected
  5 Another notable optimization is the star-shaped pattern technique of [11].   sources—achieving HTTP minimality—whereas the logical
Numbers of requests are however not reported in this study.                      form of this construct query is defined in such a manner as


                                                         STIDS 2013 Proceedings Page 143
                                                                                                                                     4


to guarantee that the answer to the query assembled from the            for every ?x 2 dom( j ) \ dom( k ).
selected sources is both correct and complete with respect to
those sources. Here soundness and completeness means that if            Our CONSTRUCT queries now become:
R is a set of sources selected for federation, then the answer          Definition 3: For a set ofSsources R := {Ri }i2I and a query
that the federator provides to a query Q should be exactly the          pattern P : C (P, i) = h f (s(P, i)), f (s(P, i))i where f is
same as the one that would be obtained were Q to be evaluated           some separation function for s(P, i).
conventionally over a single repository holding the union of
the data sets in R. To the best of our knowledge, our strategy          Example 2: Suppose we have two endpoints JOCWatch and
is currently the only one that guarantees that this is the case.        MedWatch,6 and the following conjunctive graph pattern P :
                                                                         ?mission medics:missionType medics:Rescue.
The logical form in question is in turn defined by distinguish-          ?mission medics:jocWatchIncident ?incident.
ing between exclusive and non-exclusive triples in a query               ?incident jocw:status ?stat.
pattern P . Exclusive triples are those that are satisfied, if at       Suppose further that each property prefixed by medics
all, at one endpoint only. Non-exclusive triples, on the other          belongs to the signature of MedWatch, that each property
hand, may be satisfied by two or more. Exclusive triples can            prefixed by jocw belongs to the signature of JOCWatch, and
safely be grouped together and executed against the source              that the jocw:status property belongs to both. The queries
for which it is exclusive in as a single conjunctive pattern.           that are routed to the respective endpoints are then:
Non-exclusive triples, however, must be shipped to the remote
                                                                        MedWatch:
sources as separate UNION clauses. This holds even if a group           CONSTRUCT {
of triple patterns are relevant to exactly the same sources since        ?_1 medics:missionType medics:Evac.
                                                                         ?_1 medics:jocwIncident ?_2.
an answer to the original query may require joining triples              ?_3 jocw:status ?_4. } WHERE {
across these sources. This gives rise to the following definition        { ?_1 medics:missionType medics:Evac.
of the set of clauses induced by P and Ri :                                ?_1 medics:jocwIncident ?_2.}
                                                                           UNION
Definition 1 (Clause set): For Ri a source and P a query                 { ?_3 jocw:status ?_4.}}

pattern: s(P, i) := {✏(P, i)} [ {{t} : t 2 ⇢(P, i) \ ✏(P, i)}           JOCWatch:
                                                                        CONSTRUCT {
Here ✏(P, i) denotes the exclusive group of a pattern P relative         ?_1 jocw:instigator ?_2.
                                                                         ?_3 jocw:status ?_4.} WHERE {
to Ri .                                                                  { ?_1 jocw:instigator ?_2.}
                                                                           UNION
Now, the basic idea behind our federation strategy is to use the         { ?_3 jocw:status ?_4.}}
set of clauses induced by P and Ri to define a CONSTRUCT
query that extrapolates the part of Ri that is relevant for             The cropping may now be defined as follows:
answering P . The most straightforward way to do that may
seem to be to use the clause set itself as a query pattern, whilst      Definition 4 (Cropping):
                                                                                      S          Put Q := hP, ~xi and R = {Ri }i2I .
using the set-theoretic union of its elements as a template. Call       Then AQR := i2I C (P, i)(Ri ).
this the naive strategy. Interestingly, the naive strategy, whilst      We now have:
complete, is not sound. Consider the following rather abstract
example:                                                                Theorem 1 (Soundness/Completeness):
                                                                                        S                   Let R be any set of
                                                                        sources, then Q( R) = Q(AQR ) for any SELECT query Q.
Example 1: Let G be the RDF graph containing only the
two triples s := (c1, p, d1) and t := (c2, q, d2), and assume           Note that here Q is the SELECT query that is being posed
a clause-set {{(?s, p, ?o)}, {(?s, q, ?s)}}. The corresponding          to the system, whereas AQR , i.e. the cropping, is the result
naive CONSTRUCT query is:                                               of assembling the results of the CONSTRUCT queries that are
  CONSTRUCT {?s q ?o. ?s p ?o.}                                         required for providing an excerpt guaranteed to answer it.
  WHERE {{?s p ?o} UNION {?s q ?o}}
                                                                        As regards time complexity, since every CONSTRUCT query
Executing this query against G will produce a graph containing          C (P, i) is in union normal form, Corollary 1 of [12] im-
the triple (c1, q, d1).                                                 mediately entails that the cropping can be built efficiently.
The example shows that the naive strategy may create bindings           In our actual implementation, the CONSTRUCT queries that
in the resulting graph that do not exist in the graph that is           are allocated to the respective endpoints are, moreover, all
queried. To counteract this effect it is necessary to standardize       executed in parallel, so time is not a precarious measure.
apart the elements of the clause-sets before using taking the
union and using it as a CONSTRUCT template. To this end we
introduce the notion of a separation function:                          A. Restricting the size of the cropping

Definition 2 (Separation function): Let S := {c1 , . . . , cn } be      Although time is not a precarious measure, the size of result
a clause set, and let i be a uniform substitution of variables          sets quickly becomes an issue. The CONSTRUCT queries that
for variables in ci . A separation function f for S is a function       are passed around to the remote endpoints, if not constrained,
s. t. 1) f (S) = { 1 (c1 ), . . . , n (cn )}, and 2) j (?x) 6= k (?x)    6 These systems are described in VI


                                                  STIDS 2013 Proceedings Page 144
                                                                                                                                               5


may well distribute a triple pattern (?a, rdf : type, ?b) to all                                   VI.     C ASE
remote endpoints, in effect requesting huge chunks of the data
contained in each.                                                   The case we use to exemplify our approach is based upon the
                                                                     following scenario: A military analyst is monitoring planned
Now, there is no need in our approach for join-ordering
                                                                     medical evacuation flight missions, and is on the lookout for
heuristics in the conventional sense, since, per the approach,
                                                                     missions that might be threatened by enemy activity. If this
joins are either executed remotely, or executed locally by
                                                                     is the case, she is also interested in finding friendly units
a standard query processing engine after the cropping has
                                                                     able to counter the particular threat. More specifically, the
been built. Rather, what we do, is to build the cropping
                                                                     information requirement is the following: Find all medical
incrementally by assessing the relative selectivity of triple
                                                                     evacuation missions and friendly units such that a) the mission
patterns and processing the most selective ones first. We can
                                                                     can be classified as being threatened; and b) that the friendly
only describe this procedure in general outline here:
                                                                     unit can handle the specific type of threat that the enemy poses.
The selectivity of a triple pattern may be assessed along several
                                                                     Normally, in order to obtain an answer to this information
dimensions. For instance, studies show that a triple pattern with
                                                                     requirement, the analyst has to keep an eye on several systems,
a literal in object position will usually be more selective than
                                                                     as information about evacuation flights and information about
one with a URL in the same position [14]. Moreover, triple
                                                                     enemy activity are usually not kept in the same system. With
patterns can be ordered in a plausible sequence of decreasing
                                                                     the aid of a system as outlined so far in this paper, however,
selectivity based on the distribution, and position, of variables
                                                                     the analyst can pose a query formulating what she is looking
in the pattern (? denotes a variable):
                                                                     for and let the information integration system take care of the
(s, p, o)     (s, ?, o)   (?, p, o)     (s, p, ?)     (?, ?, o)      rest.
(s, ?, ?)   (?, p, ?) (?, ?, ?)
                                                                     To evaluate the approach and the case outlined above, we
The entire set of heuristic rules that we have used in our           conducted an experiment at NATO CWIX 20137 set as close
solution can be found in [14].                                       as possible to the dynamic and multinational environment of
                                                                     NNEC. The experiment involved three operational information
Now, the idea is to build the cropping in layers by employing        systems: 1) JOCWatch, information on incidents of relevance
the following three-step procedure: 1) construct the graph           to the command in an event log, 2) MedWatch, a system for
corresponding to the most selective patterns pertaining to each      medical mission tracking designed to support the planning,
endpoint 2) extract variable bindings from the cropping so far,      logging and monitoring of medical evacuation missions, and
and 3) pass them on to the next iteration as constraints for the     3) Track Source, a unit tracking service providing times-
next round of queries.                                               tamped geopositional information regarding friendly units in
Step 2, the extraction of variable bindings, is realized by re-      the field. The information in MedWatch and JOCWatch were
using the triple patterns as SELECT queries that are evaluated       made available through SPARQL endpoints by using D2R [15],
against the cropping as it exists so far, whereas the propagation    while the Track Source service had a native SPARQL interface.
of values from one layer to the next is realized with the            In addition, all sources were supplied with a service description
VALUES feature of SPARQL 1.1, which allows a set of                  according to the SPARQL 1.1 specification, and each source
bindings to be shipped with a query in order to constrain the        made available its ontology at an URL described in the service
answers.                                                             description.

Our procedure is designed to treat each exclusive group as           Our prototype system performed service discovery using
an atomic unit, since exclusive groups are likely to be more
                                                                        • mDNS8 for broadcasting and discovering the presence of
selective as a set. For the same reason, they are given maximum
                                                                          information sources,
priority, That is, the first layer of the incremental construction
                                                                        • DNS-SD9 for high-level description of the source in
of the cropping consists of the result of executing the exclusive
                                                                          terms of pointers to query endpoint location and content
groups as CONSTRUCT queries. The subsequent layers are then
                                                                          description location, and
constructed from the non-exclusive patterns by rating them
                                                                        • the SPARQL 1.1 service description and VoID10 vocabu-
according to the heuristic criteria.
                                                                          laries for describing source content.
This procedure is sufficient to ensure that very unconstrained
patterns such as (?s, ?p, ?o) will be processed late, when           This approach addressed the NNEC needs as outlined in
bindings are available for some of its variables. It can also        section II, and had the advantage that it was independent
be tuned to give low priority to predicates from existing RDF        of a central registry, thus eliminating the issue of network
vocabularies that are known to have a low selectivity rate,
                                                                        7 Coalition Warrior Interoperability eXploration, eXperimentation and eX-
such as e.g. rdf : type or dcterms : title. The procedure
                                                                     amination, eXercise – an annual NATO event aimed at improving alliance-wide
preserves the soundness and completeness of the cropping wrt.        interoperability
the underlying sources, and, although the number of HTTP                8 http://www.multicastdns.org/
request is no longer minimal it is constant in a small factor of        9 http://www.dns-sd.org/
the number of triple patterns in the original query.                    10 http://www.w3.org/TR/void/


                                                STIDS 2013 Proceedings Page 145
                                                                                                                                                               6


                                    JOCWatch                                 Concept                     Definition
                                                                             ThreatenedMission           MedWatch missions that are related to a
                                                                                                         ThreateningIncident
        jw:Hostile                     <event>                               ThreateningIncident         All JOCWatch incidents that are related to a
               jw:instigator                                rdf:type                                     ThreateningEvent
        rdf:type                            jw:incident                      ThreateningEvent            All JOCWatch events that are both a
                                                                                                         MilitaryOperation (from the JOCWatch
                                                                                                         ontology) and a HostileEvent
      <instigator>                   <incident>               jw:SAFIRE      HostileEvent                All events that has a HostileInstigator
                                                                             HostileInstigator           All event participants that are classified as being
                                                                                                         hostile.
                                            med:incident                     Relation                    Definition
                       MedWatch                                              canHandle                   A relation between a military unit type and the
                                                                                                         type of events those units types are equipped to
                         rdf:type                       useront:canHandle                                handle.
       med:Mission                   <mission>
                                                                             hasEvent                    If a mission involves an incident, and there exists
                                                                                                         an event that belongs to the same incident (inverse
                                                                                                         property), then the event is also related to the
                                                                                                         mission
                                    TrackSource
                                                                                  Table I.     R ELEVANT DEFINITIONS IN THE USER ONTOLOGY
                         nf:unit                  rdf:type
            <track>                    <unit>                 nf:Artil
                                      nf:lon
       nf:posinf                                                                 SELECT ?mission ?unit
                                                                                 WHERE{
                          nf:lat                                                    ?mission a useront:ThreatenedMission.
        <posdata>                    “49.12352”              “51.76352”
                                                                                    ?mission useront:hasEvent ?event.
                                                                                    ?event a useront:ThreateningEvent.
                                                                                    ?event wgs84:lat ?elat.
                                                                                    ?event wgs84:long ?elong.
Figure 2.    Conceptual relationship between data sources                           ?unit useront:canHandle ?event.
                                                                                    ?unit useront:hasPosition ?pos.
                                                                                    ?pos wgs84:lat ?ulat.
                                                                                    ?pos wgs84:long ?ulong.
                                                                                 }
                                                                            Figure 3.   SPARQL query representing the information request
fragmentation.
The relationship between the information sources in our ex-
periment is illustrated in Figure 2. In the figure we see that              earlier can now be expressed by the query in Figure 3. 11
MedWatch missions are (potentially) related to JOCWatch
events through a shared incident. Furthermore, the JOCWatch                 Here ThreateningEvent, hasEvent, canHandle, and
events are typed according to category e.g. as a SAFIRE event,              hasPosition are terms specific to the user’s vocabulary, see
which is an event that involves hostile surface-to-air fire. In the         table VI. Posing this query to any of the information sources
figure we also see that units in the Track Source are typed                 would not return any answers. In our experiment, this query
(e.g. as Artillery), and that it contains positional data.                  was decomposed and distributed as per the approach outlined
Additionally, in the figure, a stipulated line is drawn from                earlier.
Artillery to SAFIRE, indicating that, units of the former                   The main motivation behind this experiment was to test
kind are equipped to counter those of the latter. This relation             whether it is feasible to provide a decision maker with the
does not actually belong to the data, but is defined in the user            means to request information using her own terms and without
ontology useront, and is key to the formulation of the user’s               presupposing detailed knowledge about a fixed set of sources.
information need.                                                           This creates a coupling between the requesting system and the
The experiment included four main ontologies: a JOCWatch                    information sources that is loose enough to adapt to a changing
ontology, a MedWatch ontology, a Track Source ontol-                        network topology, something that should be highly relevant in
ogy, and an ontology containing the concepts used by the user               the NNEC environment. Our strategy of combining once-per-
of the system.                                                              query federation with rewriting worked well for our sample
                                                                            case, and proves that the idea is sound in general outline. To
In this particular case, the user ontology is derived from and              be sure, if the system is to to scale well—both in terms of
expresses the data models of the respective sources. We are                 efficiency and usability—there are some serious issues that
thus assuming that, although available sources may come and                 need to be addressed having to do with the expressiveness of
go, we know their data models. This is not overly unrealistic,              the ontology language and the complexity of reasoning in it.
since NATO-wide standardization is part of the NNEC con-                    We record some findings in the next section.
cept. The assumption means that we do not have to match
ontologies at run-time. A user ontology less tightly coupled
with the source ontologies and run-time ontology matching is                  11 In reality, we apply filtering of friendly units based on distance from

something we plan to look more into in future work.                         events using the haversine formula and a threshold value. As this does not
                                                                            contribute to understanding the general approach we have left it out of the
Given the user ontology, the information requirement described              example.


                                                        STIDS 2013 Proceedings Page 146
                                                                                                                                    7


         VII.    E XPERIENCES AND O BSERVATIONS                          backwards (i.e. traversing the inverse of the relation).

As explained in section IV, it is an essential presupposition       The problem with 1) is that it has a binary predicate in the
of our approach to federation that the ontology that is used        conclusion. For that reason, it cannot be expressed as a class
to provide access to the underlying sources be expressed in         inclusion axiom. A description logic axiom is either a class
a first-order rewritable language. This is necessary in order       axiom or a relationship axiom (aka. role axiom) but cannot be
to separate reasoning from source selection, thus making the        a mix of the two. Indeed, a class inclusion axiom—irrespective
system able to adapt to a dynamic network topology by               of the particular brand of description logic that is being used—
selecting sources at run-time.                                      cannot, by design, express cross-references between antecedent
                                                                    and consequent in two or more variables, as our axioms
Yet, it is not given that the structure and relationship between    require.
the sources that we selected for our case-study, as illustrated
in Figure 2, can in fact be expressed in a first-order rewritable   Description logics typically allow us to state axioms like the
language.                                                           following (in description logic notation):

Choosing an ontology language in the DL-Lite family of                • Artillery v 9canHandle.SAFFIRE
description logics would have been natural for several reasons:       • Artillery v 8canHandle.SAFFIRE
First,these languages are specifically designed to stay within
the boundaries of first-order rewritability. Secondly, DL-Lite      At first glance, these may seem to come close to what we
forms the basis of the W3C-endorsed QL language profile             wish to say, but that is not really the case. The first says that
of OWL 2, and so has an XML serialization, and enjoys the           an artillery unit can handle some surface-to-air-fire event, but
status of an official recommendation. Finally, several efficient    it does not identify the event. The second says that an artillery
rewriters already exist for the DL-Lite family of languages,        unit can only handle surface-to-air-fire events, although it may
which, if we could use them, would of course leverage the           not be able to handle all of them. What we wish to say
burden of implementing our own federation engine.                   though us that all artillery units can handle all surface-to-air-
                                                                    fire events.
As it turns out, however, our case cannot be expressed in any
of the standard OWL2 profiles, nor in any other description         Taking stock, there is thus, to the best of our knowledge, no
logic we are currently aware of. This is due to a combination       first-order rewritable description logic capable of expressing
of features exemplified by the structure of our sources. Refer-     the structure and interrelationship between our selected sample
ring to Figure 2 there are mainly two sources of expressive         of military information sources. As it turns out, though, there
complexity:                                                         is a different family of ontology languages altogether that
                                                                    is sufficiently expressive for our needs, namely the family
 1) As indicated by the stipulated arrow labelled                   of general existential rules aka. existential datalog [6], more
    useront:canHandle connecting TrackSource                        specifically the language of weakly recursive datalog. Weakly
    to the JOCWatch database, we wish to add axioms                 recursive datalog is strictly more expressive than any first-
    to the ontology that classify which kind of vehicle             order rewritable description logic—and more importantly, it
    or unit that is equipped to counter which kind of               is sufficiently expressive to express 1) and 2) above, thus
    hostile event—for instance artillery in the case of a           capturing the salient features of our case. Our federation engine
    surface-to-air attack. Encoding this knowledge in the           is therefore equipped with a rewriter that expects an ontology
    ontology is necessary in order to enable the user to            to be encoded in weakly recursive datalog.
    query the sources for available military support within         Although, this choice is more or less forced upon us by
    a given diameter from a threatened position. However,           the characteristics of the case, it does not mean of course
    it requires that we be able to state in the ontology            that the choice of recursive datalog as our ontology language
    that certain combinations of unit types and events              does not come with its own set of drawbacks. First of all,
    constitute sufficient conditions for the unit and event in      existential datalog in general does not currently have the kind
    question to stand in the canHandle relation. Stated             of institutionalized support that the OWL family of languages
    more formally, we need to have axioms of the form               enjoys. Secondly, and much for the same reason, it has far
    8x8y.Artillery(x)^SAF IRE(y) ! canHandle(x, y),                 less endorsement from the software industry in terms of tool
    for all the appropriate combinations of units and events.       support.
 2) Presupposing that the canHandle relation has been               In fact, we could not find an existing rewriter for weakly
    axiomatized, we further need to express in the ontology         recursive datalog, and therefore had to build one from scratch.
    that finding the position of a unit involves traversing the     Alas, implementing a correct rewriter does not entail that one
    TrackSource graph from the reported latitudes and               has implemented an efficient one, and although queries over
    longitudes through the relations nf:posinf, nf:unit             weakly recursive datalog ontologies are first-order rewritable,
    and rdf:type via useront:canHandle to an as-                    the size of the rewriting itself may be exponential in the size
    sociated hostile event. This is a fairly long and intricate     of the original query. Thus, without a considerable amount
    path that requires traversing relations forwards as well as     of research being devoted to optimization, the rewriter is not


                                               STIDS 2013 Proceedings Page 147
                                                                                                                                                        8


likely to perform well for any large class of cases. Theoretical              [2] W3C, “RDF Primer,” http://www.w3.org/TR/rdf-primer/, February
results are encouraging, though. In particular, results from [16]                 2004.
shows that there is a minimal rewriting of any query over a set               [3] ——,        “OWL       2    Web      Ontology      Language     Primer,”
of weakly recursive datalog rules. Computing such a minimum,                      http://www.w3.org/TR/2009/REC-owl2-primer-20091027/,            October
                                                                                  2009.
however, remains a topic for future research.
                                                                              [4] ——,            “SPARQL            1.1         Query          Language,”
                                                                                  http://www.w3.org/TR/sparql11-query/, March 2013.
                      VIII.    R ELATED W ORK                                 [5] D. Calvanese, G. D. Giacomo, D. Lembo, M. Lenzerini, A. Poggi,
                                                                                  M. Rodriguez-Muro, and R. Rosati, “Ontologies and Databases: The
                                                                                  DL-Lite Approach,” in Semantic Technologies for Informations Systems.
Several studies have addressed the problem of decomposing                         Springer, 2009.
a SPARQL query into sub-queries that can be allocated to a                    [6] G. Gottlob, G. Orsi, and A. Pieris, “Ontological Queries: Rewriting
distributed set of remote sources. Notable examples include                       and Optimization (Extended Version),” Computing Research Repository,
[17], [7] [18], [19], [20], [9], [11] and [8]. All of these                       vol. abs/1112.0343, 2011.
studies belong to what we would call the join-order heuristics                [7] B. Quilitz and U. Leser, “Querying distributed RDF data sources
paradigm, and, unlike the present paper, none gives particular                    with SPARQL,” in Proceedings of the 5th European Semantic Web
attention to establishing framework that is both sound/com-                       Conference (ESWC ’08), 2008.
plete and request-minimal. Moreover, the listed reports focus                 [8] O. Görlitz and S. Staab, “SPLENDID: SPARQL Endpoint Federation
                                                                                  Exploiting VOID Descriptions,” in Proceedings of the 2nd International
exclusively on federating queries that are expressed directly in                  Workshop on Consuming Linked Data (COLD 2011), 2011.
terms of the data. To the best of our knowledge there are very                [9] A. Schwarte, P. Haase, K. Hose, R. Schenkel, and M. Schmidt,
few contributions that address the question of how to combine                     “FedX: Optimization Techniques for Federated Query Processing on
query federation with reasoning, where reasoning cuts across                      Linked Data,” in Proceedings of the 10th International Semantic Web
several sources.                                                                  Conference (ISWC 2011), 2011.
                                                                             [10] J. Zemanek and S. Schenk, “Optimizing SPARQL Queries over Dis-
                                                                                  parate RDF Data Sources through Distributed Semi-Joins,” in Inter-
                         IX.    C ONCLUSION                                       national Semantic Web Conference (Posters and Demos), ser. CEUR
                                                                                  Workshop Proceedings, vol. 401, 2008.
In this paper we have established a sound, complete and                      [11] G. Montoya, M.-E. Vidal, and M. Acosta, “A heuristic-based approach
request-minimal baseline for query federation. Our approach                       for planning federated sparql queries.” in COLD, ser. CEUR Workshop
is signature-based and compatible with a run-time selection of                    Proceedings, J. Sequeda, A. Harth, and O. Hartig, Eds., vol. 905.
sources. It is therefore particularly suitable for domains that                   CEUR-WS.org, 2012.
are characterized by low bandwidth and a dynamic network                     [12] M. Arenas, C. Gutierrez, and J. Pérez, “Foundations of RDF Databases,”
topology. We have further described an example from military                      in Reasoning Web. Semantic Technologies for Information Systems.
                                                                                  Springer, 2009.
evacuation planning to illustrate the usefulness of the approach.
                                                                             [13] B. J. Hansen, J. Halvorsen, and A. Stolpe, “Information integration
In order to mitigate the heterogeneities between sources, as                      experiment at NATO CWIX 2012,” Norwegian Defence Research
well as to present the data in a vocabulary that is familiar                      Establishment (FFI), FFI/RAPPORT-2012/01543, 2012.
to the user, we found it expedient to use an ontology to                     [14] P. Tsialiamanis, L. Sidirourgos, I. Fundulaki, V. Christophides, and
provide a unifying layer above the information sources. Any                       P. Boncz, “Heuristics-based query optimisation for sparql,” ser. Pro-
incoming query is therefore rewritten according to the ontology                   ceedings of EDBT ’12. ACM, 2012.
before being passed on to the sources. However, we have that                 [15] C. Bizer and R. Cyganiak, “Publishing Relational Databases on the
ontology-based data access is not coherent with our federation                    Semantic Web,” http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/, Au-
                                                                                  gust 2009.
strategy unless the ontology is formulated in a language that
                                                                             [16] C. Civili and R. Rosati, “A broad class of first-order rewritable tuple-
is first-order rewritable such that reasoning can be decoupled                    generating dependencies,” in Datalog in Academia and Industry, ser.
from data access. In realistic cases like ours, one quickly                       Lecture Notes in Computer Science, P. Barceló and R. Pichler, Eds.
transcends the expressive capabilities of familiar first-order                    Springer Berlin Heidelberg, 2012, vol. 7494, pp. 68–80.
rewritable OWL fragments such as OWL2-QL. In our case we                     [17] M. Acosta, M.-E. Vidal, T. Lampo, J. Castillo, and E. Ruckhaus,
overcame this limitation by resorting to a decidable fragment                     “Anapsid: An adaptive query processing engine for sparql endpoints.” in
of existential datalog.                                                           Proceedings of the 10th International Semantic Web Conference (ISWC
                                                                                  2011), 2011.
                                                                             [18] C. Basca and A. Bernstein, “Avalanche: Putting the spirit of the web
                        ACKNOWLEDGMENTS                                           back into semantic web querying,” in ISWC Posters&Demos, 2010.
                                                                             [19] A. Harth, K. Hose, M. Karnstedt, A. Polleres, K.-U. Sattler, and
The NATO systems participating in the experiments reported in                     J. Umbrich, “Data summaries for on-demand queries over linked data,”
                                                                                  in Proceedings of the 19th international conference on World wide web,
this paper was made available to us by the NATO C3 Agency.                        ser. Proceedings WWW ’10. New York, NY, USA: ACM, 2010, pp.
                                                                                  411–420.
                                                                             [20] Y. Li and J. Heflin, “Using reformulation trees to optimize queries over
                              R EFERENCES                                         distributed heterogeneous sources,” in Proceedings of 9th the Interna-
                                                                                  tional Semantic Web Conference (ISWC), ser. Proceedings ISWC’10.
 [1]   P. Bartolomasi, T. Buckman, A. Campell, J. Grainger, J. Mahaffey,          Springer-Verlag, 2010.
       R. Marchand, O. Kruidhof, C. Shawcross, and K. Veum, “NATO
       Network Enabled Capability Feasibility Study, Version 2.0,” NATO C3
       Agency, Tech. Rep., October 2005.


                                                      STIDS 2013 Proceedings Page 148