A semantic stateless service description language

                                  P. A. Bonatti and L. Sauro

                                 Università di Napoli Federico II


       Abstract. Complexity issues and the requirements on semantic web application
       in the Life Science domains recently motivated a few works on stateless service
       description languages [1, 5]. With stateless services, it is possible to reason about
       the semantic relationships between inputs and outputs, while keeping matchmaking
       and composition decidable. In this paper we extend the languages introduced in [1]
       and [5] with more general forms of composition and other constructs. We provide
       formal syntax and semantics and some preliminary results on the complexity of
       service comparison. These complexity results rely on hybrid formalisms involving
       both logic programming rules and description logics.


1 Introduction
The area of semantic web services is concerned with the declarative, knowledge based
specification of web service semantics applied to service matchmaking (i.e., finding a ser-
vice that matches a given specification), verification and automated composition. There
is a conspicuous literature on the topic, enriched by several competing standards, such as
OWL-S, WSMO, and WSDL-S.
     When the semantic description involves dynamic behavioral aspects such as itera-
tions, the tasks of matchmaking and composition easily become undecidable. This mo-
tivated a few works on stateless services [1, 5], that behave like functions or database
queries. With stateless services, it is possible to move beyond a mere description of input
and output types and capture the relationships between inputs and outputs, while keeping
matchmaking and composition decidable. Stateless services are interesting because they
are common in the domain of Life Sciences [5]. Moreover, they can be paired with a
workflow language supporting procedural constructs like BPEL4WS with the purpose of
supporting the dynamic binding of atomic activities.
     In this paper we extend the languages introduced in [1] and [5] with more general
forms of composition and other constructs. We provide formal syntax and semantics and
some preliminary results on the complexity of service comparison, a basic reasoning
task that underlies both matchmaking and composition (cf. [1]). These complexity re-
sults rely on hybrid formalisms involving both rules and description logics. The language
we adopt admits a graphic presentation (that may be appreciated by users with limited
programming skills) as well as textual representation that resembles relational query and
programming languages enough to be familiar to programmers.
     We start with some examples (Sec. 2) followed by a brief summary of description
logic notions (Sec. 3). Then we formalize our service description logic language SDLfull
(Sec. 4). Service comparison is reduced to an intermediate logic programming formula-
tion and then to queries against description logic knowledge bases in Sec. 5, which allow
to derive complexity results (Sec.6).
2 A running example

Services receive input messages and return output messages. Such messages are struc-
tured objects (as in WSDL), consisting of a set of attribute-value pairs, such as

                     {street="Via Toledo", numb=128}.

Following [1], we assume that services can be like queries, that is, a single input message
may be mapped onto a set of homogeneously structured output messages. Formally this
means that a service can be abstracted by any set of pairs (min , mout ), with multiple pairs
sharing the same min .
    Now assume an underlying ontology defines the concepts Place, Map, Coord, and
Address, and that every Place has the attributes hasAddr, hasMap, hasCoord. In
turn each address has the attributes hasCity, hasStr and hasNum. Consider a service
Mapservice that takes input messages with attributes city and street, and returns the
map of the surrounding area in a message with the single field result. Mapservice can
be described in our language with the following expression:
    select result:=hasMap from all Place
          with hasAddr.hasCity = city, hasAddr.hasStr = street .
    This description can be easily adapted to describe similar services. For example, a
specialized map service that works only for southern cities can be described by defining
a concept SouthernCity in the underlying ontology and restricting Mapservice with
the expression:
    Mapservice restricted to SouthernCity(city) .
    Portals can be described with unions. Given two map services for Europe and China,
called Euromap and Chinamap, a portal that covers both areas can be described by:
    union (Euromap, Chinamap).
    Intersections are supported, too. Now suppose that Euromap is more reliable than the
generic Mapservice, then it may be preferable to use Euromap when possible. This can be
done with conditionals (temporarily assume that Euromap and Chinamap have the same
input message type as Mapservice, with the city field):
    if EuropeanCity(city) then Euromap else Mapservice .
    A relevant task is composition, our framework supports composition through dataflow
graphs by which the output of some services can be fed as input to other services. For
example, let Addr2coor be a service that takes city and street and returns the as-
sociated coordinates lat and lon; then let Coor2map be a service that returns the map
associated to the given coordinates, called latitude and latitude by this service.
The composition of these two services can be specified with the dataflow graph in Fig. 1.
We support also a textual representation:
    CompoundMap:
          in city, street
          out result
          C := Addr2coor(in)
          out := Coor2map(latitude:=C.lat, longitude:=C.lon) .
    In order to combine different services it may be necessary to adapt and restructure
their inputs and outputs (e.g. consider the above example for conditionals when Euromap
and Chinamap have different input message types). Here is an example of a variant of
                                  Fig. 1. A dataflow graph


Mapservice whose input city is forced to be Naples (a constant in the knowledge base),
and whose output is renamed:
     RestructuredMapServ:
          in street
          out map
          C := Mapservice (street:=in.street, city:=Naples)
          out.map:=C.result .
     In general we allow a message element to be fed as an input to multiple other services,
so dataflow graphs can be arbitrary DAGs. This was not allowed in [1]
     Our framework allows to reason about different specifications. The basic reasoning
task is service comparison, that given two service descriptions S1 and S2 checks whether
all the input-output message pairs in the semantics of S1 are also in the semantics of S2 ;
in that case we write S1 vKB,Σ S2 , where KB is the underlying ontology and Σ con-
tains the service definitions. By comparing services one may look for stronger or weaker
services (cf. [1]). If Addr2coor and Coor2map are correctly specified (say, with select
expressions), then our framework can verify that CompoundMap vKB,Σ Mapservice and
Mapservice vKB,Σ CompoundMap, thereby concluding that in the absence of a direct
implementation of Mapservice, an equivalent service can be obtained by composing the
implementations of Addr2coor and Coor2map as specified by CompoundMap (dynamic
service replacement). Service comparison can also be a basis for automated composition
that, however, lies beyond the scope of this paper.
     Syntactically speaking, the service description language illustrated above lies some-
where in between relational algebra and a programming language. A major difference
with respect to both is that descriptions are linked to an ontology, so it is possible to
distinguish—say—a hash table that associates people with their age from another hash
table (with the same implementation) that associates people with their credit card num-
ber. Clearly, such differences are crucial for tasks such as service discovery and dynamic
binding of workflow activities to services. Procedural constructs cover assignments and
conditionals; only iterations are not supported, and this has a few advantages: (i) the
main reasoning tasks are decidable, (ii) the language is easier to use for people with no
programming background.


3 Preliminaries
The vocabulary of the description logics we deal with in this paper consists of the fol-
lowing pairwise disjoint countable sets of symbols: a set of atomic concepts At, a set of
individual names In, and a set of atomic roles AR , with a distinguished subset of names
AtR ⊆ AR denoting transitive roles.
    A role is either an expression P or P − , where P ∈ AR . Let R range over roles. The
set of concepts is the smallest superset of At such that if C, D are concepts, then >, ¬C,
C u D, ∃.C, and ∃≤n R.C are concepts.
    Semantics is based on interpretations of the form I = h∆I , ·I i where ∆I is a set of
individuals and ·I is an interpretation function mapping each A ∈ At on some AI ⊆ ∆I ,
each a ∈ In on some aI ∈ ∆I , and each R ∈ AR on some RI ⊆ ∆I × ∆I . Moreover,
if R ∈ AtR , then RI is transitive. The meaning of inverse roles is (R− )I = {hy, xi |
hx, yi ∈ RI } . Next we define the meaning of compound concepts. By ] S we denote the
cardinality of S.

                        AIρ = AI          (A ∈ At)      >I = ∆I
                    (¬C)Iρ = ∆I \ CρI                   (C u D)Iρ = CρI ∩ DρI
                                ©                                  ª
                  (∃R.C)Iρ =        x | ∃y.hx, yi ∈ RI ∧ y ∈ CρI
                                ©                                       ª
               (∃≤n R.C)Iρ =        x | ] {y | hx, yi ∈ RI ∧ y ∈ CρI } ≤ n .

 Other standard constructs (∀R.C, ⊥, t) can be derived from the above concepts.
    A general concept inclusion (GCI) is an expression C v D where C and D are
concepts. A role inclusion is an expression R1 v R2 where R1 and R2 are roles. An
assertion is an atom like A(a) or P (a, b) where A ∈ At, R ∈ AR , and {a, b} ∈ In . A
TBox is a set of GCIs; a role hierarchy is a set of role inclusions; an ABox is a set of
assertions. Finally, a DL knowledge base (DL KB) is a triple hT , H, Ai consisting of a
TBox, a role hierarchy and an ABox.
    An interpretation I satisfies a (concept or role) inclusion E1 v E2 iff E1I ⊆ E2I .
Moreover I satisfies an assertion A(a) (resp. P (a, b)) iff aI ∈ AI (resp. (aI , bI ) ∈ P I ).
A model of a DL KB is any I that satisfies all the inclusions and the assertions of the KB.
    The above description logic is known as SHIQR+ . By disallowing transitive roles
we get SHIQ. By disallowing ∃≤n R.C, transitive roles and role hierarchies one gets the
logic ALCI. ALC is obtained by further dropping inverse roles. The logic EL supports
only u, ∃R.>, and GCIs built from these constructs.
    Moreover, there exists a rather different extension of ALC called DLR, supporting
n-ary relations (n > 2) that we will mention in the following but we do not report here
due to space limitations. Its definition and relevant results can be found in [3].


4 Syntax and semantics of SDLfull
Our service description language, called SDLfull , extends the DL vocabulary with an in-
finite supply of constants Nc , service names Ns , and message attribute names Na . SDLfull
describes functional and knowledge-based aspects of web-services. Therefore, as usual
functional programming languages, it does not define a service as a set of state variables
and a sequence of statements which update them, but as the functional composition of
stateless expressions that have to be evaluated.
Definition 1. The language of service expressions is the least set Expr containing:
 – (service calls) all S ∈ Ns ;
 – (set operators) all expressions op(E1 , . . . , En ) such that {E1 , . . . , En } ⊆ Expr and
   op ∈ {union, intersection};
 – (conditionals) all expressions if L then E1 [else E2 ] (the else clause is optional)
   such that
     • {E1 , E2 } ⊆ Expr, and
     • L is a list of conditions of the form t = u, t 6= u, A(t), or ¬A(t), where
        {t, u} ⊆ Nc ∪ Na
 – (selections) all expressions select a1 := r1 , . . . an := rn from all D with L, such
   that
     • ai ∈ Na (1 ≤ i ≤ n) ;
     • ri is a role path (in the language of the underlying ontology) (1 ≤ i ≤ n) ;
     • D is a concept (in the language of the underlying ontology);
     • L is a list of bindings pi = ti (1 ≤ i ≤ m) where each pi is a role path (in the
        language of the underlying ontology), and ti ∈ Nc ∪ Na ;
 – (message restructuring) all expressions a1 := t1 , . . . an := tn such that ai ∈ Na and
   ti ∈ Na ∪ Nc (1 ≤ i ≤ n) ;
 – (restrictions) all expressions E restricted to L such that L is a list of conditions
   (see conditionals above).

    A service consists of a dataflow graph which evaluates data by means of functional
nodes. Each functional node represents a stateless expression which may have multiple
inputs and outputs denoted by parameter names. Edges in a dataflow graph are used to
connect the output of a functional node with the inputs of (possibly many) other func-
tional nodes. In order to specify which output is connected to which input, edges are also
labeled with attribute names and, as the inputs and the outputs of different expressions
may be labeled with different parameter names, edges do not connect directly two func-
tional nodes, but connect functional nodes with parameter nodes that are intended to fix
name mismatches.
    Dataflow graphs are defined as follows:

Definition 2. A dataflow graph with name S is a tuple hS, NS , ES , name S , expr S i where

 – S ∈ Ns ;
 – NS is a finite set of nodes, partitioned into functional and parameter nodes, denoted
   by fun(NS ) and par(NS ), respectively;
 – ES is a finite set of edges; ES ⊆ (fun(NS ) × par(NS )) ∪ (par(NS ) × fun(NS ));
 – name S : par(NS ) ∪ ES → Na is a labelling function;
 – expr S : fun(NS ) → Expr is a labelling function.

Moreover, dataflow graphs are required to be directed acyclic graphs (DAGs).

The parameter nodes with no incoming edges (resp. no outgoing edges) will be called the
input nodes (resp. output nodes) of the graph. In Fig. 1, ovals and small circles represent
functional and parameter nodes, respectively; input and output nodes are colored in gray.
    The dependency graph of a set of dataflow graphs Σ is hΣ, Ei, where E is the set of
all pairs (G1 , G2 ) such that the name of G2 occurs in the label of some functional node
in G1 . We say that Σ is acyclic if its dependency graph is.

Definition 3. A service specification Σ is a finite, acyclic set of dataflow graphs with
mutually different names.
    Edge labels should match the input/output message attributes of the service expres-
sions labeling functional nodes. This requirement is formalized in terms of typing. In this
paper we only deal with a structural form of typing (centred around message attribute
names); the problem of ensuring–say—that the connected input/output attributes lat
and latitude in Fig. 1 belong respectively to two “compatible” concepts C1 and C2
such that C1 v C2 has already been tackled in the literature (including [5]). We will deal
with it in the full paper.

Definition 4. A (message) type is a finite set T ⊆ Na

Definition 5. The input type of a dataflow graph G = hS, NS , ES , name S , expr S i with
respect to a specification Σ is the set

                   inΣ (G) = {name S (n) | n is an input node of G} .

The output type of a dataflow graph G = hS, NS , ES , name S , expr S i with respect to a
specification Σ is the set

                 outΣ (G) = {name S (n) | n is an output node of G} .

Definition 6. The input type of a service expression E with respect to a specification Σ,
denoted by inΣ (E), is recursively specified as follows:

 – if E = S ∈ Ns , then inΣ (E) equals inΣ (G) where G has name S;
 – inΣ (op(E1 , E2 )) = inΣ (E1 ) ∪ inΣ (E2 );
 – inΣ (if C then E1 else E2 ) = inΣ (E1 ) ∪ inΣ (E2 ) ∪ {a ∈ Na | a occurs in C};
 – inΣ (select A from all D with R) = {a ∈ Na | a occurs in R};
 – inΣ (a1 := t1 , . . . an := tn ) = Na ∩ {t1 , . . . tn }.

    The output type of a service expression E with respect to a specification Σ, denoted
by outΣ (E), is recursively specified as follows:

 – if E = S ∈ Ns , then outΣ (E) equals outΣ (G) where G has name S;
 – outΣ (op(E1 , E2 )) = outΣ (E1 ) ∩ outΣ (E2 );
 – outΣ (if C then E1 else E2 ) = outΣ (E1 ) ∩ outΣ (E2 );
 – outΣ (select a1 := r1 , . . . an := rn from all D with R) = {a1 , . . . an };
 – outΣ (a1 := t1 , . . . an := tn ) = {a1 , . . . an }.

About the above definition: Intuitively, all input parameters have to be supplied in order
to call a service; therefore if the components of a compound service have different input
types, then the compound service must take their union to be sure that all component
services can be invoked. Symmetrically, the only outputs one can count on are those
returned by all the component services; this is why intersection is used here.

Definition 7. A specification Σ is well-typed iff for all dataflow graphs hS, NS , ES ,
name S , expr S i ∈ Σ, and for all functional nodes k ∈ fun(NS ),

 – in(k) equals the set of labels of the incoming edges of k;
 – out(k) contains the set of labels of the outgoing edges of k.
From now on we assume that all service specifications are well-typed unless stated oth-
erwise.
    The semantics of service expressions and dataflow graphs is defined in terms of
worlds that specify the extension of concepts and roles, as well as the behavior of each
service. From a semantic perspective, a message is a partial function defined over the
message’s attributes, that returns for each attribute its value.

Definition 8. A ∆-message is a partial function m : Na → ∆ .

The message’s range ∆ will sometimes be omitted when irrelevant or obvious.
    Now a world is simply a combination of a DL interpretation (that interprets the terms
defined in the underlying ontology) plus an interpretation of service names (i.e. atomic
services).

Definition 9. A world is a tuple W = h∆W , ·W , [·]W i such that
 – h∆W , ·W i is an interpretation of the knowledge base;
 – [·]W maps every service name S ∈ Ns on a set [S ]W of ∆W -message pairs.

To ensure that service name evaluation reflects the given service specification, we have
to specify the semantics of the terms and expressions used in dataflow graph labels.

Definition 10. The evaluation tW (m) of a term t ∈ Nc ∪ Na with respect to a world W
and a message m, is m(t) if t ∈ Na , and tW otherwise.

Definition 11. The evaluation E W (m) of a service expression E with respect to a world
W and a message m is recursively defined as follows:
 – if E = S ∈ Ns , then E W (m) = {m0 | (m, m0 ) ∈ [S ]W } ;
 – union(E1 , E2 )W (m) = E1W (m) ∪ E2W (m) ;
 – intersection(E1 , E2 )W (m) = E1W (m) ∩ E2W (m) ;
 – (if C then E1 else E2 )W (m) = E1W (m) if C W (m) is true, E2W (m) otherwise;
   moreover, C W (m) is true iff
      • for all t ¯ u in C, tW (m) ¯ uW (m) holds (¯ ∈ {=, 6=}),
      • and for all literals A(t) and ¬B(u) in C, tW (m) ∈ AW and uW (m) 6∈ B W ;
 – (select a1 := r1 , . . . an := rn from all D with R)W (m) is the set of all m0 such
   that, for some x ∈ DW ,
      • for all r ¯ t in R there exists y ∈ rW (x) such that y ¯ tW (m) holds (¯ ∈ {=
        , 6=});
      • m0 (ai ) ∈ riW (x) (1 ≤ i ≤ n); m0 is undefined in every other case;
 – (a1 := t1 , . . . an := tn )W (m) = {m0 } where the domain of m0 is a1 , . . . an and
   m0 (ai ) = m(ti ) (1 ≤ i ≤ n) .

    The evaluation of service compositions (i.e. dataflow graphs) is defined in a declar-
ative way: each parameter node must be assigned a value (an element of ∆W ) in a way
that is compatible with the input-output behavior of each functional node:

Definition 12. The evaluation [G]W of a graph G = hS, NS , ES , name S , expr S i w.r.t.
W is the set of all ∆W -message pairs (min , mout ) such that for some function σ :
par(NS ) → ∆W , the following conditions hold:
 – for all input nodes n ∈ NS , min (name S (n)) = σ(n);
 – for all output nodes n ∈ NS , mout (name S (n)) = σ(n);
 – min and mout are undefined for every other attribute name;
 – for all n ∈ fun(NS ), it must hold that mnout ∈ expr S (n)W (mnin ), where mnin and
   mnout are defined as follows: for all a ∈ Na ,
     • if there exists an edge (n0 , n) with name S (n0 , n) = a, let mnin (a) = σ(n0 ),
     • if there exists an edge (n, n00 ) with name S (n, n00 ) = a, let mnout (a) = σ(n00 ),
     • mnin and mnout are undefined for all other inputs.

Definition 13. A world W is a model of a specification Σ with respect to a knowledge
base KB iff

 1. h∆W , ·W i is a model of KB ;
 2. for all names S of a dataflow graphs G ∈ Σ, [S ]W = [G]W .

   If W is a model of Σ, then it is not hard to see that since Σ is acyclic (by definition),
[·]W is uniquely determined by h∆W , ·W i (i.e. service specifications are deterministic).
     The next definition specifies when a service S1 is a weakening of S2 (equivalently, S2
is a strengthening of S1 ) [1]. These relations are the basis for service comparison.

Definition 14. S1 vKB,Σ S2 iff for all models W of Σ w.r.t. KB , [S1 ]W ⊆ [S2 ]W .

    Roughly speaking, if S2 is a strengthening of S1 , then for any given input, S2 returns
more answers than S1 . See [1] for a discussion of the different applications of strength-
ening and weakening in our reference scenarios.


5 Service comparison

Definition 15. The service comparison problem is defined as follows: given KB , Σ, and
two service names S1 and S2 , decide whether S1 vKB,Σ S2 .

    By translating service specifications into logic programming rules, service subsump-
tion checking can be reduced to containment of unions of conjunctive queries (UCQ)
against DL knowledge bases. In turn, this problem can be reduced to the evaluation of
UCQs against DL knowledge bases.


5.1 Rules and queries

Consider rules like A ← L1 , . . . , Ln where A is a logical atom, each Li is a literal (i.e.
either an atom or a negated atom), possibly of the form t = u or t 6= u. As usual, let
head(r) = A and body(r) = {L1 , . . . , Ln } . We restrict our attention to function-free
rules only: terms will be restricted to constants in In and variables.
    The predicates in body(r) may be defined in a DL knowledge base, i.e. unary and
binary predicates may belong to At and AR , respectively. If all the predicates occurring in
body(r) belong to At and AR and body(r) contains no occurrences of ¬, then we call r a
conjunctive query (CQ). A union of conjunctive queries (UCQ) is a set of CQs having the
same predicate name in the head. We add superscripts 6=, ¬ if the corresponding symbol
may occur in body(r); for example UCQ¬ denotes the unions of conjunctive queries that
may contain negative literals in the body.
     Let P be a set of rules and I be an interpretation. Let an I-substitution be a sub-
stitution that replaces each constant a by aI , and each variable with an element of ∆I .
I-substitutions are a useful tool for defining the semantics of rules and queries.
     Usually queries are evaluated against a knowledge base, and the answer is restricted
to the individual constants that explicitly occur in the ABox (e.g. see [8]). In particular, a
tuple c of constants is a certain answer of a CQ r against a DL KB K iff

 – the constants in c occur in K; moreover, for some substitution σ defined on the vari-
   ables of head(r),
 – head(rσ) has the form p(c);
 – for all models I of K, there exists an I-substitution θ such that every literal in
   body(rσθ) is satisfied by I, that is,
     • for all A(d) (resp. ¬A(d)) in body(r0 σθ), d ∈ AI (resp. d 6∈ AI ) ;
     • for all P (d, e) (resp. ¬P (d, e)) in body(r0 σθ), (d, e) ∈ P I (resp. (d, e) 6∈ P I ) ;
     • all literals d = e and d 6= e in body(r0 σθ) are true.

The set of all certain answers
                             S of a CQ r against K will be denoted by c ans(r, K). For a
UCQ Q, let c ans(r, K) = r∈Q c ans(r, K).
   In this paper, we will also query the models of a knowledge base and introduce what
we call unrestricted answers, that are built from the domain elements of the models.1
This definition applies to all sets of rules (not only CQs and UCQs).
   The I-reduct of P , P I , is the set of all rules r such that for some r0 ∈ P and some
I-substitution σ,

 – all literals belonging to body(r0 σ) whose predicate is in At ∪ AR ∪ {=, 6=} are satis-
   fied by I;
 – r is obtained from r0 σ by removing from body(r0 σ) all the literals whose predicate
   is in At ∪ AR ∪ {=, 6=}.

Note that the I-reduct of a UCQ is always a set of facts.
    We will denote by lm(P I ) the least Herbrand model of P I . The unrestricted answer
to a predicate p in P against I is u ans(p, P, I) = {c | p(c) ∈ lm(P I )} .


5.2 The reduction

We proceed by illustrating the tranlation of service specifications into logic programs.
Syntactically, such programs are like queries, but have the unrestricted semantics, like our
service descriptions; so they provide a nice intermediate step for the complete reduction
of service comparison to certain answers. In order to simplify the presentation, we assume
that service specifications are normalized by replacing subexpressions with new services,
 1
     This notion differs from the many hybrid combinations of rules and DLs (see [7] for a survey).
     The latter are still rather close to querying DL KBs and their answers are restricted to the con-
     stants occurring in the rules or in the KB. Moreover, the purpose is different: those combination
     are supposed to be knowledge representation formalisms, while our semantics is merely a tech-
     nical device to link service comparison to query answering against DL KBs.
so that no constructs are nested (all subexpressions are service names). We use further
service names to guarantee that if a dataflow graph has more than one functional node,
then all nodes are labelled with service names only. Finally, we assume that message
attributes are renamed so that different functional nodes never share any message attribute
name. Clearly, the above normalizations take polynomial time.
      Then for each service name S defined in the specification, we define an atom
pS (Xf1 , . . . , Xfm , Yg1 , . . . , Ygn ), where pS is a fresh predicate symbol, and f1 , . . . , fm
(resp. g1 , . . . , gn ) is the lexicografic ordering of in(S) (resp. out(S)). We denote the
above atom by HS .
      Now each service S whose dataflow graph has multiple functional nodes with labels
S1 , . . . , Sn , can be translated into one rule (HS ← HS1 , . . . , HSn )σ, where the substitu-
tion σ unifies all variables Ygi and Xfj such that some parameter node has an incoming
edge labelled with gi and an outgoing edge labelled fj .
      Next consider an S whose dataflow has a single functional node labelled E. If E
is union(S1 , . . . , Sn ) then S can be translated into n rules HS ← HSi (1 ≤ i ≤ n).
Symmetrically, if E = intersection(S1 , . . . , Sn ) then S can be translated into one
rule HS ← HS1 , . . . , HSn .
      When E = if c1 , . . . , cn then S1 else S2 , S is translated into the rules HS ←
[c1 ], . . . , [cn ], S1 and HS ← [c̄i ], S2 (1 ≤ i ≤ n). Here each ci is a condition and
[ci ] denotes its tranlation; c̄i denotes the complement of ci , e.g. if ci is x = y then c̄i
is x 6= y; if ci = A(x) then c̄i = ¬A(x). The translation [ci ] consists in turning each
message attribute f into the corresponding variable Xf .
      The translation of select a1 := r1 , . . . , an := rn from all A with p1 = t1 , . . .
pm = tm is HS ← A(Z), [a1 := r1 ], . . . , [an := rn ], [p1 = t1 ], . . . , [pm = tm ], where
Z is a fresh variable. Each [ai := ri ] consists of the translation of the role path ri into
a conjunction of binary atoms (using fresh variables at the intermediate steps), plus the
atom Yai = V , where V is the last variable introduced in the translation of ri . Similarly,
each [pi = ti ] consists of the translation of the role path pi plus u = V , where V is the
last variable introduced in the translation of pi , and u = ti if ti ∈ Nc , otherwise (i.e. if
ti ∈ Na ) u = Xti .

Example 1. In our running example, a condition like hasAddr.hasStr=street is
translated into hasAddr(Z, V1 ), hasStr(V1 , V2 ), Xstreet = V2 , where V1 , V2 are
new variables.

Due to space limitations we omit the (straightforward) translation of message restructur-
ing and restrictions.
    Let us denote the translation of a specification Σ with PΣ . The above translation is
pretty natural and it is not hard to see that it preserves the meaning of the given normal
specification under unrestricted query evaluation, as stated by the following theorem.

Theorem 1. Let Σ be a normalized service specification and let PΣ be its translation.
Let S be the name of a graph G ∈ Σ and f1 , . . . , fm (resp. g1 , . . . , gn ) be the lexico-
graphic ordering of in(S) (resp. out(S)).
     Then for all models W of Σ w.r.t. KB , (t1 , . . . , tm , u1 , . . . , un ) ∈ u ans(pS , PΣ , W)
iff for some message pair (m, m0 ) ∈ [S ]W , m(fi ) = ti and m0 (gj ) = uj (1 ≤ i ≤
m, 1 ≤ j ≤ n).
    The above result can be reformulated in terms similar to query containment. For
all predicates pS1 and pS2 , let pS1 ⊆KB,Σ pS2 iff for all models W of Σ w.r.t. KB ,
u ans(pS1 , PΣ , W) ⊆ u ans(pS2 , PΣ , W).
Corollary 1. For all normalized specifications Σ, S1 vKB,Σ S2 iff pS1 ⊆KB,Σ pS2 .

6 Complexity results
In this section we exploit Theorem 1 and the many recent complexity results on certain
query answers against DL knowledge bases to derive a preliminary set of complexity
bounds for our service description language.
    In order to illustrate decidable cases and complexity sources, we introduce a uniform
notation for the fragments of our service description language SDLfull :
 – SD restricts the language by forbidding union, else, negative conditions (such as
   r 6= t and ¬A(t)), and equality within conditions (equality is allowed in the with
   clause of selections);
 – superscripts u and e stand for union and else, respectively; when they are present,
   the language supports the corresponding constructs;
 – similarly, superscripts =, 6= and ¬ stand for conditions with equalities, disequalities
   and concept complements, respectively;
 – the superscript k imposes that the maximum nesting level of union and else is
   bounded by a constant k.
For example, SDu,6= stands for the sublanguage of SDLfull supporting union and condi-
tions with disequalities, but neither else nor negative conditions like ¬A(t). By SDk,u,6=
we denote a similar language, where the nesting level of union is bounded by a constant
k.
    In this preliminary paper, we adopt the following reduction to obtain a first set of
decidability results and complexity upper bounds:
 1. Service comparison in Σ is reduced to unrestricted answer containment in PΣ by
    Theorem 1; note that PΣ can be constructed in polynomial time from Σ;
 2. unrestricted answer containment is further reduced to unrestricted containment of
    CQ¬,6= /UCQ¬,6= by unfolding PΣ ; unfolding means that whenever an atom B in the
    body of some rule r unifies with the head of some rule r0 , then B is replaced with
    body(r0 ) (as in SLD resolution); the process is exhaustively repeated; if multiple
    rules r1 , . . . , rn unify with B, then r is replaced with all n possible rewritings; since
    PΣ is acyclic (because Σ is), the unfolding process terminates, however it may in-
    crease the size of PΣ exponentially when some predicates are defined by multiple
    rules;
 3. finally, if (the unfolding of) PΣ is positive (i.e., it contains no negations nor any
    disequality), then unrestricted answer containment in the unfolded version of PΣ is
    reduced to certain answering of CQs/UCQs against DL knowledge bases, see Theo-
    rem 2 below.
   Theorem 2 says that there exists a PTIME reduction of unrestricted CQ (resp. UCQ)
containment to the evaluation of certain answers of CQs (resp. UCQs) against DL knowl-
edge bases.
Theorem 2. Let Σ be a normalized specification and let PΣU be the unfolding of PΣ . For
i = 1, 2, let Qi be the definition of pSi , i.e. the set of rules r ∈ PΣU with pSi in head(r)
(where S1 and S2 are the names of two graphs in Σ).
    If PΣ is positive, then checking whether pS1 ⊆KB,Σ pS2 can be reduced in poly-
nomial time to evaluating for all q ∈ Q1 an answer c ans(Q2 , KB q ), where KB q is
obtained from KB by binding the variables in q to fresh constants, and adding the in-
stantiated body to KB ’s ABox as a set of assertions.
    More precisely, for each q ∈ Q1 , one has to check whether the tuple of fresh constants
assigned to the variables in head(q) belongs to c ans(Q2 , KB q ). Basically, the reduction
is centred around a form of skolemization.
    Note that this result is slightly different from the known relationships between query
answering and query containment, since pS1 ⊆KB,Σ pS2 is based on a nonstandard (un-
restricted) notion of evaluation, similar to the one used for service comparison.
    The above reduction suffices to derive complexity bounds for positive PΣ . Note that
PΣ is positive when else, 6=, and ¬ are not supported, that is, in SDu and its fragments.
When Σ is formulated in SDu , then the unfolding of PΣ may be exponentially larger, as
the translation of unions into rules introduces predicates defined by multiple rules. It is
not hard to see, however, that SDk,u specifications lead to unfoldings that are only poly-
nomially larger than PΣ . Then the above reduction steps tell us that complexity of service
comparison within SDk,u and its fragments is bounded by the complexity of computing
certain answers against DL KBs; for SDu there is a further exponential explosion due to
unfolding.
    The complexity of query answering is NP-complete for EL [9], EXPTIME-complete
for DLR [3], and co3NEXPTIME complete for SHIQ (cf. [6]). Moreover, query con-
tainment w.r.t. empty knowledge bases is NP-hard [4], and it is not difficult to see that
the complexity of the standard reasoning tasks in ALC with general TBoxes (EXPTIME-
complete) provides a lower bound to CQ answering against ALC KBs, so the upper
bounds for EL and ALC are strict. These observations support the following theorem:
Theorem 3. The complexity of service comparison in SD(X ) and SDk,u (X ) is
 – NP-complete for X = EL;
 – EXPTIME-complete for X ranging from ALC to DLR;
 – in co3NEXPTIME for X = SHIQ.
     If the underlying description logic supports unrestricted negation (or equivalently,
atomic negation and GCI), then negative literals in rule and query bodies (if any) can be
internalized in the KB in a simple way: just replace each literal ¬A(t) with Ā(t) where
Ā is a fresh atom, and extend the TBox with the axioms Ā v ¬A and ¬A v Ā. Internal-
ization makes it possible to support constructs such as negated conditions, disequalities,
and else, that introduces negation implicitly through the translations [c̄i ]. After remov-
ing negation from PΣ via internalization, we can exploit the available complexity results
for the extensions of ALC (that allow internalization).
Theorem 4. The complexity of service comparison in SD¬ (X ), and between SDk,e (X )
and SDk,u,e,¬ (X ) is
 – in EXPTIME for X = EL;
 – EXPTIME-complete for X ranging from ALC to DLR;
 – in co3NEXPTIME for X = SHIQ.
Remark 1. EL does not support negation, therefore internalization is not possible. In the
above theorem we inherit the upper bound for ALC, but whether this is a tight bound is
still an open question.
    From the above results, we derive upper complexity bounds for more general logics,
without any nesting bounds. In the absence of nesting bounds and in the presence of dis-
junctive constructs like unions and conditionals, the unfolding of PΣ may be exponential.
Theorem 5. The complexity of service comparison in SDu,e,¬ (X ) is
 – in 2-EXPTIME for X = EL;
 – in 2-EXPTIME for X ranging from ALC to DLR;
 – in 4-EXPTIME for X = SHIQ.
Also in this case, whether these bounds are tight is still an open question.
    Currently, we do not know whether SDLfull or even its fragment SD6= are decidable.
There exist some undecidability results for CQs and UCQs with disequalities, and we
conjecture they can be carried over to service comparison. This will be a subject for
future work.


                                      EL                ALC
                                                        DLR                 SHIQ
                           k,u
                SD, SD       NP-complete EXPTIME-complete in co3NEXPTIME
                      SD¬ (in EXPTIME) EXPTIME-complete in co3NEXPTIME
           SDk,e , SDk,e,¬ (in EXPTIME) EXPTIME-complete in co3NEXPTIME
         SDk,u,e , SDk,u,e,¬
         SDu , SDe , SDu,e (in 2-EXPTIME) in 2-EXPTIME      in 4-EXPTIME
     SD , SDe,¬ , SDu,e,¬
       u,¬


                     Table 1. Some complexity results for decidable cases


7 Related work
The language introduced in [1], SDL(X ), was based on an embedding of service compar-
ison into subsumption in an expressive description logic, µALCIO. With the reduction
to query containment we adopt here, it is possible to support service intersection and
dataflow graphs, even if they violate the quasi-forest structure of µALCIO. Moreover,
we provide an articulated complexity analysis not available in [1].
    The idea of formalizing services as queries has been first introduced in [5]. The lan-
guage adopted there is simpler than ours: only one construct combining our selection
and restriction, and a form of composition where output and input messages must per-
fectly match. The semantics of services in [5] is restricted to the constants occurring in a
KB rather than domain elements. Furthermore, all upper bounds provided there are EX-
PTIME or beyond. Currently our NP bounds for EL identify the most efficient service
description logics in the literature. Moreover, even in the hardest cases, our language is
never more complex than [5].
    In OWL-S, services are described by means of preconditions, postconditions, and
add/delete lists. Pre- and postconditions are like ABoxes; add/delete lists specify the side
effects of the services. The same mechanism can describe functional services. WSMO is
built upon an articulated model, including user roles and goals, that lead to a planning-
like view of service composition. In WSDL-S, WSDL service specifications (that are
basically type definitions) are bound to concepts defined in an underlying ontology. No
good computational results are currently available for any of the above standards.

8 Conclusions and future work
SDLfull and its fragments are rich service description languages that—however—enjoy
numerous decidability results (reported in Table 1), and in some case (SDk,u (EL)) ser-
vice comparison is significantly less complex than in previous competing logics. Encour-
aging experimental results are available for an analogous problem [2]. We are planning
an experimental implementation based on the same technology.
    Many issues need further work, here we mention just the main open problems. Au-
tomated service composition needs efficient heuristics for quickly selecting promising
candidate dataflows. The bounds for EL reported in parentheses in Table 1 are simply
inherited from more complex logic and it is not obvious whether they are tight. Dise-
qualities and negation over roles would be helpful, but the undecidability results of [8]
warn that some restrictions may be needed. It would also be interesting to check whether
service comparison can be in NP also for other low-complexity logics such as the DL-lite
family.

References
1. Piero A. Bonatti. Towards service description logics. In JELIA, LNCS 2424, pages 74–85.
   Springer, 2002.
2. Piero A. Bonatti and F. Mogavero. Comparing rule-based policies. In 9th IEEE Int. Work. on
   Policies for Distributed Systems and Networks (POLICY 2008), pages 11–18, 2008.
3. D. Calvanese, G. De Giacomo, and M. Lenzerini. Conjunctive query containment and answering
   under description logic constraints. ACM Trans. Comput. Log., 9(3), 2008.
4. A.K. Chandra and P.M. Merlin. Optimal implementation of conjunctive queries in relational
   data bases. In Proc. Ninth Annual ACM Symp. on Theory of Computing, pages 77–90, 1976.
5. D. Hull, E. Zolin, A. Bovykin, I. Horrocks, U. Sattler, and R. Stevens. Deciding semantic
   matching of stateless services. In Proc. of AAAI 2006. AAAI Press, 2006.
6. Magdalena Ortiz, Diego Calvanese, and Thomas Eiter. Data complexity of query answering in
   expressive description logics via tableaux. J. of Automated Reasoning, 41(1):61–98, 2008.
7. Riccardo Rosati. Integrating ontologies and rules: Semantic and computational issues. In Rea-
   soning Web, LNCS 4126, pages 128–151. Springer, 2006.
8. Riccardo Rosati. The limits of querying ontologies. In ICDT, LNCS 4353, pages 164–178.
   Springer, 2007.
9. Riccardo Rosati. On conjunctive query answering in EL. In Description Logics. CEUR-WS.org,
   2007.