A semantic stateless service description language P. A. Bonatti and L. Sauro Università di Napoli Federico II Abstract. Complexity issues and the requirements on semantic web application in the Life Science domains recently motivated a few works on stateless service description languages [1, 5]. With stateless services, it is possible to reason about the semantic relationships between inputs and outputs, while keeping matchmaking and composition decidable. In this paper we extend the languages introduced in [1] and [5] with more general forms of composition and other constructs. We provide formal syntax and semantics and some preliminary results on the complexity of service comparison. These complexity results rely on hybrid formalisms involving both logic programming rules and description logics. 1 Introduction The area of semantic web services is concerned with the declarative, knowledge based specification of web service semantics applied to service matchmaking (i.e., finding a ser- vice that matches a given specification), verification and automated composition. There is a conspicuous literature on the topic, enriched by several competing standards, such as OWL-S, WSMO, and WSDL-S. When the semantic description involves dynamic behavioral aspects such as itera- tions, the tasks of matchmaking and composition easily become undecidable. This mo- tivated a few works on stateless services [1, 5], that behave like functions or database queries. With stateless services, it is possible to move beyond a mere description of input and output types and capture the relationships between inputs and outputs, while keeping matchmaking and composition decidable. Stateless services are interesting because they are common in the domain of Life Sciences [5]. Moreover, they can be paired with a workflow language supporting procedural constructs like BPEL4WS with the purpose of supporting the dynamic binding of atomic activities. In this paper we extend the languages introduced in [1] and [5] with more general forms of composition and other constructs. We provide formal syntax and semantics and some preliminary results on the complexity of service comparison, a basic reasoning task that underlies both matchmaking and composition (cf. [1]). These complexity re- sults rely on hybrid formalisms involving both rules and description logics. The language we adopt admits a graphic presentation (that may be appreciated by users with limited programming skills) as well as textual representation that resembles relational query and programming languages enough to be familiar to programmers. We start with some examples (Sec. 2) followed by a brief summary of description logic notions (Sec. 3). Then we formalize our service description logic language SDLfull (Sec. 4). Service comparison is reduced to an intermediate logic programming formula- tion and then to queries against description logic knowledge bases in Sec. 5, which allow to derive complexity results (Sec.6). 2 A running example Services receive input messages and return output messages. Such messages are struc- tured objects (as in WSDL), consisting of a set of attribute-value pairs, such as {street="Via Toledo", numb=128}. Following [1], we assume that services can be like queries, that is, a single input message may be mapped onto a set of homogeneously structured output messages. Formally this means that a service can be abstracted by any set of pairs (min , mout ), with multiple pairs sharing the same min . Now assume an underlying ontology defines the concepts Place, Map, Coord, and Address, and that every Place has the attributes hasAddr, hasMap, hasCoord. In turn each address has the attributes hasCity, hasStr and hasNum. Consider a service Mapservice that takes input messages with attributes city and street, and returns the map of the surrounding area in a message with the single field result. Mapservice can be described in our language with the following expression: select result:=hasMap from all Place with hasAddr.hasCity = city, hasAddr.hasStr = street . This description can be easily adapted to describe similar services. For example, a specialized map service that works only for southern cities can be described by defining a concept SouthernCity in the underlying ontology and restricting Mapservice with the expression: Mapservice restricted to SouthernCity(city) . Portals can be described with unions. Given two map services for Europe and China, called Euromap and Chinamap, a portal that covers both areas can be described by: union (Euromap, Chinamap). Intersections are supported, too. Now suppose that Euromap is more reliable than the generic Mapservice, then it may be preferable to use Euromap when possible. This can be done with conditionals (temporarily assume that Euromap and Chinamap have the same input message type as Mapservice, with the city field): if EuropeanCity(city) then Euromap else Mapservice . A relevant task is composition, our framework supports composition through dataflow graphs by which the output of some services can be fed as input to other services. For example, let Addr2coor be a service that takes city and street and returns the as- sociated coordinates lat and lon; then let Coor2map be a service that returns the map associated to the given coordinates, called latitude and latitude by this service. The composition of these two services can be specified with the dataflow graph in Fig. 1. We support also a textual representation: CompoundMap: in city, street out result C := Addr2coor(in) out := Coor2map(latitude:=C.lat, longitude:=C.lon) . In order to combine different services it may be necessary to adapt and restructure their inputs and outputs (e.g. consider the above example for conditionals when Euromap and Chinamap have different input message types). Here is an example of a variant of Fig. 1. A dataflow graph Mapservice whose input city is forced to be Naples (a constant in the knowledge base), and whose output is renamed: RestructuredMapServ: in street out map C := Mapservice (street:=in.street, city:=Naples) out.map:=C.result . In general we allow a message element to be fed as an input to multiple other services, so dataflow graphs can be arbitrary DAGs. This was not allowed in [1] Our framework allows to reason about different specifications. The basic reasoning task is service comparison, that given two service descriptions S1 and S2 checks whether all the input-output message pairs in the semantics of S1 are also in the semantics of S2 ; in that case we write S1 vKB,Σ S2 , where KB is the underlying ontology and Σ con- tains the service definitions. By comparing services one may look for stronger or weaker services (cf. [1]). If Addr2coor and Coor2map are correctly specified (say, with select expressions), then our framework can verify that CompoundMap vKB,Σ Mapservice and Mapservice vKB,Σ CompoundMap, thereby concluding that in the absence of a direct implementation of Mapservice, an equivalent service can be obtained by composing the implementations of Addr2coor and Coor2map as specified by CompoundMap (dynamic service replacement). Service comparison can also be a basis for automated composition that, however, lies beyond the scope of this paper. Syntactically speaking, the service description language illustrated above lies some- where in between relational algebra and a programming language. A major difference with respect to both is that descriptions are linked to an ontology, so it is possible to distinguish—say—a hash table that associates people with their age from another hash table (with the same implementation) that associates people with their credit card num- ber. Clearly, such differences are crucial for tasks such as service discovery and dynamic binding of workflow activities to services. Procedural constructs cover assignments and conditionals; only iterations are not supported, and this has a few advantages: (i) the main reasoning tasks are decidable, (ii) the language is easier to use for people with no programming background. 3 Preliminaries The vocabulary of the description logics we deal with in this paper consists of the fol- lowing pairwise disjoint countable sets of symbols: a set of atomic concepts At, a set of individual names In, and a set of atomic roles AR , with a distinguished subset of names AtR ⊆ AR denoting transitive roles. A role is either an expression P or P − , where P ∈ AR . Let R range over roles. The set of concepts is the smallest superset of At such that if C, D are concepts, then >, ¬C, C u D, ∃.C, and ∃≤n R.C are concepts. Semantics is based on interpretations of the form I = h∆I , ·I i where ∆I is a set of individuals and ·I is an interpretation function mapping each A ∈ At on some AI ⊆ ∆I , each a ∈ In on some aI ∈ ∆I , and each R ∈ AR on some RI ⊆ ∆I × ∆I . Moreover, if R ∈ AtR , then RI is transitive. The meaning of inverse roles is (R− )I = {hy, xi | hx, yi ∈ RI } . Next we define the meaning of compound concepts. By ] S we denote the cardinality of S. AIρ = AI (A ∈ At) >I = ∆I (¬C)Iρ = ∆I \ CρI (C u D)Iρ = CρI ∩ DρI © ª (∃R.C)Iρ = x | ∃y.hx, yi ∈ RI ∧ y ∈ CρI © ª (∃≤n R.C)Iρ = x | ] {y | hx, yi ∈ RI ∧ y ∈ CρI } ≤ n . Other standard constructs (∀R.C, ⊥, t) can be derived from the above concepts. A general concept inclusion (GCI) is an expression C v D where C and D are concepts. A role inclusion is an expression R1 v R2 where R1 and R2 are roles. An assertion is an atom like A(a) or P (a, b) where A ∈ At, R ∈ AR , and {a, b} ∈ In . A TBox is a set of GCIs; a role hierarchy is a set of role inclusions; an ABox is a set of assertions. Finally, a DL knowledge base (DL KB) is a triple hT , H, Ai consisting of a TBox, a role hierarchy and an ABox. An interpretation I satisfies a (concept or role) inclusion E1 v E2 iff E1I ⊆ E2I . Moreover I satisfies an assertion A(a) (resp. P (a, b)) iff aI ∈ AI (resp. (aI , bI ) ∈ P I ). A model of a DL KB is any I that satisfies all the inclusions and the assertions of the KB. The above description logic is known as SHIQR+ . By disallowing transitive roles we get SHIQ. By disallowing ∃≤n R.C, transitive roles and role hierarchies one gets the logic ALCI. ALC is obtained by further dropping inverse roles. The logic EL supports only u, ∃R.>, and GCIs built from these constructs. Moreover, there exists a rather different extension of ALC called DLR, supporting n-ary relations (n > 2) that we will mention in the following but we do not report here due to space limitations. Its definition and relevant results can be found in [3]. 4 Syntax and semantics of SDLfull Our service description language, called SDLfull , extends the DL vocabulary with an in- finite supply of constants Nc , service names Ns , and message attribute names Na . SDLfull describes functional and knowledge-based aspects of web-services. Therefore, as usual functional programming languages, it does not define a service as a set of state variables and a sequence of statements which update them, but as the functional composition of stateless expressions that have to be evaluated. Definition 1. The language of service expressions is the least set Expr containing: – (service calls) all S ∈ Ns ; – (set operators) all expressions op(E1 , . . . , En ) such that {E1 , . . . , En } ⊆ Expr and op ∈ {union, intersection}; – (conditionals) all expressions if L then E1 [else E2 ] (the else clause is optional) such that • {E1 , E2 } ⊆ Expr, and • L is a list of conditions of the form t = u, t 6= u, A(t), or ¬A(t), where {t, u} ⊆ Nc ∪ Na – (selections) all expressions select a1 := r1 , . . . an := rn from all D with L, such that • ai ∈ Na (1 ≤ i ≤ n) ; • ri is a role path (in the language of the underlying ontology) (1 ≤ i ≤ n) ; • D is a concept (in the language of the underlying ontology); • L is a list of bindings pi = ti (1 ≤ i ≤ m) where each pi is a role path (in the language of the underlying ontology), and ti ∈ Nc ∪ Na ; – (message restructuring) all expressions a1 := t1 , . . . an := tn such that ai ∈ Na and ti ∈ Na ∪ Nc (1 ≤ i ≤ n) ; – (restrictions) all expressions E restricted to L such that L is a list of conditions (see conditionals above). A service consists of a dataflow graph which evaluates data by means of functional nodes. Each functional node represents a stateless expression which may have multiple inputs and outputs denoted by parameter names. Edges in a dataflow graph are used to connect the output of a functional node with the inputs of (possibly many) other func- tional nodes. In order to specify which output is connected to which input, edges are also labeled with attribute names and, as the inputs and the outputs of different expressions may be labeled with different parameter names, edges do not connect directly two func- tional nodes, but connect functional nodes with parameter nodes that are intended to fix name mismatches. Dataflow graphs are defined as follows: Definition 2. A dataflow graph with name S is a tuple hS, NS , ES , name S , expr S i where – S ∈ Ns ; – NS is a finite set of nodes, partitioned into functional and parameter nodes, denoted by fun(NS ) and par(NS ), respectively; – ES is a finite set of edges; ES ⊆ (fun(NS ) × par(NS )) ∪ (par(NS ) × fun(NS )); – name S : par(NS ) ∪ ES → Na is a labelling function; – expr S : fun(NS ) → Expr is a labelling function. Moreover, dataflow graphs are required to be directed acyclic graphs (DAGs). The parameter nodes with no incoming edges (resp. no outgoing edges) will be called the input nodes (resp. output nodes) of the graph. In Fig. 1, ovals and small circles represent functional and parameter nodes, respectively; input and output nodes are colored in gray. The dependency graph of a set of dataflow graphs Σ is hΣ, Ei, where E is the set of all pairs (G1 , G2 ) such that the name of G2 occurs in the label of some functional node in G1 . We say that Σ is acyclic if its dependency graph is. Definition 3. A service specification Σ is a finite, acyclic set of dataflow graphs with mutually different names. Edge labels should match the input/output message attributes of the service expres- sions labeling functional nodes. This requirement is formalized in terms of typing. In this paper we only deal with a structural form of typing (centred around message attribute names); the problem of ensuring–say—that the connected input/output attributes lat and latitude in Fig. 1 belong respectively to two “compatible” concepts C1 and C2 such that C1 v C2 has already been tackled in the literature (including [5]). We will deal with it in the full paper. Definition 4. A (message) type is a finite set T ⊆ Na Definition 5. The input type of a dataflow graph G = hS, NS , ES , name S , expr S i with respect to a specification Σ is the set inΣ (G) = {name S (n) | n is an input node of G} . The output type of a dataflow graph G = hS, NS , ES , name S , expr S i with respect to a specification Σ is the set outΣ (G) = {name S (n) | n is an output node of G} . Definition 6. The input type of a service expression E with respect to a specification Σ, denoted by inΣ (E), is recursively specified as follows: – if E = S ∈ Ns , then inΣ (E) equals inΣ (G) where G has name S; – inΣ (op(E1 , E2 )) = inΣ (E1 ) ∪ inΣ (E2 ); – inΣ (if C then E1 else E2 ) = inΣ (E1 ) ∪ inΣ (E2 ) ∪ {a ∈ Na | a occurs in C}; – inΣ (select A from all D with R) = {a ∈ Na | a occurs in R}; – inΣ (a1 := t1 , . . . an := tn ) = Na ∩ {t1 , . . . tn }. The output type of a service expression E with respect to a specification Σ, denoted by outΣ (E), is recursively specified as follows: – if E = S ∈ Ns , then outΣ (E) equals outΣ (G) where G has name S; – outΣ (op(E1 , E2 )) = outΣ (E1 ) ∩ outΣ (E2 ); – outΣ (if C then E1 else E2 ) = outΣ (E1 ) ∩ outΣ (E2 ); – outΣ (select a1 := r1 , . . . an := rn from all D with R) = {a1 , . . . an }; – outΣ (a1 := t1 , . . . an := tn ) = {a1 , . . . an }. About the above definition: Intuitively, all input parameters have to be supplied in order to call a service; therefore if the components of a compound service have different input types, then the compound service must take their union to be sure that all component services can be invoked. Symmetrically, the only outputs one can count on are those returned by all the component services; this is why intersection is used here. Definition 7. A specification Σ is well-typed iff for all dataflow graphs hS, NS , ES , name S , expr S i ∈ Σ, and for all functional nodes k ∈ fun(NS ), – in(k) equals the set of labels of the incoming edges of k; – out(k) contains the set of labels of the outgoing edges of k. From now on we assume that all service specifications are well-typed unless stated oth- erwise. The semantics of service expressions and dataflow graphs is defined in terms of worlds that specify the extension of concepts and roles, as well as the behavior of each service. From a semantic perspective, a message is a partial function defined over the message’s attributes, that returns for each attribute its value. Definition 8. A ∆-message is a partial function m : Na → ∆ . The message’s range ∆ will sometimes be omitted when irrelevant or obvious. Now a world is simply a combination of a DL interpretation (that interprets the terms defined in the underlying ontology) plus an interpretation of service names (i.e. atomic services). Definition 9. A world is a tuple W = h∆W , ·W , [·]W i such that – h∆W , ·W i is an interpretation of the knowledge base; – [·]W maps every service name S ∈ Ns on a set [S ]W of ∆W -message pairs. To ensure that service name evaluation reflects the given service specification, we have to specify the semantics of the terms and expressions used in dataflow graph labels. Definition 10. The evaluation tW (m) of a term t ∈ Nc ∪ Na with respect to a world W and a message m, is m(t) if t ∈ Na , and tW otherwise. Definition 11. The evaluation E W (m) of a service expression E with respect to a world W and a message m is recursively defined as follows: – if E = S ∈ Ns , then E W (m) = {m0 | (m, m0 ) ∈ [S ]W } ; – union(E1 , E2 )W (m) = E1W (m) ∪ E2W (m) ; – intersection(E1 , E2 )W (m) = E1W (m) ∩ E2W (m) ; – (if C then E1 else E2 )W (m) = E1W (m) if C W (m) is true, E2W (m) otherwise; moreover, C W (m) is true iff • for all t ¯ u in C, tW (m) ¯ uW (m) holds (¯ ∈ {=, 6=}), • and for all literals A(t) and ¬B(u) in C, tW (m) ∈ AW and uW (m) 6∈ B W ; – (select a1 := r1 , . . . an := rn from all D with R)W (m) is the set of all m0 such that, for some x ∈ DW , • for all r ¯ t in R there exists y ∈ rW (x) such that y ¯ tW (m) holds (¯ ∈ {= , 6=}); • m0 (ai ) ∈ riW (x) (1 ≤ i ≤ n); m0 is undefined in every other case; – (a1 := t1 , . . . an := tn )W (m) = {m0 } where the domain of m0 is a1 , . . . an and m0 (ai ) = m(ti ) (1 ≤ i ≤ n) . The evaluation of service compositions (i.e. dataflow graphs) is defined in a declar- ative way: each parameter node must be assigned a value (an element of ∆W ) in a way that is compatible with the input-output behavior of each functional node: Definition 12. The evaluation [G]W of a graph G = hS, NS , ES , name S , expr S i w.r.t. W is the set of all ∆W -message pairs (min , mout ) such that for some function σ : par(NS ) → ∆W , the following conditions hold: – for all input nodes n ∈ NS , min (name S (n)) = σ(n); – for all output nodes n ∈ NS , mout (name S (n)) = σ(n); – min and mout are undefined for every other attribute name; – for all n ∈ fun(NS ), it must hold that mnout ∈ expr S (n)W (mnin ), where mnin and mnout are defined as follows: for all a ∈ Na , • if there exists an edge (n0 , n) with name S (n0 , n) = a, let mnin (a) = σ(n0 ), • if there exists an edge (n, n00 ) with name S (n, n00 ) = a, let mnout (a) = σ(n00 ), • mnin and mnout are undefined for all other inputs. Definition 13. A world W is a model of a specification Σ with respect to a knowledge base KB iff 1. h∆W , ·W i is a model of KB ; 2. for all names S of a dataflow graphs G ∈ Σ, [S ]W = [G]W . If W is a model of Σ, then it is not hard to see that since Σ is acyclic (by definition), [·]W is uniquely determined by h∆W , ·W i (i.e. service specifications are deterministic). The next definition specifies when a service S1 is a weakening of S2 (equivalently, S2 is a strengthening of S1 ) [1]. These relations are the basis for service comparison. Definition 14. S1 vKB,Σ S2 iff for all models W of Σ w.r.t. KB , [S1 ]W ⊆ [S2 ]W . Roughly speaking, if S2 is a strengthening of S1 , then for any given input, S2 returns more answers than S1 . See [1] for a discussion of the different applications of strength- ening and weakening in our reference scenarios. 5 Service comparison Definition 15. The service comparison problem is defined as follows: given KB , Σ, and two service names S1 and S2 , decide whether S1 vKB,Σ S2 . By translating service specifications into logic programming rules, service subsump- tion checking can be reduced to containment of unions of conjunctive queries (UCQ) against DL knowledge bases. In turn, this problem can be reduced to the evaluation of UCQs against DL knowledge bases. 5.1 Rules and queries Consider rules like A ← L1 , . . . , Ln where A is a logical atom, each Li is a literal (i.e. either an atom or a negated atom), possibly of the form t = u or t 6= u. As usual, let head(r) = A and body(r) = {L1 , . . . , Ln } . We restrict our attention to function-free rules only: terms will be restricted to constants in In and variables. The predicates in body(r) may be defined in a DL knowledge base, i.e. unary and binary predicates may belong to At and AR , respectively. If all the predicates occurring in body(r) belong to At and AR and body(r) contains no occurrences of ¬, then we call r a conjunctive query (CQ). A union of conjunctive queries (UCQ) is a set of CQs having the same predicate name in the head. We add superscripts 6=, ¬ if the corresponding symbol may occur in body(r); for example UCQ¬ denotes the unions of conjunctive queries that may contain negative literals in the body. Let P be a set of rules and I be an interpretation. Let an I-substitution be a sub- stitution that replaces each constant a by aI , and each variable with an element of ∆I . I-substitutions are a useful tool for defining the semantics of rules and queries. Usually queries are evaluated against a knowledge base, and the answer is restricted to the individual constants that explicitly occur in the ABox (e.g. see [8]). In particular, a tuple c of constants is a certain answer of a CQ r against a DL KB K iff – the constants in c occur in K; moreover, for some substitution σ defined on the vari- ables of head(r), – head(rσ) has the form p(c); – for all models I of K, there exists an I-substitution θ such that every literal in body(rσθ) is satisfied by I, that is, • for all A(d) (resp. ¬A(d)) in body(r0 σθ), d ∈ AI (resp. d 6∈ AI ) ; • for all P (d, e) (resp. ¬P (d, e)) in body(r0 σθ), (d, e) ∈ P I (resp. (d, e) 6∈ P I ) ; • all literals d = e and d 6= e in body(r0 σθ) are true. The set of all certain answers S of a CQ r against K will be denoted by c ans(r, K). For a UCQ Q, let c ans(r, K) = r∈Q c ans(r, K). In this paper, we will also query the models of a knowledge base and introduce what we call unrestricted answers, that are built from the domain elements of the models.1 This definition applies to all sets of rules (not only CQs and UCQs). The I-reduct of P , P I , is the set of all rules r such that for some r0 ∈ P and some I-substitution σ, – all literals belonging to body(r0 σ) whose predicate is in At ∪ AR ∪ {=, 6=} are satis- fied by I; – r is obtained from r0 σ by removing from body(r0 σ) all the literals whose predicate is in At ∪ AR ∪ {=, 6=}. Note that the I-reduct of a UCQ is always a set of facts. We will denote by lm(P I ) the least Herbrand model of P I . The unrestricted answer to a predicate p in P against I is u ans(p, P, I) = {c | p(c) ∈ lm(P I )} . 5.2 The reduction We proceed by illustrating the tranlation of service specifications into logic programs. Syntactically, such programs are like queries, but have the unrestricted semantics, like our service descriptions; so they provide a nice intermediate step for the complete reduction of service comparison to certain answers. In order to simplify the presentation, we assume that service specifications are normalized by replacing subexpressions with new services, 1 This notion differs from the many hybrid combinations of rules and DLs (see [7] for a survey). The latter are still rather close to querying DL KBs and their answers are restricted to the con- stants occurring in the rules or in the KB. Moreover, the purpose is different: those combination are supposed to be knowledge representation formalisms, while our semantics is merely a tech- nical device to link service comparison to query answering against DL KBs. so that no constructs are nested (all subexpressions are service names). We use further service names to guarantee that if a dataflow graph has more than one functional node, then all nodes are labelled with service names only. Finally, we assume that message attributes are renamed so that different functional nodes never share any message attribute name. Clearly, the above normalizations take polynomial time. Then for each service name S defined in the specification, we define an atom pS (Xf1 , . . . , Xfm , Yg1 , . . . , Ygn ), where pS is a fresh predicate symbol, and f1 , . . . , fm (resp. g1 , . . . , gn ) is the lexicografic ordering of in(S) (resp. out(S)). We denote the above atom by HS . Now each service S whose dataflow graph has multiple functional nodes with labels S1 , . . . , Sn , can be translated into one rule (HS ← HS1 , . . . , HSn )σ, where the substitu- tion σ unifies all variables Ygi and Xfj such that some parameter node has an incoming edge labelled with gi and an outgoing edge labelled fj . Next consider an S whose dataflow has a single functional node labelled E. If E is union(S1 , . . . , Sn ) then S can be translated into n rules HS ← HSi (1 ≤ i ≤ n). Symmetrically, if E = intersection(S1 , . . . , Sn ) then S can be translated into one rule HS ← HS1 , . . . , HSn . When E = if c1 , . . . , cn then S1 else S2 , S is translated into the rules HS ← [c1 ], . . . , [cn ], S1 and HS ← [c̄i ], S2 (1 ≤ i ≤ n). Here each ci is a condition and [ci ] denotes its tranlation; c̄i denotes the complement of ci , e.g. if ci is x = y then c̄i is x 6= y; if ci = A(x) then c̄i = ¬A(x). The translation [ci ] consists in turning each message attribute f into the corresponding variable Xf . The translation of select a1 := r1 , . . . , an := rn from all A with p1 = t1 , . . . pm = tm is HS ← A(Z), [a1 := r1 ], . . . , [an := rn ], [p1 = t1 ], . . . , [pm = tm ], where Z is a fresh variable. Each [ai := ri ] consists of the translation of the role path ri into a conjunction of binary atoms (using fresh variables at the intermediate steps), plus the atom Yai = V , where V is the last variable introduced in the translation of ri . Similarly, each [pi = ti ] consists of the translation of the role path pi plus u = V , where V is the last variable introduced in the translation of pi , and u = ti if ti ∈ Nc , otherwise (i.e. if ti ∈ Na ) u = Xti . Example 1. In our running example, a condition like hasAddr.hasStr=street is translated into hasAddr(Z, V1 ), hasStr(V1 , V2 ), Xstreet = V2 , where V1 , V2 are new variables. Due to space limitations we omit the (straightforward) translation of message restructur- ing and restrictions. Let us denote the translation of a specification Σ with PΣ . The above translation is pretty natural and it is not hard to see that it preserves the meaning of the given normal specification under unrestricted query evaluation, as stated by the following theorem. Theorem 1. Let Σ be a normalized service specification and let PΣ be its translation. Let S be the name of a graph G ∈ Σ and f1 , . . . , fm (resp. g1 , . . . , gn ) be the lexico- graphic ordering of in(S) (resp. out(S)). Then for all models W of Σ w.r.t. KB , (t1 , . . . , tm , u1 , . . . , un ) ∈ u ans(pS , PΣ , W) iff for some message pair (m, m0 ) ∈ [S ]W , m(fi ) = ti and m0 (gj ) = uj (1 ≤ i ≤ m, 1 ≤ j ≤ n). The above result can be reformulated in terms similar to query containment. For all predicates pS1 and pS2 , let pS1 ⊆KB,Σ pS2 iff for all models W of Σ w.r.t. KB , u ans(pS1 , PΣ , W) ⊆ u ans(pS2 , PΣ , W). Corollary 1. For all normalized specifications Σ, S1 vKB,Σ S2 iff pS1 ⊆KB,Σ pS2 . 6 Complexity results In this section we exploit Theorem 1 and the many recent complexity results on certain query answers against DL knowledge bases to derive a preliminary set of complexity bounds for our service description language. In order to illustrate decidable cases and complexity sources, we introduce a uniform notation for the fragments of our service description language SDLfull : – SD restricts the language by forbidding union, else, negative conditions (such as r 6= t and ¬A(t)), and equality within conditions (equality is allowed in the with clause of selections); – superscripts u and e stand for union and else, respectively; when they are present, the language supports the corresponding constructs; – similarly, superscripts =, 6= and ¬ stand for conditions with equalities, disequalities and concept complements, respectively; – the superscript k imposes that the maximum nesting level of union and else is bounded by a constant k. For example, SDu,6= stands for the sublanguage of SDLfull supporting union and condi- tions with disequalities, but neither else nor negative conditions like ¬A(t). By SDk,u,6= we denote a similar language, where the nesting level of union is bounded by a constant k. In this preliminary paper, we adopt the following reduction to obtain a first set of decidability results and complexity upper bounds: 1. Service comparison in Σ is reduced to unrestricted answer containment in PΣ by Theorem 1; note that PΣ can be constructed in polynomial time from Σ; 2. unrestricted answer containment is further reduced to unrestricted containment of CQ¬,6= /UCQ¬,6= by unfolding PΣ ; unfolding means that whenever an atom B in the body of some rule r unifies with the head of some rule r0 , then B is replaced with body(r0 ) (as in SLD resolution); the process is exhaustively repeated; if multiple rules r1 , . . . , rn unify with B, then r is replaced with all n possible rewritings; since PΣ is acyclic (because Σ is), the unfolding process terminates, however it may in- crease the size of PΣ exponentially when some predicates are defined by multiple rules; 3. finally, if (the unfolding of) PΣ is positive (i.e., it contains no negations nor any disequality), then unrestricted answer containment in the unfolded version of PΣ is reduced to certain answering of CQs/UCQs against DL knowledge bases, see Theo- rem 2 below. Theorem 2 says that there exists a PTIME reduction of unrestricted CQ (resp. UCQ) containment to the evaluation of certain answers of CQs (resp. UCQs) against DL knowl- edge bases. Theorem 2. Let Σ be a normalized specification and let PΣU be the unfolding of PΣ . For i = 1, 2, let Qi be the definition of pSi , i.e. the set of rules r ∈ PΣU with pSi in head(r) (where S1 and S2 are the names of two graphs in Σ). If PΣ is positive, then checking whether pS1 ⊆KB,Σ pS2 can be reduced in poly- nomial time to evaluating for all q ∈ Q1 an answer c ans(Q2 , KB q ), where KB q is obtained from KB by binding the variables in q to fresh constants, and adding the in- stantiated body to KB ’s ABox as a set of assertions. More precisely, for each q ∈ Q1 , one has to check whether the tuple of fresh constants assigned to the variables in head(q) belongs to c ans(Q2 , KB q ). Basically, the reduction is centred around a form of skolemization. Note that this result is slightly different from the known relationships between query answering and query containment, since pS1 ⊆KB,Σ pS2 is based on a nonstandard (un- restricted) notion of evaluation, similar to the one used for service comparison. The above reduction suffices to derive complexity bounds for positive PΣ . Note that PΣ is positive when else, 6=, and ¬ are not supported, that is, in SDu and its fragments. When Σ is formulated in SDu , then the unfolding of PΣ may be exponentially larger, as the translation of unions into rules introduces predicates defined by multiple rules. It is not hard to see, however, that SDk,u specifications lead to unfoldings that are only poly- nomially larger than PΣ . Then the above reduction steps tell us that complexity of service comparison within SDk,u and its fragments is bounded by the complexity of computing certain answers against DL KBs; for SDu there is a further exponential explosion due to unfolding. The complexity of query answering is NP-complete for EL [9], EXPTIME-complete for DLR [3], and co3NEXPTIME complete for SHIQ (cf. [6]). Moreover, query con- tainment w.r.t. empty knowledge bases is NP-hard [4], and it is not difficult to see that the complexity of the standard reasoning tasks in ALC with general TBoxes (EXPTIME- complete) provides a lower bound to CQ answering against ALC KBs, so the upper bounds for EL and ALC are strict. These observations support the following theorem: Theorem 3. The complexity of service comparison in SD(X ) and SDk,u (X ) is – NP-complete for X = EL; – EXPTIME-complete for X ranging from ALC to DLR; – in co3NEXPTIME for X = SHIQ. If the underlying description logic supports unrestricted negation (or equivalently, atomic negation and GCI), then negative literals in rule and query bodies (if any) can be internalized in the KB in a simple way: just replace each literal ¬A(t) with Ā(t) where Ā is a fresh atom, and extend the TBox with the axioms Ā v ¬A and ¬A v Ā. Internal- ization makes it possible to support constructs such as negated conditions, disequalities, and else, that introduces negation implicitly through the translations [c̄i ]. After remov- ing negation from PΣ via internalization, we can exploit the available complexity results for the extensions of ALC (that allow internalization). Theorem 4. The complexity of service comparison in SD¬ (X ), and between SDk,e (X ) and SDk,u,e,¬ (X ) is – in EXPTIME for X = EL; – EXPTIME-complete for X ranging from ALC to DLR; – in co3NEXPTIME for X = SHIQ. Remark 1. EL does not support negation, therefore internalization is not possible. In the above theorem we inherit the upper bound for ALC, but whether this is a tight bound is still an open question. From the above results, we derive upper complexity bounds for more general logics, without any nesting bounds. In the absence of nesting bounds and in the presence of dis- junctive constructs like unions and conditionals, the unfolding of PΣ may be exponential. Theorem 5. The complexity of service comparison in SDu,e,¬ (X ) is – in 2-EXPTIME for X = EL; – in 2-EXPTIME for X ranging from ALC to DLR; – in 4-EXPTIME for X = SHIQ. Also in this case, whether these bounds are tight is still an open question. Currently, we do not know whether SDLfull or even its fragment SD6= are decidable. There exist some undecidability results for CQs and UCQs with disequalities, and we conjecture they can be carried over to service comparison. This will be a subject for future work. EL ALC DLR SHIQ k,u SD, SD NP-complete EXPTIME-complete in co3NEXPTIME SD¬ (in EXPTIME) EXPTIME-complete in co3NEXPTIME SDk,e , SDk,e,¬ (in EXPTIME) EXPTIME-complete in co3NEXPTIME SDk,u,e , SDk,u,e,¬ SDu , SDe , SDu,e (in 2-EXPTIME) in 2-EXPTIME in 4-EXPTIME SD , SDe,¬ , SDu,e,¬ u,¬ Table 1. Some complexity results for decidable cases 7 Related work The language introduced in [1], SDL(X ), was based on an embedding of service compar- ison into subsumption in an expressive description logic, µALCIO. With the reduction to query containment we adopt here, it is possible to support service intersection and dataflow graphs, even if they violate the quasi-forest structure of µALCIO. Moreover, we provide an articulated complexity analysis not available in [1]. The idea of formalizing services as queries has been first introduced in [5]. The lan- guage adopted there is simpler than ours: only one construct combining our selection and restriction, and a form of composition where output and input messages must per- fectly match. The semantics of services in [5] is restricted to the constants occurring in a KB rather than domain elements. Furthermore, all upper bounds provided there are EX- PTIME or beyond. Currently our NP bounds for EL identify the most efficient service description logics in the literature. Moreover, even in the hardest cases, our language is never more complex than [5]. In OWL-S, services are described by means of preconditions, postconditions, and add/delete lists. Pre- and postconditions are like ABoxes; add/delete lists specify the side effects of the services. The same mechanism can describe functional services. WSMO is built upon an articulated model, including user roles and goals, that lead to a planning- like view of service composition. In WSDL-S, WSDL service specifications (that are basically type definitions) are bound to concepts defined in an underlying ontology. No good computational results are currently available for any of the above standards. 8 Conclusions and future work SDLfull and its fragments are rich service description languages that—however—enjoy numerous decidability results (reported in Table 1), and in some case (SDk,u (EL)) ser- vice comparison is significantly less complex than in previous competing logics. Encour- aging experimental results are available for an analogous problem [2]. We are planning an experimental implementation based on the same technology. Many issues need further work, here we mention just the main open problems. Au- tomated service composition needs efficient heuristics for quickly selecting promising candidate dataflows. The bounds for EL reported in parentheses in Table 1 are simply inherited from more complex logic and it is not obvious whether they are tight. Dise- qualities and negation over roles would be helpful, but the undecidability results of [8] warn that some restrictions may be needed. It would also be interesting to check whether service comparison can be in NP also for other low-complexity logics such as the DL-lite family. References 1. Piero A. Bonatti. Towards service description logics. In JELIA, LNCS 2424, pages 74–85. Springer, 2002. 2. Piero A. Bonatti and F. Mogavero. Comparing rule-based policies. In 9th IEEE Int. Work. on Policies for Distributed Systems and Networks (POLICY 2008), pages 11–18, 2008. 3. D. Calvanese, G. De Giacomo, and M. Lenzerini. Conjunctive query containment and answering under description logic constraints. ACM Trans. Comput. Log., 9(3), 2008. 4. A.K. Chandra and P.M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In Proc. Ninth Annual ACM Symp. on Theory of Computing, pages 77–90, 1976. 5. D. Hull, E. Zolin, A. Bovykin, I. Horrocks, U. Sattler, and R. Stevens. Deciding semantic matching of stateless services. In Proc. of AAAI 2006. AAAI Press, 2006. 6. Magdalena Ortiz, Diego Calvanese, and Thomas Eiter. Data complexity of query answering in expressive description logics via tableaux. J. of Automated Reasoning, 41(1):61–98, 2008. 7. Riccardo Rosati. Integrating ontologies and rules: Semantic and computational issues. In Rea- soning Web, LNCS 4126, pages 128–151. Springer, 2006. 8. Riccardo Rosati. The limits of querying ontologies. In ICDT, LNCS 4353, pages 164–178. Springer, 2007. 9. Riccardo Rosati. On conjunctive query answering in EL. In Description Logics. CEUR-WS.org, 2007.